JP3478267B2

JP3478267B2 - Digital audio signal compression method and compression apparatus

Info

Publication number: JP3478267B2
Application number: JP2000387351A
Authority: JP
Inventors: 典雄鈴木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2000-12-20
Filing date: 2000-12-20
Publication date: 2003-12-15
Anticipated expiration: 2020-12-20
Also published as: JP2002189499A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a digital audio signal compression method and compression apparatus used in audio signal compression under the MPEG/Audio standard, the ATRAC standard, the Dolby Digital standard, and the like.

[0002]

[Prior Art] FIG. 3 is a circuit diagram showing the configuration of a digital audio signal compression circuit conforming to the MPEG (Moving Picture Coding Experts Group)/Audio standard. In this figure, the input digital audio signal Da is divided into blocks of a predetermined number of samples (called frames) and processed along two paths. In one path, the filter bank 1 splits the input signal into 32 subband signals of equal bandwidth. Each subband signal is downsampled to 1/32 of the sampling frequency. The scale factor extraction/normalization circuit 2 detects, for each subband signal in one frame, the sample with the largest absolute value. The logarithm of that value, after quantization, is called the scale factor. Each subband sample is then divided by this scale factor so that its value is normalized to within ±1.
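The scale factor step can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 3: the function names are hypothetical, and the scale factor grid 2^(1 - i/3) with 63 entries is assumed here as the quantized logarithmic scale (it matches the shape of the MPEG-1 Audio scale factor table as far as I know, but the text above does not specify it).

```python
import math

def scale_factor_index(peak, n_steps=63):
    """Quantize log2(peak) to the assumed grid sf(i) = 2**(1 - i/3).

    Returns the largest index whose scale factor is still >= peak,
    clamped to the valid range 0 .. n_steps - 1.
    """
    if peak <= 0.0:
        return n_steps - 1          # silent subband: smallest scale factor
    i = math.floor(3.0 * (1.0 - math.log2(peak)))
    return max(0, min(n_steps - 1, i))

def normalize_subband(samples):
    """Return (scale factor index, samples divided by the scale factor)."""
    peak = max(abs(s) for s in samples)   # sample with the largest magnitude
    idx = scale_factor_index(peak)
    sf = 2.0 ** (1.0 - idx / 3.0)
    return idx, [s / sf for s in samples]

idx, normalized = normalize_subband([0.1, -0.3, 0.25, 0.05])
```

Because the chosen scale factor is never smaller than the peak (for peaks up to 2.0 on this assumed grid), every normalized sample stays within ±1.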

[0003] In the other path, the psychoacoustic analysis unit 3 computes the frequency spectrum by FFT (fast Fourier transform) and, based on the resulting frequency data, calculates and outputs a masking threshold for each subband. The bit allocation unit 4 determines the number of quantization bits for each subband by an iterative loop, using the output of the psychoacoustic analysis unit 3 and subject to the limit on the number of bits available in one frame, which is determined by the bit rate. The quantization unit 5 quantizes the subband signals output from the scale factor extraction/normalization circuit 2 with the number of quantization bits set for each subband. The bitstream generation unit 6 multiplexes the quantized subband samples, the bit allocation information for each subband, and the scale factors, attaches a header, and outputs the resulting bitstream.
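The iterative bit allocation loop can be illustrated with a simplified greedy sketch. Everything specific here is an assumption made for illustration: the function name `allocate_bits`, the rule of roughly 6.02 dB of signal-to-noise ratio per bit, 36 samples per subband per frame, and the omission of side information (header, scale factors, allocation fields) from the budget. The real MPEG/Audio loop uses tabulated quantizer classes.

```python
def allocate_bits(smr_db, bit_budget, samples_per_subband=36, max_bits=16):
    """Greedy allocation: repeatedly give one more bit per sample to the
    subband whose mask-to-noise ratio (SNR - SMR) is currently lowest."""
    bits = [0] * len(smr_db)
    remaining = bit_budget
    while remaining >= samples_per_subband:
        candidates = [sb for sb in range(len(bits)) if bits[sb] < max_bits]
        if not candidates:
            break
        # Assumed model: each quantization bit buys about 6.02 dB of SNR.
        worst = min(candidates, key=lambda sb: 6.02 * bits[sb] - smr_db[sb])
        bits[worst] += 1
        remaining -= samples_per_subband
    return bits

# Four subbands, budget for 8 one-bit increments.
print(allocate_bits([30.0, 10.0, -5.0, 20.0], bit_budget=36 * 8))
```

Subbands with a high SMR receive bits first, and a subband whose SMR is negative (signal already below the mask) may receive none.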

[0004] Next, an example of the processing procedure in the conventional psychoacoustic analysis unit 3 is described. The procedure below is that of Model 1 of the psychoacoustic models in ISO/IEC 11172-3.
(1) Obtain the frequency characteristics by FFT, yielding 512 frequency data.
(2) Obtain the sound pressure level of each of the 32 subbands.
(3) Determine the absolute threshold.
(4) Select the frequencies that are heard as sound (maskers).
(5) Reduce the number of maskers.
(6) Calculate the individual masking thresholds.
(7) Calculate the global masking threshold.
(8) Determine the minimum masking threshold of each subband.
(9) Calculate the signal-to-mask ratio (SMR) of each subband.
The SMR is then output to the bit allocation unit 4 as bit allocation information.

[0005]

[Problems to Be Solved by the Invention] The processing in the conventional psychoacoustic analysis unit 3 described above has the drawback that it is computationally time-consuming. Steps (6) and (7) in particular take a long time. For example, the calculation in step (7) evaluates the expression

[Equation 1]

Even leaving aside the log computation, i ranges from 1 to about 130, and m and n, the number of tones and the number of noise components, are each about 10 to 20, so LTtm and LTnm must each be computed more than 1000 times. LTtm and LTnm are each expressed as the sum of three terms, two of which are linear functions, so the computation takes time.
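The image of Equation 1 is not reproduced in this text. For orientation only, the global masking threshold of Psychoacoustic Model 1 in ISO/IEC 11172-3 is commonly written in the form below; this is given as an assumption about what Equation 1 shows, with LT_q denoting the threshold in quiet, and is not taken from the patent itself.

```latex
LT_g(i) = 10 \log_{10}\!\left( 10^{LT_q(i)/10}
        + \sum_{j=1}^{m} 10^{LT_{tm}(j,i)/10}
        + \sum_{j=1}^{n} 10^{LT_{nm}(j,i)/10} \right)
```

With i running up to about 130 and m and n each between 10 and 20, each of LT_tm and LT_nm is evaluated roughly 130 × 10 = 1300 to 130 × 20 = 2600 times per frame, which is consistent with the figure of more than 1000 evaluations stated above.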

[0006] The present invention has been made in view of these circumstances, and its object is to provide a digital audio signal compression method and compression apparatus that can substantially reduce the computation time compared with the conventional approach.

[0007]

[Means for Solving the Problems] The present invention has been made to solve the above problem. The invention according to claim 1 is a digital audio signal compression method in which an input digital audio signal is frequency-divided into a plurality of subbands and subjected to psychoacoustic analysis processing, bits are allocated to each subband according to the result of the psychoacoustic analysis processing, and the signal of each subband is quantized according to the bit allocation and output, characterized in that the psychoacoustic analysis processing frequency-analyzes the input digital audio signal to convert it into frequency components, detects the maximum value of the frequency components in each subband, weights the detected maximum value according to the frequency band of that subband, and calculates the variance of the frequency components in each subband using the weighted maximum values.

[0008] The invention according to claim 2 is the digital audio signal compression method according to claim 1, characterized in that the psychoacoustic analysis processing, after calculating the variance, applies data corresponding to a hearing sensitivity obtained in advance to the value obtained by that calculation. The invention according to claim 3 is a digital audio signal compression apparatus comprising psychoacoustic analysis means for frequency-dividing an input digital audio signal into a plurality of subbands and performing psychoacoustic analysis processing, bit allocation means for allocating bits to each subband according to the result of the psychoacoustic analysis processing, and quantizing means for quantizing the signal of each subband according to the bit allocation and outputting it, characterized in that the psychoacoustic analysis means comprises first means for frequency-analyzing the input digital audio signal and converting it into frequency components, second means for detecting the maximum value of the frequency components in each subband and weighting the detected maximum value according to the frequency band of that subband, and third means for calculating the variance of the frequency components in each subband using the weighted maximum values.

[0009] The invention according to claim 4 is the digital audio signal compression apparatus according to claim 3, characterized in that the psychoacoustic analysis means further comprises fourth means for applying data corresponding to a hearing sensitivity obtained in advance to the variance value obtained by the calculation of the third means.

[0010]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. The compression circuit to which the compression method of this embodiment is applied is the same as that of FIG. 3, and the processing procedure described below is used in the psychoacoustic analysis unit 3 of FIG. 3. FIG. 1 is a flowchart for explaining the compression method according to this embodiment. Steps S1 to S6 are described in order below. In the following description, the number of subbands is 32 and the number of frequency data in each subband is 16.

[0011] ○ Step S1 (frequency analysis)
FFT processing (the operation given by the expression shown in step S1 of FIG. 1) is applied to the input digital audio data (real numbers). In the case of MPEG/Audio Layer 2 (MP2), 1024 values are obtained as the frequency data F(k). The frequency data F(k) are complex, and because of the symmetry of the FFT of a real input, only 512 of them are meaningful. In the expression in the figure, j is the imaginary unit.
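Step S1 can be illustrated with a small pure-Python sketch. The expression in FIG. 1 is not reproduced in this text, so the usual forward transform with kernel exp(-j2πkn/N) is assumed; the recursive radix-2 routine and the test tone are illustrative choices, not part of the patent.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A 1024-sample real test tone that falls exactly on bin 50.
N = 1024
frame = [math.cos(2 * math.pi * 50 * n / N) for n in range(N)]
F = fft(frame)

# For real input, F[N - k] is the complex conjugate of F[k],
# so only the first 512 values carry independent information.
half = F[: N // 2]
```

The conjugate symmetry is what the paragraph above refers to when it says only 512 of the 1024 values are meaningful.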

[0012] ○ Step S2 (sound pressure level measurement)
For each frequency datum F(k), the sum of the squares of the real and imaginary parts, that is, the squared magnitude P(k) of F(k), is obtained. This value P(k) corresponds to the sound pressure level.
○ Step S3 (measurement of the mean)
The 512 sound pressure data are divided into 32 subbands (bands) of 16 data each, and the mean E(sb) of the sound pressure levels P(k) is obtained for each subband. Here sb is the subband number, assigned 0 to 31 from the low-frequency side. For example, the subband sb = 3 contains the sound pressure levels P(48) to P(63).
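Steps S2 and S3 amount to a squared magnitude followed by a blockwise mean. A minimal sketch, with hypothetical function names and the 32 × 16 layout taken from the description:

```python
def power_spectrum(F):
    """Step S2: P(k) = Re(F(k))^2 + Im(F(k))^2 for each frequency datum."""
    return [c.real * c.real + c.imag * c.imag for c in F]

def subband_means(P, n_subbands=32, width=16):
    """Step S3: mean E(sb) of P(k) over the 16 bins of each subband."""
    return [sum(P[sb * width:(sb + 1) * width]) / width
            for sb in range(n_subbands)]

def subband_of(k, width=16):
    """Subband number that frequency index k belongs to."""
    return k // width

# Bins 48..63 all map to subband 3, as in the example in the text.
print(subband_of(48), subband_of(63))
```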

[0013] ○ Step S4 (weighting of the maximum)
For each subband, the maximum P'(k) of the sound pressure levels P(k) is detected, and the detected value is multiplied by sqrt(33/(sb+1)) (sqrt: square root) to obtain the weighted sound pressure level P'(k).
○ Step S5 (calculation of the variance)
For each subband, the variance V(sb) of the sound pressure levels P(k) is calculated from the results of steps S2 to S4. Here, the weighted sound pressure level P'(k) is used in place of the maximum sound pressure level P(k) of each band.
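Steps S4 and S5 can be sketched as below. The exact variance expression of FIG. 1 is not reproduced in this text, so the form used here is an assumption: the mean E(sb) is taken over the unweighted values (as the description states for step S3), the single largest value is replaced by its weighted version, and the squared deviations from that mean are averaged over the 16 bins. The function name is hypothetical.

```python
import math

def weighted_variance(band, sb):
    """band: the power values P(k) of subband sb (sb counted from 0)."""
    n = len(band)
    mean = sum(band) / n                          # step S3: unweighted mean
    k_max = max(range(n), key=band.__getitem__)   # position of the maximum
    values = list(band)
    # Step S4: only the maximum is boosted, by sqrt(33 / (sb + 1)).
    values[k_max] = band[k_max] * math.sqrt(33.0 / (sb + 1))
    # Step S5 (assumed form): mean squared deviation from E(sb).
    return sum((v - mean) ** 2 for v in values) / n

peaked = [0.0] * 15 + [1.0]
print(weighted_variance(peaked, sb=0), weighted_variance(peaked, sb=31))
```

The same peak yields a larger variance in a low subband than in a high one, which is the intended effect of the frequency-dependent weight.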

[0014] ○ Step S6 (calculation of the SMR)
The SMR of each subband is calculated. That is, for each subband, the logarithm of the variance V(sb) is taken and multiplied by 2.5, and 0.5 times the hearing sensitivity data Q'(sb) is subtracted from the result. The hearing sensitivity data Q'(sb) are data corresponding to the hearing sensitivity curve of the human ear and are stored in memory in advance. FIG. 2 shows an example of the hearing sensitivity data. In that figure, "Fs" is the sampling frequency used when converting the analog audio signal into a digital audio signal.
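Step S6 is a single expression per subband. In the sketch below the function name and the small floor that guards against log10(0) are additions for illustration; the Q'(sb) values of FIG. 2 are not reproduced in this text, so the arguments used are placeholders, not data from the patent.

```python
import math

def smr_db(variance, q_prime_db, floor=1e-12):
    """SMR(sb) = 2.5 * log10(V(sb)) - 0.5 * Q'(sb).

    `floor` is an assumed guard so that a zero variance does not
    raise a math domain error; it is not part of the description.
    """
    return 2.5 * math.log10(max(variance, floor)) - 0.5 * q_prime_db

# Placeholder inputs: V = 100, Q' = 4 dB.
print(smr_db(100.0, 4.0))
```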

[0015] The above is the processing procedure according to the embodiment of the present invention. As is clear from the above, the SMR calculation method of the psychoacoustic analysis unit 3 in this embodiment basically obtains the variance of the sound pressure levels P(k) for each subband and uses that variance as the SMR. The reason the variance can serve as the SMR is as follows. Even if the mean of the 16 sound pressure levels in a subband is the same, the variance is large when the spread is large and small when the spread is small. If all the sound pressure levels are equal, the variance is 0. When the frequency components within one subband vary little and their phases are scattered, the human ear hears noise. In contrast, if there is a peak within the subband, that sound is perceived. In other words, when the variance of the frequency components (amplitude or sound pressure level) is calculated, a waveform with a large variance is more important than one with a small variance and therefore needs more bits, whereas a waveform with a small variance can be given fewer bits.
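The argument that two subbands with the same mean can have very different variances is easy to check numerically. The two 16-value bands below are made-up illustrations, and the plain population variance from the standard library is used (no weighting of the maximum).

```python
from statistics import mean, pvariance

flat = [4.0] * 16                 # noise-like: every bin at the same level
peaked = [1.0] * 15 + [49.0]      # tone-like: one dominant bin

# Both bands have the same mean level ...
print(mean(flat), mean(peaked))
# ... but only the peaked band has a non-zero variance.
print(pvariance(flat), pvariance(peaked))
```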

[0016] SMR and variance have the following properties.
(1) One bit of data allocation corresponds to about 6 dB of SMR.
(2) Variance has the dimension of the square of the sound pressure level.
(3) Sound pressure level has the dimension of the square of the amplitude.
(4) It is reasonable to increase the data amount by 1 bit when the amplitude doubles.
Accordingly, it is reasonable to take the fourth root of the variance to convert it to an amplitude, then take the common logarithm and multiply by 20 (conversion to dB) to obtain the SMR. In practice, by the properties of logarithms, it suffices to take the common logarithm of the variance and multiply by 5. That is, the SMR is basically obtained by

$$\mathrm{SMR}(sb) = 5 \log_{10} V(sb)$$
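The step from "fourth root, then 20 times the common logarithm" to "5 times the common logarithm" follows directly from the power rule for logarithms, and the 6 dB per bit figure follows from a doubling of amplitude:

```latex
\mathrm{SMR}(sb) = 20 \log_{10}\!\left( V(sb)^{1/4} \right)
                 = \frac{20}{4} \log_{10} V(sb)
                 = 5 \log_{10} V(sb),
\qquad
20 \log_{10} 2 \approx 6.02\ \mathrm{dB}
```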

[0017] However, the human ear has various auditory characteristics, and better results are obtained by exploiting them. First, the frequency resolution of the human ear is better at low frequencies and becomes worse toward high frequencies. Taking this property into account, the following correction can be considered. When the variance is calculated, only the maximum sound pressure level P(k) in each subband is enlarged in inverse proportion to frequency. That is, letting P'(k) be the maximum sound pressure level in each subband, the correction

$$P'(k) = P'(k) \times \frac{32}{sb}$$

is applied, and the variance is calculated using the corrected sound pressure level P'(k).

[0018] Experimentally, direct inverse proportionality severely degrades the high-frequency characteristics, and better results are obtained with inverse proportionality to the square root. Also, because sb starts from 0, 1 is added for convenience of calculation. In the end, it is preferable to correct the sound pressure level by

$$P'(k) = P'(k) \times \sqrt{\frac{33}{sb + 1}}$$

The weighting in step S4 described above is this correction. The calculation of the mean (step S3) uses the sound pressure level P(k) before correction.
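The two candidate weights can be compared side by side. The function names are hypothetical; the expressions are the ones given in the description. Note that the direct form is undefined for sb = 0, which is one practical reason for the "+1" in the adopted form.

```python
import math

def direct_weight(sb):
    """Direct inverse proportionality, 32 / sb (undefined at sb = 0)."""
    return 32.0 / sb

def sqrt_weight(sb):
    """Adopted form, sqrt(33 / (sb + 1)), defined for sb = 0 .. 31."""
    return math.sqrt(33.0 / (sb + 1))

for sb in (1, 8, 16, 31):
    print(sb, round(direct_weight(sb), 3), round(sqrt_weight(sb), 3))
```

The square-root form spans a much narrower range across the 32 subbands (from sqrt(33) at sb = 0 down to sqrt(33/32) at sb = 31), so the emphasis on low subbands is far milder than with the direct form.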

[0019] Next, the human ear has a frequency characteristic represented by the so-called hearing curve. Let Q(sb) be the hearing sensitivity at the center of each subband. This value is expressed in units of sound pressure level (dB), and the smaller the value (it can also be negative), the more sensitive the ear. It is therefore preferable to calculate the SMR by the following expression, which takes the hearing sensitivity Q(sb) into account.

$$\mathrm{SMR}_2(sb) = 5 \log_{10} V(sb) - Q(sb)$$

[0020] In practice, however, consider for example MP2 at a sampling frequency of 48 kHz. The frequency range corresponding to sb = 0 is 0 to 750 Hz, and the fundamentals of many sounds lie in this range, so lowering the sensitivity there degrades the sound quality and increases a rustling kind of noise. Better results are therefore obtained by applying no correction up to about 2 kHz, where the sensitivity of the ear becomes reasonably good. The value modified in this way is denoted Q'(sb). FIG. 2 shows the values of Q'(sb). Also, experimentally, better results are obtained by combining the two terms in a ratio of 0.5 : 0.5 than by simply summing them. That is, it is more preferable to calculate the SMR by

$$\mathrm{SMR}_3(sb) = 2.5 \log_{10} V(sb) - 0.5\, Q'(sb)$$

Step S6 described above represents this calculation.
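The 0 to 750 Hz figure for sb = 0 follows from splitting the band up to half the sampling frequency into 32 equal parts. A minimal sketch, with a hypothetical function name:

```python
def subband_edges_hz(sb, fs_hz=48000.0, n_subbands=32):
    """Lower and upper edge of subband sb when 0 .. fs/2 is split equally."""
    width = (fs_hz / 2.0) / n_subbands
    return sb * width, (sb + 1) * width

for sb in range(4):
    print(sb, subband_edges_hz(sb))
```

At 48 kHz each subband is 750 Hz wide, so the region "up to about 2 kHz" mentioned above covers only the lowest two or three subbands; which subbands are actually left uncorrected is given by FIG. 2, which is not reproduced in this text.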

[0021] As described in detail above, according to this embodiment the variance is calculated for each subband and the SMR is obtained from the result, so the SMR can be obtained by a far simpler computation than the conventional one (see [Equation 1]). In experiments, the SMR was obtained in about 1/3 of the time required by the conventional method. Compared with the conventional psychoacoustic model, the method of this embodiment tends to lose high-frequency response sooner as the bit rate decreases. As a result, the warbling ("piro-piro") high-frequency quantization noise that is conspicuous with the conventional psychoacoustic model is not conspicuous with the method of this embodiment. Instead, a rustling ("goso-goso") low-frequency quantization noise becomes noticeable.

[0022]

[Effects of the Invention] As described above, according to the present invention, the frequency components are obtained for each subband, the variance of those frequency components is calculated, and bit allocation is performed based on the result. This provides the advantage that the computation time for bit allocation can be substantially reduced compared with the conventional approach.

[Brief description of drawings]

FIG. 1 is a flowchart showing the processing procedure of an embodiment of the method according to the present invention.

FIG. 2 is a diagram showing the hearing sensitivity data used in the embodiment.

FIG. 3 is a block diagram showing a configuration example of a digital audio signal compression circuit.

[Explanation of symbols]

3: psychoacoustic analysis unit; 4: bit allocation unit; 5: quantization circuit.

Continuation of front page

(56) References cited:
JP-A-6-242798 (JP, A)
JP-A-4-104618 (JP, A)
JP-A-9-134200 (JP, A)
Yoshio Yamasaki, "Trends in high-efficiency coding," Journal of the Acoustical Society of Japan, 1991, Vol. 47, No. 12, pp. 955-961
Akihiko Sugiyama, "High-efficiency coding of audio signals," Journal of the Institute of Television Engineers of Japan, 1994, Vol. 48, No. 4, pp. 447-454
Takehiro Moriya, Takao Kaneko, "Basic techniques for compression coding of speech and music," Interface, August 1998, pp. 92-99

(58) Fields searched (Int. Cl.7, database name): G10L 19/00, G10L 19/02, H03M 7/30, JICST file (JOIS)

Claims

(57) [Claims]

1. A digital audio signal compression method in which an input digital audio signal is frequency-divided into a plurality of subbands and subjected to psychoacoustic analysis processing, bits are allocated to each subband according to the result of the psychoacoustic analysis processing, and the signal of each subband is quantized according to the bit allocation and output, wherein the psychoacoustic analysis processing frequency-analyzes the input digital audio signal to convert it into frequency components, detects the maximum value of the frequency components in each subband, weights the detected maximum value according to the frequency band of that subband, and calculates the variance of the frequency components in each subband using the weighted maximum values.

2. The digital audio signal compression method according to claim 1, wherein the psychoacoustic analysis processing, after calculating the variance, applies data corresponding to a hearing sensitivity obtained in advance to the value obtained by that calculation.

3. A digital audio signal compression apparatus comprising: psychoacoustic analysis means for frequency-dividing an input digital audio signal into a plurality of subbands and performing psychoacoustic analysis processing; bit allocation means for allocating bits to each subband according to the result of the psychoacoustic analysis processing; and quantizing means for quantizing the signal of each subband according to the bit allocation and outputting it, wherein the psychoacoustic analysis means comprises first means for frequency-analyzing the input digital audio signal and converting it into frequency components, second means for detecting the maximum value of the frequency components in each subband and weighting the detected maximum value according to the frequency band of that subband, and third means for calculating the variance of the frequency components in each subband using the weighted maximum values.

4. The digital audio signal compression apparatus according to claim 3, wherein the psychoacoustic analysis means further comprises fourth means for applying data corresponding to a hearing sensitivity obtained in advance to the variance value obtained by the calculation of the third means.