JPH0746137A

JPH0746137A - Highly efficient sound encoder

Info

Publication number: JPH0746137A
Application number: JP5205721A
Authority: JP
Inventors: Norihiko Fuchigami; 徳彦渕上; Shoji Ueno; 昭治植野
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1993-07-28
Filing date: 1993-07-28
Publication date: 1995-02-14
Anticipated expiration: 2014-12-20
Also published as: JP2993324B2

Abstract

PURPOSE: To provide a high-efficiency audio encoder that better satisfies psychoacoustic requirements and improves sound quality. CONSTITUTION: An audio signal is cut into blocks of samples by a windowing/segmenting part 1, and each block is divided into a plurality of subbands by an orthogonal transformation part 2. A psychoacoustic analysis part 3 analyzes the audio signal with an analysis bandwidth that, in each frequency region, is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, ideally 1/2 to 1/3 of the critical bandwidth, and from this determines the number of quantization bits for each subband. A quantizing and encoding part 4 quantizes and encodes the audio signal of each subband produced by the orthogonal transformation part 2 with this number of quantization bits.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a high-efficiency audio coding apparatus that divides an audio signal into a plurality of frequency bands (subbands) and quantizes and codes the divided signals subband by subband, and more particularly to a high-efficiency audio coding apparatus that determines the number of quantization bits for each subband on the basis of psychoacoustic analysis.

[0002]

[Description of the Related Art] High-efficiency audio coding as used in the MiniDisc (MD), the Digital Compact Cassette (DCC), karaoke CDs, and similar formats is also called music compression because it compresses the data volume of the audio signal. In such coding schemes, the audio signal is divided into a plurality of subbands by a digital filter or by an orthogonal transform, and the number of quantization bits for each subband is determined from a psychoacoustic analysis performed in the frequency domain. In the following description, the term "encode" is sometimes used to mean compression as well as coding.

[0003] FIGS. 10(a) to 10(d) show an example of such a scheme in which the frequency band is divided by an orthogonal transform. FIG. 10(a) shows 512 samples cut out of the 16-bit PCM audio signal to be encoded; here the total amount of information, enclosed by the rectangle in the figure, is taken to be 16 bits × 512 = 8192 bits. Of course, neither the number of samples cut out nor the PCM word length is limited to these values.

[0004] FIG. 10(b) shows the signal of FIG. 10(a) after frequency conversion by an orthogonal transform such as the DCT (discrete cosine transform) or the FFT (fast Fourier transform); the curve in the figure is the envelope of the frequency spectrum. Assuming that the orthogonal transform preserves the amount of information, this total amount of information can also be represented by the rectangular area in the figure. Meanwhile, a psychoacoustic model makes it possible, when the signal of FIG. 10(b) is present, to define as a curve the level below which other signals are masked by it and become inaudible. This is generally called the masking effect (described in detail later).

[0005] Drawing the masking curve derived from FIG. 10(b) gives FIG. 10(c). Now consider requantizing the signal of FIG. 10(b). If the quantization noise produced by requantization stays at or below the level defined by the masking curve, that noise is inaudible to the human ear. Accordingly, as shown in FIG. 10(d), the spectrum is divided into subbands of several spectral values each. The maximum signal level in each subband is denoted S, and the allowable noise level read from FIG. 10(c) is denoted N. If each subband is requantized with a number of bits that satisfies this S/N, the resulting quantization noise is masked and cannot be heard.
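The rule "requantize with a number of bits that satisfies this S/N" can be sketched as follows. The sketch assumes a uniform quantizer whose S/N improves by about 6.02 dB (20·log10 2) per bit, with S and N expressed in dB. The function name, the round-up policy, and the 16-bit cap are illustrative assumptions, not details given in the patent.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of S/N per quantization bit

def bits_for_subband(max_level_db, allowed_noise_db, max_bits=16):
    """Smallest bit count whose quantization S/N covers the required S/N.

    max_level_db     -- S: peak signal level in the subband (dB)
    allowed_noise_db -- N: allowable noise level from the masking curve (dB)
    max_bits         -- hypothetical cap on the allocation
    """
    required_snr = max_level_db - allowed_noise_db
    if required_snr <= 0:
        return 0  # the whole subband already lies below the masking level
    return min(max_bits, math.ceil(required_snr / DB_PER_BIT))

print(bits_for_subband(80.0, 50.0))  # 30 dB of S/N needs 5 bits
```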

[0006] The rectangles in FIG. 10(d) show the amount of information required for compression and decompression. The irregular rectangle in the middle of the figure represents the main information, and the long thin rectangle at the bottom represents the auxiliary information. The auxiliary information is what the decoder needs, such as the maximum value (scale value) and the number of quantization bits of each subband. The total amount of information in FIG. 10(d) is therefore the sum of the main and auxiliary information, and it is only a fraction of the total amount of information in FIG. 10(a) or FIG. 10(b). By repeating this processing for each predetermined interval (512 samples in this example), the signal can be encoded with almost no degradation of sound quality.
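The following sketch makes the main-plus-auxiliary bookkeeping concrete for one 512-sample block. The field sizes (a 6-bit scale value and a 4-bit allocation code per subband) and the example allocation are hypothetical numbers chosen only for illustration. For simplicity, the sketch also sends side information for subbands that receive zero bits.

```python
def block_bits(bits_per_coeff, coeffs_per_subband, scale_bits=6, alloc_bits=4):
    """Return (main, side) bit counts for one block.

    bits_per_coeff     -- quantization bits chosen for each subband
    coeffs_per_subband -- number of spectral values in each subband
    scale_bits, alloc_bits -- hypothetical side-information field sizes per subband
    """
    main = sum(b * c for b, c in zip(bits_per_coeff, coeffs_per_subband))
    side = len(bits_per_coeff) * (scale_bits + alloc_bits)
    return main, side

# Hypothetical layout: 32 subbands of 16 spectral values (512 values in total).
alloc = [6] * 8 + [4] * 8 + [2] * 8 + [0] * 8
main, side = block_bits(alloc, [16] * 32)
pcm = 16 * 512  # 8192 bits, as in FIG. 10(a)
print(main, side, pcm / (main + side))  # compression ratio for this block
```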

[0007] Next, a method of calculating a masking curve based on psychoacoustics is described. The masking effect is the phenomenon whereby, when a certain frequency component is present, sounds near it in frequency that lie below a certain level can no longer be detected. FIG. 11 shows masking curves for various frequency components. As FIG. 11 shows, the slopes of the curves are steeper toward low frequencies and shallower toward high frequencies. It is also known that when the horizontal axis (frequency) of FIG. 11 is converted to a scale proportional to the auditory critical bandwidth, these curves take on nearly the same shape and slope, as shown in FIG. 12. As shown in FIG. 13, the range from DC to 20 kHz can be divided into 25 critical bands, and auditory characteristics such as masking often behave in proportion to this critical bandwidth.
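The division of DC to 20 kHz into about 25 critical bands (FIG. 13) can be approximated with the widely used analytic formulas of Zwicker and Terhardt, which come from the psychoacoustics literature rather than from this patent: z(f) = 13·arctan(0.00076 f) + 3.5·arctan((f/7500)²) Bark, and CB(f) = 25 + 75·(1 + 1.4 (f/1000)²)^0.69 Hz. The sketch maps a frequency to its critical-band rate and band index. The index function is an illustrative helper.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth around f_hz in Hz (Zwicker and Terhardt approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def critical_band_index(f_hz):
    """Integer index of the critical band containing f_hz (0 = lowest band)."""
    return int(hz_to_bark(f_hz))

for f in (100, 400, 1000, 4000, 20000):
    print(f, round(hz_to_bark(f), 2), round(critical_bandwidth_hz(f), 1))
```

With these formulas, 20 kHz maps to roughly 24.6 Bark, which is consistent with the 25-band division of FIG. 13.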

[0008] When a general signal such as that of FIG. 10(b) is present, its masking curve can be expressed as the sum (superposition) of the masking curves of the individual frequency components, as in FIG. 11 or FIG. 12, giving a curve like that of FIG. 10(c). In practice, however, computing the masking curve as a smooth curve like FIG. 10(c) requires an enormous amount of computation and is therefore difficult. As an approximation, the spectrum is replaced by the power in each analysis band, and the masking curve is evaluated as a polyline with one value per analysis band.

[0009] Next, a conventional method of deriving the noise level N from the masking-curve calculation is described. In this method, the minimum value of the masking curve within each subband of FIG. 10(d) is taken as the noise level N allowed in that subband.
(1) First, the band total power P[i] is calculated for each of the m analysis bands i from the N spectral lines obtained by the orthogonal transform.

[0010]

[Equation 1]
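Equation 1 did not survive extraction. A common form of a band total power is the sum of squared spectral values over the bins of each analysis band, P[i] = Σ X[k]² for k_i ≤ k < k_{i+1}. This form is assumed here rather than taken from the patent. In the sketch, the analysis-band boundaries are given as a list of bin indices, which is a hypothetical representation.

```python
def band_total_power(spectrum, edges):
    """Band total power P[i] for each analysis band i (assumed sum-of-squares form).

    spectrum -- spectral values X[k] from the orthogonal transform
    edges    -- bin boundaries [k_0, k_1, ..., k_m]; band i covers k_i <= k < k_{i+1}
    """
    return [sum(x * x for x in spectrum[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]

X = [3.0, 4.0, 1.0, 0.0, 2.0, 2.0]
print(band_total_power(X, [0, 2, 3, 6]))  # [25.0, 1.0, 8.0]
```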

[0011] (2) Next, as in the following equation (Equation 2), the masking level M[i] in each analysis band i is calculated by convolving the band total power P[i] with the masking reference curve B for the analysis band. When a differently shaped curve is needed for each analysis band i, the masking reference curve can be written as B[i][k] (k an integer), as shown in FIG. 14(b). When its shape is the same regardless of the analysis band i, it can be written as B[k] (k an integer), as shown in FIG. 14(a).

[0012]

[Equation 2]
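Equation 2 is likewise missing. The sketch below makes the following assumptions:

- The convolution is carried out on linear power values.
- The masking reference curve B[k] of FIG. 14(a) is given in dB relative to the masker, with k = maskee band minus masker band.
- The convolution therefore takes the form M[i] = Σ_j P[j]·10^(B[i−j]/10).

The dict representation of B and the example curve are hypothetical. The example uses a −10 dB offset and falls 25 dB per band toward lower bands and 15 dB per band toward higher bands. For the band-dependent case of FIG. 14(b), the curve would be looked up per band instead.

```python
import math

def masking_levels(band_power, ref_curve_db):
    """M[i] = sum_j P[j] * 10**(B[i - j] / 10), computed in linear power.

    band_power   -- P[j] per analysis band (linear power)
    ref_curve_db -- dict {k: B[k] in dB}, k = maskee band minus masker band;
                    offsets missing from the dict contribute no masking
    """
    levels = []
    for i in range(len(band_power)):
        total = 0.0
        for j, p in enumerate(band_power):
            b_db = ref_curve_db.get(i - j)
            if b_db is not None:
                total += p * 10.0 ** (b_db / 10.0)
        levels.append(total)
    return levels

def to_db(p, floor=1e-12):
    """Power to dB, with a floor to avoid log of zero."""
    return 10.0 * math.log10(max(p, floor))

# Hypothetical reference curve, truncated at |k| <= 3.
B = {k: -10.0 - (25.0 * -k if k < 0 else 15.0 * k) for k in range(-3, 4)}
P = [0.0, 0.0, 1e6, 0.0, 0.0, 0.0]  # a single 60 dB masker in band 2
print([round(to_db(x), 1) for x in masking_levels(P, B)])
```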

[0013] (3) When the analysis bands i differ from the subbands s, the minimum masking level M[i] within the interval of subband s is taken as the allowable noise level N[s] of that subband s (where n is the number of subbands).

[0014]

[Equation 3]

$$N[s] = \min_{i \in s} M[i], \qquad s = 0, \dots, n-1$$

where i ranges over the analysis bands contained in subband s.

[0015] When the analysis bands i and the subbands s are identical:

[0016]

[Equation 3]

$$N[s] = M[i] \quad (i = s), \qquad s = 0, \dots, n-1$$
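The minimum rule of these two equations can be sketched as follows. Each subband is described by the half-open range of analysis-band indices it contains, which is a hypothetical representation. When the analysis bands and subbands coincide, every range has length one and N[s] = M[s].

```python
def allowable_noise(masking_db, subband_ranges):
    """N[s] = minimum of M[i] over the analysis bands i inside subband s.

    masking_db     -- M[i] per analysis band (dB)
    subband_ranges -- list of (first, last_exclusive) analysis-band indices per subband
    """
    return [min(masking_db[a:b]) for a, b in subband_ranges]

M = [0.0, 25.0, 50.0, 35.0, 20.0, 5.0]
print(allowable_noise(M, [(0, 3), (3, 6)]))                 # [0.0, 5.0]
print(allowable_noise(M, [(i, i + 1) for i in range(6)]) == M)  # True: identical bands
```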

[0017]

[Problems to be Solved by the Invention] Processes (1) to (3) above yield the allowable noise level N[s] of each subband s. The relationship between the analysis bands i and the subbands s usually falls into one of the following three cases:
(a) the analysis bands i equal the subbands s, and their bandwidth equals (is proportional to) the critical bandwidth;
(b) the analysis bands i equal the subbands s, but their bandwidth is not proportional to the critical bandwidth;
(c) the analysis bands i differ from the subbands s, and their bandwidth equals (is proportional to) the critical bandwidth.

[0018] In case (a), FIG. 15 illustrates the difference between the allowable noise level N[s] and the true allowable noise level Na[s]. Na[s] is obtained from the masking curve computed with a sufficiently small bandwidth, that is, from the true masking curve. The true allowable noise level Na[s] of each band must be the minimum of the true masking curve within that band. In FIG. 15, the curve is the true masking curve, the solid polyline is the allowable noise level N[s], and the dotted polyline is the true allowable noise level Na[s].

[0019] FIG. 15(a) shows a case in which the signal spectrum is flat (noise-like). FIG. 15(b) shows a case in which the spectrum is peaky (tone-like) and the signal's center of power lies near the center of the band. FIG. 15(c) shows a case in which the spectrum is peaky and the center of power lies near a band boundary. In FIG. 15(a) the difference between N[s] and Na[s] is small, but in FIGS. 15(b) and 15(c) the error is large. The reason is that, as shown in FIG. 14, the masking reference curve generally has a slope of 10 to 20 dB per critical band. Encoding with such an allowable noise level N[s] can therefore degrade sound quality.
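A rough worked bound shows why the error in FIGS. 15(b) and 15(c) can be large. It rests on a simplified model assumed only for illustration, which ignores other maskers:

- The true masking curve falls linearly at S dB per critical band on either side of a single tonal masker at critical-band position z_0.
- Subband s spans one critical band [z_a, z_a + 1) that contains the masker.

Band-level analysis assigns one masking value to the whole band, set by the masker's own level. The true minimum within the band, however, lies at the band edge farther from the masker:

$$
N[s] - N_a[s] \approx S \cdot \max\left(z_0 - z_a,\; z_a + 1 - z_0\right) \in \left[\tfrac{S}{2},\, S\right]
$$

With S = 10 to 20 dB per critical band, this gives roughly 5 to 10 dB for a masker at the band center (FIG. 15(b)) and up to 10 to 20 dB for a masker at a band boundary (FIG. 15(c)).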

[0020] Case (b) corresponds to a system in which, for implementation reasons, the subband width cannot be made proportional to the critical bandwidth and the analysis bands are made identical to the subbands. In this case the bandwidth is usually wider than the critical band in some frequency ranges and narrower in others. Moreover, where the bandwidth is equal to or wider than the critical band, the problem described for (a) becomes even worse.

[0021] The problem described for (a) also applies as-is to case (c). Furthermore, when a subband s is narrower than the analysis band i, the masking level M[i] = N[s] obtained from analysis band i has poorer frequency resolution than the subband s, as shown in FIG. 16. It then differs greatly from the true allowable noise level Na[s]. If the analysis bandwidth were instead set at or below the subband width, the resulting allowable noise level N[s] would be closer to the true level Na[s]. Case (c) thus fails to exploit this potential for mitigating problem (a).

[0022] Next, another problem is described. In the series of processes (1) to (3) for obtaining the allowable noise level N[s], another important role is played by the shape and slope of the masking reference curve described with FIG. 14. The horizontal axis of FIG. 14 is the relative index k of the analysis band. As FIG. 12 shows, however, when the critical band is used as the horizontal axis, the masking-amount versus critical-band characteristic is considered constant regardless of which critical band the masker is centered in. Therefore, plotting the power P against the relative critical band, as in FIG. 17, represents the masking curves of all frequency bands uniquely. In this sense, the masking reference curve in FIG. 14 can be written as B[k], independent of the analysis band i, because the analysis bands i are proportional to the critical bands. It must be written as B[i][k] only when the analysis bands i are not proportional to the critical bands.
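When the analysis bands are not proportional to the critical bands, a reference curve defined once on the relative critical-band axis (FIG. 17) can be resampled into a per-band table B[i][k]. The sketch below makes several assumptions that are not from the patent:

- Critical-band positions come from the Zwicker and Terhardt Bark formula.
- Each analysis band is represented by its center frequency.
- The row index i is taken to be the masker band, which the patent does not specify.
- The two-slope curve in dB per Bark is hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spread_db(dz, offset=-10.0, lower_slope=25.0, upper_slope=15.0):
    """Hypothetical reference curve on the relative critical-band axis.

    dz -- maskee position minus masker position, in Bark
    """
    return offset - (lower_slope * -dz if dz < 0 else upper_slope * dz)

def reference_table(centers_hz, reach=3):
    """B[i][k]: masking (dB) that a masker in analysis band i produces in band i + k."""
    z = [hz_to_bark(f) for f in centers_hz]
    table = []
    for i in range(len(z)):
        row = {}
        for k in range(-reach, reach + 1):
            if 0 <= i + k < len(z):
                row[k] = spread_db(z[i + k] - z[i])
        table.append(row)
    return table

# Uniform 500 Hz analysis bands span several Bark at low frequencies but less than
# one Bark at a few kHz, so the same k means a different Bark distance in each row.
centers = [250.0 + 500.0 * n for n in range(8)]
B = reference_table(centers)
print(round(B[0][1], 1), round(B[6][1], 1))
```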

[0023] However, when the masking-amount versus critical-band characteristic is assumed to be constant, degradation can be detected in the actual encoded and decoded sound, particularly at low frequencies. A likely cause is that the shape of the masking-amount versus critical-band curve at low frequencies is inappropriate. FIGS. 18(a) and 18(b) show the true allowable noise level Na[s] of each subband when the masking curve is wide (gentle slope) and when it is narrow (steep slope), respectively. These figures assume a peaky (tone-like) signal, and the results differ greatly from the allowable noise level N[s] obtained conventionally. The conventional method can therefore degrade sound quality because it encodes with an inappropriate masking curve.

[0024] In view of these conventional problems, an object of the present invention is to provide a high-efficiency audio coding apparatus that better satisfies psychoacoustic requirements and thereby improves sound quality.

[0025]

[Means for Solving the Problems] To achieve this object, the present invention analyzes the audio signal with a frequency-dependent analysis bandwidth and determines the number of quantization bits of each subband from this analysis. In each frequency region, the analysis bandwidth is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, and at least at low frequencies it is set considerably narrower than both. That is, the present invention provides a high-efficiency audio coding apparatus comprising:
- dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
- quantizing/encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and
- psychoacoustic analysis means for analyzing the audio signal with an analysis bandwidth that, in each frequency region, is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, and for determining the number of quantization bits of each subband for the quantizing/encoding means.
Dividing means for dividing an audio signal into a plurality of frequency band subbands; and quantizing / encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits. And an audio signal is analyzed with an analysis bandwidth that is less than or equal to the subband width in each frequency domain and less than or equal to the critical psychoacoustic bandwidth, and the number of quantization bits of each subband of the quantization and encoding means is A high-efficiency speech coding apparatus having a psychoacoustic analysis means for determining is provided.

[0026] The present invention also determines the number of quantization bits of each subband as follows. The masking-amount versus critical-band characteristics represent the frequency-domain noise masking level used in psychoacoustic analysis. One of a plurality of such characteristics is selected according to the critical band to which it is applied, and the audio signal of each subband is analyzed with it. That is, the present invention provides a high-efficiency audio coding apparatus comprising:
- dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
- quantizing/encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and
- psychoacoustic analysis means for analyzing the audio signal of each subband using masking-amount versus critical-band characteristics and for determining the number of quantization bits of each subband for the quantizing/encoding means. These characteristics represent the frequency-domain noise masking level in psychoacoustic analysis and are chosen so that the slope of the relative-critical-band curve differs in at least some of the individual critical bands.

[0027]

[Operation] In the present invention, the audio signal is analyzed with an analysis bandwidth that, in each frequency region, is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, and the number of quantization bits of each subband is determined from that analysis. This optimizes the relationship between two units. The subband is the unit of quantization and coding, and the analysis band is the unit of frequency-domain psychoacoustic analysis (masking-curve calculation). Optimizing their relationship better satisfies psychoacoustic requirements and improves sound quality.

[0028] In the present invention, moreover, one of a plurality of masking-amount versus critical-band characteristics is selected according to the critical band to which it is applied. These characteristics represent the frequency-domain noise masking level used in psychoacoustic analysis. The audio signal of each subband is analyzed with the selected characteristic, and the number of quantization bits of each subband is determined. Because the optimum masking curve is selected for each critical band, psychoacoustic requirements are better satisfied and sound quality is improved.

[0029]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.
- FIG. 1 is a block diagram of one embodiment of the high-efficiency audio coding apparatus according to the invention.
- FIG. 2 is a flowchart explaining the processing of the apparatus of FIG. 1.
- FIG. 3 is a block diagram of a modification of the apparatus of FIG. 1.
- FIG. 4 illustrates the relationship among the subband width, the critical bandwidth, and the analysis bandwidth.
- FIG. 5 illustrates the allowable noise level N[s] when the analysis bandwidth is 1/3 of the critical bandwidth.
- FIG. 6 illustrates the allowable noise level N[s] when the analysis bandwidth equals the critical bandwidth.

[0030] In the high-efficiency audio coding apparatus of FIG. 1, a 16-bit PCM audio signal, for example, is cut into blocks of 512 samples by a windowing/segmenting unit 1. Each block is orthogonally transformed by the orthogonal transform unit 2, using the DCT, FFT, or the like, and divided into a plurality of subbands s (step S1 in FIG. 2). The psychoacoustic analysis unit 3 then determines the maximum value (scale value) of each subband s (step S2). It also analyzes each subband s psychoacoustically with an analysis bandwidth that, in each frequency region, is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, ideally 1/2 to 1/3 of the critical bandwidth. From this analysis it first determines the allowable noise level N[s] of each subband s (step S3), then the S/N ratio required for each subband s (step S4), and finally the number of quantization bits from that S/N ratio (step S5).

[0031] The quantizing/encoding unit 4 quantizes and encodes the audio signal of each subband s produced by the orthogonal transform unit 2 with this number of quantization bits. The compressed data from the quantizing/encoding unit 4 and the numbers of quantization bits determined by the psychoacoustic analysis unit 3 are multiplexed by a multiplexer 5 and output to an MD, DCC, or the like (step S6). On decompression, the compressed data are decoded and dequantized according to the number of quantization bits for each subband s.
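Steps S2, S5, and S6 and the matching decoder operation can be sketched for a single subband. The sketch assumes a symmetric mid-tread uniform quantizer normalized by the subband's scale value. The patent does not specify the quantizer, so the function names and the rounding rule are illustrative.

```python
def _q_max(bits):
    """Largest code magnitude of a symmetric mid-tread quantizer (0 if bits < 2)."""
    return 2 ** (bits - 1) - 1 if bits >= 2 else 0

def quantize(coeffs, scale, bits):
    """Map coefficients in [-scale, scale] to integers in [-q_max, q_max]."""
    q_max = _q_max(bits)
    if q_max == 0 or scale == 0:
        return [0] * len(coeffs)  # subband not coded
    return [max(-q_max, min(q_max, round(c / scale * q_max))) for c in coeffs]

def dequantize(codes, scale, bits):
    """Decoder side: rebuild coefficients from integer codes and the scale value."""
    q_max = _q_max(bits)
    if q_max == 0:
        return [0.0] * len(codes)
    return [q * scale / q_max for q in codes]

sub = [0.9, -0.35, 0.1, -1.2]
scale = max(abs(c) for c in sub)          # step S2: scale value of the subband
codes = quantize(sub, scale, bits=4)      # step S6, with 4 bits chosen in step S5
recon = dequantize(codes, scale, bits=4)
print(codes, recon)
```

Because the codes are rounded to the nearest level, each reconstruction error is at most half a quantizer step, scale / (2·q_max).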

[0032] In the high-efficiency audio coding apparatus of FIG. 3, the input audio signal is divided into subbands s by a digital filter 6. Subband division by a filter bank cannot provide the low-frequency band resolution that the present invention requires. The number of quantization bits of the quantizing/encoding unit 4 is therefore determined as in FIG. 1. Each block cut out by the segmenting unit 1 is divided into a plurality of subbands s by the orthogonal transform unit 2, and the psychoacoustic analysis unit 3 determines the number of quantization bits.

[0033] In practice, the analysis bands used in this method of determining the number of quantization bits are chosen as a trade-off between system constraints and the amount of computation that can be allotted to calculating the masking curve. FIG. 4 shows the relationship among the subband width produced by the orthogonal transform unit 2, the auditory critical bandwidth, and the analysis bandwidth of the psychoacoustic analysis unit 3. In this example, the critical bandwidth is narrower than the subband width at low frequencies and wider than the subband width at high frequencies. Even so, the analysis bandwidth is kept no wider than the subband width and no wider than the critical bandwidth. The actual masking curve is calculated by methods (1) to (3) described above, and the resulting allowable noise level N[s] of each subband s is set so as to solve problems (a) to (c) described above.
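One way to build analysis bands that satisfy the constraint of FIG. 4 is sketched below. Each subband is split into equal pieces no wider than the subband and no wider than a chosen fraction of the critical bandwidth. The splitting rule, the default fraction of 1/3 (the low end of the 1/2 to 1/3 range in [0030]), and the use of the Zwicker and Terhardt critical-bandwidth formula are assumptions made for illustration. The critical bandwidth is evaluated only at each subband's center frequency.

```python
import math

def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz (Zwicker and Terhardt approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def analysis_band_edges(subband_edges_hz, ratio=1.0 / 3.0):
    """Split every subband into equal analysis bands no wider than the subband
    and no wider than ratio * (critical bandwidth at the subband center)."""
    edges = [subband_edges_hz[0]]
    for lo, hi in zip(subband_edges_hz[:-1], subband_edges_hz[1:]):
        width = hi - lo
        target = min(width, ratio * critical_bandwidth_hz(0.5 * (lo + hi)))
        pieces = math.ceil(width / target)
        edges.extend(lo + width * p / pieces for p in range(1, pieces + 1))
    return edges

# Hypothetical uniform 1 kHz subbands from 0 to 8 kHz: the low subbands are split
# finely, while the high ones need only a few analysis bands each.
sub_edges = [1000.0 * n for n in range(9)]
a_edges = analysis_band_edges(sub_edges)
per_sub = [sum(1 for e in a_edges[1:] if lo < e <= hi)
           for lo, hi in zip(sub_edges, sub_edges[1:])]
print(per_sub)
```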

[0034] FIGS. 5(a) and 5(b) show the case in which the analysis bandwidth is set to 1/3 of the critical bandwidth. They correspond respectively to FIGS. 15(b) and 15(c) of the conventional example. That is, FIG. 5(a) shows a peaky (tone-like) signal spectrum whose center of power is near the center of the band, and FIG. 5(b) shows a peaky spectrum whose center of power is near a band boundary. In the figures, the solid polyline is the masking level M[i], the dash-dot line is the allowable noise level N[s] of each subband s, and the dotted line is the true allowable noise level Na[s] of each subband. A comparison of FIG. 5 with FIG. 15 shows that in this embodiment the difference between N[s] and Na[s] is small, so the problem of (b) above can be solved.
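The effect of narrowing the analysis band can be reproduced in a toy model. This is an illustration under stated assumptions, not the patent's data:

- A single tone lies in a subband one critical band wide.
- The true masking curve falls linearly at 15 dB per critical band, within the 10 to 20 dB range cited for FIG. 14.
- The band-level masking M[i] samples that curve at whole analysis-band offsets from the tone's band.
- N[s] follows Equation 3.
- Na[s] is the finely sampled minimum of the true curve.

All parameter names and defaults are hypothetical.

```python
def true_threshold_db(z, z0, level_db, slope_db, offset_db=-10.0):
    """Toy true masking curve: linear in dB on the critical-band axis z (Bark)."""
    return level_db + offset_db - slope_db * abs(z - z0)

def subband_error_db(z0, width, level_db=60.0, slope_db=15.0, offset_db=-10.0, fine=1000):
    """N[0] - Na[0] for the subband [0, 1) Bark holding one tone at z0 in [0, 1).

    width -- analysis bandwidth in critical bands (1 / width must be an integer)
    """
    n_analysis = round(1.0 / width)
    i0 = min(int(z0 / width), n_analysis - 1)  # analysis band holding the tone
    # Band-level masking M[i]: the reference curve sampled at whole-band offsets.
    m = [level_db + offset_db - slope_db * abs(i - i0) * width for i in range(n_analysis)]
    n_s = min(m)  # Equation 3: minimum over the analysis bands in the subband
    na_s = min(true_threshold_db(k / fine, z0, level_db, slope_db, offset_db)
               for k in range(fine))  # true minimum, finely sampled
    return n_s - na_s

for z0 in (0.5, 0.95):
    print(z0, round(subband_error_db(z0, 1.0), 2), round(subband_error_db(z0, 1.0 / 3.0), 2))
```

In this model, the overestimate of N[s] drops from 7.5 dB to 2.5 dB for a tone at the subband center. For a tone near the upper band edge, it drops from 14.25 dB to 4.25 dB. These figures depend entirely on the assumed slope and should not be compared directly with the improvement reported in [0036].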

[0035] FIG. 6 shows the case in which the analysis bandwidth is made equal to the subband width. It corresponds to FIG. 16, in which the analysis bandwidth is wider than the subband width. A comparison of FIG. 6 with FIG. 16 shows that in FIG. 6 the allowable noise level N[s] of each subband s is closer to the true allowable noise level Na[s]. The problem of (c) above, namely poor frequency resolution, can therefore be solved.

[0036] Thus, according to the embodiment described above, the relationship between the subband and the analysis band is optimized. The subband is the unit of quantization and coding, and the analysis band is the unit of frequency-domain psychoacoustic analysis (masking-curve calculation). This improves the accuracy with which the allowable noise level N[s] is estimated and improves sound quality when encoding exploits psychoacoustic characteristics. Specifically, the improvement in accuracy is expected to be 1 to 3 dB on average and 6 dB or more in some cases. Since the S/N ratio of a quantizer generally changes in steps of about 6 dB, this improvement is substantial.
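The 6 dB step refers to the standard rule for a uniform quantizer. Each additional bit halves the step size and thus the quantization-noise amplitude, so

$$
\Delta\mathrm{SNR} = 20\log_{10} 2 \approx 6.02\ \text{dB per bit}.
$$

An accuracy gain of 1 to 3 dB in N[s] therefore corresponds to roughly 1/6 to 1/2 of a bit, and a gain of 6 dB to about one full bit.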

[0037] A second embodiment of the present invention is described with reference to FIGS. 7 to 9.
- FIG. 7 compares the critical bandwidth with the Equivalent Rectangular noise Bandwidth (hereinafter ERB).
- FIG. 8 shows masking-amount versus critical-band characteristics.
- FIG. 9 shows the S/N characteristic for each masking curve.

In the second embodiment, the masking-amount versus critical-band characteristic of the masking curve used in the psychoacoustic analysis is not held constant but is changed appropriately for each critical band.

[0038] According to recent psychoacoustic research, the ERB is considered more appropriate than the conventionally used critical band as the auditory analysis band, including on anatomical grounds. FIG. 7 compares the critical bandwidth and the ERB, with frequency (logarithmic scale) on the horizontal axis and bandwidth on the vertical axis. The difference between the two is large, particularly in the low range below 400 Hz, and below 100 Hz the critical bandwidth is about 3 to 4 times wider than the ERB. In other words, if the masking amount-critical band characteristic and the masking amount-ERB characteristic have nearly the same shape and both are held constant regardless of frequency region, the slope of the masking amount-ERB characteristic is much greater than that of the masking amount-critical band characteristic.
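The patent gives no formulas for the curves in FIG. 7. As an assumption, the sketch below uses two widely cited approximations: the Glasberg & Moore (1990) ERB formula and the Zwicker & Terhardt (1980) critical-bandwidth formula. It compares the two bandwidths. The hypothetical helper `slope_per_critical_band` shows, locally, how steep a masking skirt that is constant in dB per ERB becomes when it is expressed in dB per critical band.

```python
def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth, Glasberg & Moore (1990) approximation."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Critical bandwidth, Zwicker & Terhardt (1980) approximation."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def slope_per_critical_band(slope_db_per_erb: float, f_hz: float) -> float:
    """Local slope in dB per critical band of a skirt that is constant in dB per ERB."""
    # Near f_hz, one critical band spans roughly critical_bandwidth/erb ERBs.
    return slope_db_per_erb * critical_bandwidth_hz(f_hz) / erb_hz(f_hz)


for f in (50, 100, 400, 1000):
    ratio = critical_bandwidth_hz(f) / erb_hz(f)
    print(f"{f:5d} Hz  CB/ERB = {ratio:.2f}")
```

With these approximations the critical band is about 3.3 times the ERB at 50 Hz and only about 1.2 times at 1 kHz. A skirt that is uniform on the ERB scale is therefore markedly steeper, per critical band, at low frequencies. This is consistent with the low-range behavior described above.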

[0039] Therefore, when the critical band is taken as the horizontal axis, reproduced sound quality is expected to improve if the masking curve is made variable rather than constant with respect to the critical band, or if it is made constant with respect to a scale other than the critical band. In practice, for example, three (in general, several) masking amount-critical band characteristics A to C with different slopes were prepared as shown in FIG. 8, and switching among them according to the frequency band improved sound quality.

[0040] FIG. 9 compares the S/N ratio obtained when the masking curve is switched to the steeper characteristic of FIG. 8 below 400 Hz with the S/N ratio obtained when the masking curve is kept constant. The S/N ratio rises substantially below 400 Hz. As explained with reference to FIGS. 18(a) and 18(b), when the masking curve becomes narrower, the allowable noise level N[s] in that region decreases, so for the same signal level S the S/N ratio improves. Because less information is then allocated to the other regions, the S/N ratio there deteriorates slightly, but audibly the low-range improvement outweighs this.
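The mechanism behind FIG. 9 can be shown numerically: a narrower masking curve lowers N[s], which raises the S/N the encoder must reach for a fixed signal level S. This is a minimal sketch with hypothetical parameters. It assumes triangular skirts whose contributions from several maskers are added in the power domain, and a 10 dB offset. The function `allowable_noise_db` is illustrative, not the patent's procedure.

```python
import math


def allowable_noise_db(maskers, band, slope_db_per_band, offset_db=10.0):
    """Power sum of triangular masking skirts evaluated at `band` (hypothetical model).

    maskers: {critical-band index: masker level in dB}
    """
    total = sum(
        10 ** ((level - offset_db - slope_db_per_band * abs(band - pos)) / 10)
        for pos, level in maskers.items()
    )
    return 10 * math.log10(total)


maskers = {0: 70.0}   # one strong low-frequency masker
signal_db = 40.0      # signal level S two critical bands away
for slope in (25.0, 40.0):  # constant curve vs steeper low-range curve
    n = allowable_noise_db(maskers, band=2, slope_db_per_band=slope)
    print(f"slope {slope:.0f} dB/band: N = {n:.1f} dB, S/N = {signal_db - n:.1f} dB")
```

Here the steeper skirt lowers N from 10 dB to -20 dB, so the target S/N rises from 30 dB to 60 dB. The extra bits this demands must come from other subbands, which is the trade-off noted above.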

[0041]

[Effects of the Invention] As described above, according to the present invention, the audio signal is analyzed with a frequency-dependent analysis bandwidth to determine the number of quantization bits for each subband. In each frequency region, this analysis bandwidth is no wider than the subband width and no wider than the psychoacoustic critical bandwidth. At least in the low range, it is set sufficiently narrower than both, for example subdivided to 1/2 to 1/3 of the critical bandwidth. The relationship between the subband, which is the unit of quantization and coding, and the analysis band, which is the unit of psychoacoustic analysis (masking-curve calculation) in the frequency domain, can therefore be optimized. The encoding thus better matches auditory perception, and sound quality improves.

[0042] Further, according to the present invention, the audio signal of each subband is analyzed using a masking amount-critical band characteristic to determine the number of quantization bits for each subband. This characteristic represents the noise masking level in the frequency domain during psychoacoustic analysis. In it, the slope of the relative critical band curve is selected to differ between critical bands, for example being set steeper at least in the low range of 1 kHz or below. An optimal masking curve can therefore be selected according to the critical band concerned, so the encoding better matches auditory perception and sound quality improves.

[Brief description of drawings]

FIG. 1 is a block diagram showing a first embodiment of the high-efficiency speech coding apparatus according to the present invention.

FIG. 2 is a flowchart for explaining the high-efficiency speech coding process of FIG. 1.

FIG. 3 is a block diagram showing a modification of the high-efficiency speech coding apparatus of FIG. 1.

FIG. 4 is an explanatory diagram showing the relationship among the subband width, the critical bandwidth and the analysis bandwidth.

FIG. 5 is an explanatory diagram showing the allowable noise level N[s] when the analysis bandwidth is 1/3 of the critical bandwidth.

FIG. 6 is an explanatory diagram showing the allowable noise level N[s] when the analysis bandwidth and the critical bandwidth are the same.

FIG. 7 is an explanatory diagram comparing the critical bandwidth with the ERB in the second embodiment.

FIG. 8 is an explanatory diagram showing the masking amount-critical band characteristics of the second embodiment.

FIG. 9 is an explanatory diagram showing the S/N ratio characteristic for each masking curve.

FIG. 10 is an explanatory diagram schematically showing a high-efficiency speech coding method.

FIG. 11 is an explanatory diagram showing examples of masking curves for various frequency spectra.

FIG. 12 is an explanatory diagram showing the masking curves of FIG. 11 with the frequency on the horizontal axis replaced by critical bands.

FIG. 13 is an explanatory diagram showing the critical bandwidths of the 25 bands.

FIG. 14 is an explanatory diagram showing a masking reference curve.

FIG. 15 is an explanatory diagram showing the conventional allowable noise level N[s] and the true allowable noise level Na[s] when the analysis band and the subband are equal.

FIG. 16 is an explanatory diagram showing the conventional allowable noise level N[s] and the true allowable noise level Na[s] when the subband is wider than the analysis band.

FIG. 17 is an explanatory diagram showing a conventional masking amount-critical band characteristic.

FIG. 18 is an explanatory diagram showing the relationship between the slope of the masking curve shown in FIG. 17 and the allowable noise level N[s].

[Explanation of symbols]

1 Windowing and segmentation section
2 Orthogonal transform section (dividing means)
3 Psychoacoustic analysis section (psychoacoustic analysis means)
4 Quantization and encoding section (quantization and encoding means)
5 Multiplexing section
6 Subband filter section

Claims

1. A high-efficiency speech coding apparatus comprising: dividing means for dividing an audio signal into subbands of a plurality of frequency bands; quantization and encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and psychoacoustic analysis means for analyzing the audio signal with a frequency-dependent analysis bandwidth and determining the number of quantization bits of each subband for the quantization and encoding means, wherein the analysis bandwidth, in each frequency region, is no wider than the subband width and no wider than the psychoacoustic critical bandwidth, and, at least in the low frequency range, is set sufficiently narrower than both.

2. A high-efficiency speech coding apparatus comprising: dividing means for dividing an audio signal into subbands of a plurality of frequency bands; quantization and encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and psychoacoustic analysis means for analyzing the audio signal of each subband using a masking amount-critical band characteristic and determining the number of quantization bits of each subband for the quantization and encoding means, wherein the masking amount-critical band characteristic represents the noise masking level in the frequency domain during psychoacoustic analysis, and the slope of its relative critical band curve is selected to differ in at least some of the critical bands.