JPH08167878A

JPH08167878A - Digital audio signal coding device

Info

Publication number: JPH08167878A
Application number: JP6310956A
Authority: JP
Inventors: Oopen Tan; タン・オーペン; Sua Hon Neo; ネオ・スア・ホン; Ken Riyoon Nu; ヌ・ケン・リョーン; Nun Fuatsuto Kee; ケー・ヌン・ファット
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-12-14
Filing date: 1994-12-14
Publication date: 1996-06-25

Abstract

PURPOSE: To provide a digital audio signal coding device that can code digital audio signals without degrading their quality. CONSTITUTION: A dynamic bit allocation means 2.4 analyzes the characteristics of the audio signal in each frame, based on its peak energy distribution and tonality. Based on the result of this analysis, the device decides whether to use a non-perceptual bit allocation means 2.5, which allocates bits based on signal energy alone, or a perceptual bit allocation means 2.6. The means 2.4 thereby maintains high audio quality while also securing the best possible THD+N (total harmonic distortion plus noise) measurement performance at an extremely low compression ratio.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a coding apparatus for digital audio signals, used for example in transmission or with digital storage media.

[0002]

[Prior Art] Compression algorithms for digital audio signals have spread remarkably and are now widely used in consumer products. Such algorithms are used to encode digital audio signals with a 20 kHz bandwidth, for example in products for the Digital Compact Cassette (DCC), the MiniDisc (MD), and the Video Compact Disc (Video CD). The dynamic bit allocation method used in these algorithms helps maintain high audio quality.

[0003] FIG. 4 shows a general block diagram of a conventional digital audio signal coding apparatus. A digital audio signal, usually sampled at 32 kHz, 44.1 kHz, or 48 kHz, is first subjected by the frequency conversion means 3.1 to a mathematical transform such as a modified discrete cosine transform, subband filter analysis, a wavelet transform, or a variant of these, and transform output coefficients are produced. The variants include hybrids of subband filter analysis and the modified discrete cosine transform, and schemes that switch between different sizes of the modified discrete cosine transform. The transform output coefficients are then normalized and quantized by the normalizing means 3.2 and the quantizing means 3.3. The quantization may be either linear or non-linear. The number of bits allocated to quantize each transform coefficient, or each block of transform coefficients, is determined by the perceptual bit allocation means 3.4.

[0004] The perceptual bit allocation means removes redundant and irrelevant content from the digital audio signal in a way that does not degrade audio quality. To do this effectively, psychoacoustic knowledge of how the human ear perceives sound is used, and a mathematical model is built on that knowledge. A bit allocation method that employs such a model is called a perceptual bit allocation method. Examples are psychoacoustic models I and II in the ISO/IEC 11172-3 document.

[0005] In a coding apparatus that uses such a psychoacoustic model, a frame or block of the digital audio signal is first converted into fine-resolution spectral components, usually by a fast Fourier transform (FFT). The spectral components are grouped into subbands whose sizes are closely proportional to the critical bands, and are then analyzed. Here, a critical band is a characteristic portion of the wideband audio spectrum within which particular psychoacoustic regularities, such as frequency selectivity and masking thresholds, hold.

[0006] Tonal and noise masking are estimated from the spectral components. The simultaneous masking produced by tonal maskers and noise maskers is modeled to obtain the masking threshold at the frequency of each and every spectral component. To reduce computational complexity, the masking threshold may instead be evaluated at a reduced number of frequencies per critical band. The signal-to-masking threshold ratio of each subband, each of which contains many spectral components, can then be calculated and used by an iterative bit allocation means.

[0007] Such a coding apparatus helps achieve noise-free audio quality. For compression ratios up to about 8, the decoded signal is of such high quality that it cannot be distinguished from the original signal. The compression ratio is expressed as the ratio of the bit rate of the input signal to the bit rate of the coded bit stream. For example, when processing a digital audio signal sampled at 44.1 kHz, the input bit rate of 705.6 kbit/s per channel is effectively compressed to 192 kbit/s per channel for DCC, 146.08 kbit/s per channel for MD, and 112 kbit/s per channel for Video CD audio, giving compression ratios of 3.675, 4.83, and 6.3 respectively.
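The compression ratios quoted above can be checked with a few lines of arithmetic. The sketch below assumes a 16-bit sample word, which is not stated in the text but reproduces the 705.6 kbit/s per channel input rate; the variable and function names are illustrative only.

```python
# Input bit rate per channel: 44.1 kHz sampling x 16 bits per sample.
# The 16-bit word length is an assumption; the text gives only the 705.6 kbit/s result.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
input_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000

# Coded bit rates per channel, as quoted in the text.
coded_kbps = {"DCC": 192.0, "MD": 146.08, "Video CD": 112.0}


def compression_ratio(input_rate, coded_rate):
    """Ratio of the input bit rate to the coded bit rate."""
    return input_rate / coded_rate


ratios = {name: compression_ratio(input_kbps, rate) for name, rate in coded_kbps.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```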

[0008]

[Problems to be Solved by the Invention] A digital audio coding apparatus with perceptual bit allocation means operating at a compression ratio below 5 delivers not only high, noise-free quality but also relatively good total harmonic distortion plus noise (THD+N) measurements. (The THD+N measurement is the sum of broadband noise, AC mains-related hum, and any other interfering signals below and above the frequency at which the measurement is made.) This is because individual coefficients or blocks of coefficients with large values, which require many bits, have enough bits available to maintain high quality. In other words, securing enough bits for the demanding coefficients or blocks is what makes high quality achievable. This holds even though the bit allocation criterion is based on the signal-to-masking threshold ratio rather than on signal energy. However, as the compression ratio increases, the total number of bits available for quantizing the spectral transform coefficients decreases. Good audio quality can still be achieved, but the THD+N measurement degrades considerably. Although listening tests based on subjective perception should be the main criterion for evaluating the sound quality of a coding apparatus with perceptual bit allocation means, the THD+N measurement is still treated as an important indicator of coding performance. This is particularly true for professional audio and studio applications.

[0009] At high compression ratios, the signal-to-masking threshold ratio criterion used by the perceptual bit allocation means often fails to allocate bits to the units that need them, and as a result good THD+N measurements can no longer be secured. Table 1 shows a series of THD+N measurements for the digital coding of Video CD audio signals with perceptual bit allocation means.

[0010]

[Table 1]

[0011] Better THD+N measurements than these should arguably be expected, because the input test signals used for THD+N measurement belong to the class of non-critical audio signals.

[0012] FIG. 5 shows the frequency coefficients of a test tone. (Compare this with the frequency spectrum of an arbitrary audio signal in FIG. 6.) Non-critical signals of this class have high energy levels in a small number of units and very low energy in the other units. There are also non-critical signals whose energy is spread across more units but which still do not need perceptual bit allocation to ensure high quality. Such non-critical signals do not require a high bit rate for coding that ensures high audio quality, and at the same time good THD+N measurements can be obtained for them.

[0013] In view of these problems of the prior art, an object of the present invention is to provide a digital audio signal coding apparatus having a dynamic bit allocation means that decides whether to use a perceptual bit allocation means or a non-perceptual bit allocation means.

[0014]

[Means for Solving the Problems] The invention of claim 1 is a digital audio signal coding apparatus comprising: a frequency conversion means for frequency-transforming each frame of a digital audio signal into spectral components; a grouping means for grouping the spectral components into predetermined bandwidths or units; a perceptual bit allocation means for allocating bits to the bandwidths or units based on human auditory characteristics; a non-perceptual bit allocation means for dynamically allocating bits to the bandwidths or units based on signal energy; and a dynamic bit allocation means for deciding, according to a predetermined criterion, whether to use the perceptual bit allocation means or the non-perceptual bit allocation means.

[0015]

[Operation] The frequency conversion means converts a frame of the audio signal into spectral components by a mathematical filter transform such as a fast Fourier transform, a modified discrete cosine transform, or a wavelet transform, and outputs them. Because high resolution is essential for these spectral components, the number of points used in the transform is usually 512 or more.

[0016] The spectral components need to be grouped into units of uniform size, as in uniform subband coding, or into units of different lengths, as in non-uniform subband coding or wavelet coding.

[0017] The peak of the spectral components contained in each and every unit is extracted and compared with a set of thresholds. These thresholds are often set to the threshold in quiet. Alternatively, for simplicity, they may be set to a constant value close to the minimum of the set of thresholds in quiet. The number of units whose peak exceeds the threshold represents an estimate of the energy of the audio signal under examination.

[0018] Next, a tonality index is calculated for the units whose peak exceeds the threshold. This index can be calculated only for units whose number of spectral components exceeds a lower limit. The index may be either binary, that is 1 or 0, or a fraction. The fractional value is obtained by calculating a spectral flatness measure for the unit. The tonality index indicates whether or not the spectral characteristic of the unit is tonal.

[0019] A stronger masking effect is obtained from units that are tonal. Whether a signal is a non-critical audio signal can be identified from the number, or percentage, of units of spectral components whose peak exceeds the threshold in quiet. If this number of units is below a predetermined value, the audio signal contains energy in only a very small number of units. Even if the spectral energy is high, the demand for bit allocation is not necessarily high. If the number of units exceeds the predetermined value but is below the level above which the signal is regarded as wideband, the estimated demand for bit allocation is examined further by calculating the tonality index. If the sum of the tonality indices is below a predetermined threshold, the demand for bit allocation is small and the frame of the audio signal can be regarded as non-critical. For frames of the audio signal identified as non-critical, conventional dynamic bit allocation based on signal energy is applied. In all other cases, perceptual bit allocation is applied.

[0020] The unit may also be a bandwidth.

[0021]

[Embodiments] The present invention is described below with reference to the drawings showing its embodiments.

[0022] FIG. 3 is a block diagram of one embodiment of the digital audio signal coding apparatus of the present invention. The frequency conversion means 2.1 frequency-transforms each frame of the digital audio signal into spectral components. The normalizing means 2.2 normalizes the values of the spectral components, grouped into predetermined bandwidths or units, to within the range ±1. The quantizing means 2.3 performs ordinary uniform linear quantization. The non-perceptual bit allocation means 2.5 performs dynamic bit allocation for the bandwidths or units based on signal energy. The perceptual bit allocation means 2.6 performs bit allocation for the bandwidths or units based on human auditory characteristics. The dynamic bit allocation means 2.4 decides whether the non-perceptual bit allocation means or the perceptual bit allocation means is to be used. The iterative bit allocation means 2.7 performs the allocation based on the decision of the dynamic bit allocation means.

[0023] Next, the operation of the above embodiment is described.

[0024] When a frame of the digital audio signal S2.1 is input to the frequency conversion means 2.1, it is converted into spectral components by a mathematical filter transform such as a fast Fourier transform, a modified discrete cosine transform, or a wavelet transform, and output as the spectral components S2.2. Because high resolution is essential for the spectral components, the number of points used in the frequency transform is usually 512 or more.
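As an illustration of turning a frame into spectral components, the following minimal sketch uses a naive discrete Fourier transform and reports magnitudes in dB. It is not the transform of the embodiment, which may be an FFT, MDCT, or wavelet transform with 512 or more points; the 64-point frame, the test tone, and the function name `dft_magnitudes_db` are assumptions chosen to keep the example short.

```python
import cmath
import math


def dft_magnitudes_db(frame):
    """Return magnitudes in dB for the first half of a naive DFT of `frame`."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(20 * math.log10(abs(acc) + 1e-12))  # small offset avoids log10(0)
    return mags


# Hypothetical test frame: a sine wave centred exactly on bin 8 of a 64-point frame.
N_POINTS = 64
frame = [math.sin(2 * math.pi * 8 * t / N_POINTS) for t in range(N_POINTS)]

spectrum_db = dft_magnitudes_db(frame)
peak_bin = max(range(len(spectrum_db)), key=lambda k: spectrum_db[k])
print(peak_bin)
```

A test tone of this kind concentrates its energy in a single spectral component, which is the situation the later peak-distribution analysis is designed to detect.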

[0025] Before normalization and quantization, the spectral components must be further grouped along the frequency or time axis, depending on their resolution. That is, the spectral components need to be grouped into bandwidths or units of uniform size, as in uniform subband coding, or into bandwidths or units of different lengths, as in non-uniform subband coding or wavelet coding. For example, when the spectral components have high resolution, as with a 512-point MDCT, they are grouped along the frequency axis. Otherwise, as in uniform 32-band subband block coding, the components are grouped in the time domain. The grouping may be uniform or non-uniform as described above, but it must match the size of the bandwidths or units to which quantization bits are to be allocated separately. (Hereinafter, a bandwidth or unit is simply called a unit.) Next, the peak of the spectral components contained in each and every unit is extracted and compared with a set of thresholds. These thresholds are often set to the threshold in quiet. Alternatively, for simplicity, they may be set to a constant value close to the minimum of the set of thresholds in quiet. Here, the threshold in quiet at a given frequency is the minimum level at which a pure tone of that frequency can be heard. The number of units whose peak exceeds the threshold represents an estimate of the energy of the audio signal under examination.
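The grouping and peak comparison described above might be sketched as follows. The spectrum values, the unit size of 4, and the single constant threshold of 0 dB are hypothetical; the text allows either a per-frequency threshold in quiet or a constant near its minimum, and this sketch takes the simpler constant form.

```python
def group_into_units(spectrum_db, unit_size):
    """Split a list of spectral magnitudes (dB) into consecutive equal-size units."""
    return [spectrum_db[i:i + unit_size] for i in range(0, len(spectrum_db), unit_size)]


def count_audible_units(units, threshold_db):
    """Return the number of units whose peak exceeds the threshold, and all peaks."""
    peaks = [max(unit) for unit in units]
    count = sum(1 for peak in peaks if peak > threshold_db)
    return count, peaks


# Hypothetical 16-component spectrum in dB, grouped into 4 units of 4 components.
spectrum_db = [-90, -85, -88, -92, -80, 60, -75, -82,
               -91, -89, -87, -93, -86, -90, 12, -84]
units = group_into_units(spectrum_db, unit_size=4)
p, peaks = count_audible_units(units, threshold_db=0.0)
print(p, peaks)
```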

[0026] A tonality index is calculated for the units whose peak exceeds the threshold. This index can be calculated only for units whose number of spectral components exceeds a lower limit. The index may be either binary, that is 1 or 0, or a fraction. The fractional value is obtained by calculating a spectral flatness measure for the unit. The tonality index indicates whether or not the spectral characteristic of the unit is tonal. Here, tonality refers to the sinusoidal nature of an audio signal.

[0027] A stronger masking effect is obtained from units that are tonal. Whether a signal is a non-critical audio signal can be identified from the number, or percentage, of units of spectral components whose peak exceeds the threshold in quiet. If this number of units is below a predetermined value, the audio signal contains energy in only a very small number of units. Even if the spectral energy is high, the demand for bit allocation is not necessarily high. If the number of units exceeds the predetermined value but is below the level above which the signal is regarded as wideband, the estimated demand for bit allocation is examined further by calculating the tonality index. If the sum of the tonality indices is below a predetermined threshold, the demand for bit allocation is small and the frame of the audio signal can be regarded as non-critical. For frames of the audio signal identified as non-critical, the non-perceptual bit allocation means 2.5, which is based on signal energy, is applied. For frames identified as critical, the perceptual bit allocation means 2.6 is applied.

[0028] It is the dynamic bit allocation means 2.4 that decides whether the non-perceptual bit allocation means or the perceptual bit allocation means is used.

[0029] When the non-perceptual bit allocation means is selected, there is no need to apply a parallel frequency transform to the input audio signal, even if the spectral components S2.2 are not of sufficiently high resolution. In this case the iterative bit allocation means 2.7 only needs to allocate bits based on signal energy or variance; no complex analysis of the signal content is required. The signal energy or variance is output as S2.5 by the non-perceptual bit allocation means 2.5.

[0030] When the perceptual bit allocation means 2.6 is selected, the masking threshold must be calculated. For this, spectral components with sufficiently high frequency resolution must be used. The signal-to-masking threshold ratio is output from the perceptual bit allocation means 2.6 as S2.6. The iterative bit allocation means 2.7 then iteratively allocates all the available bits, giving priority to the spectral units with the highest signal variance or signal-to-masking threshold ratio. The set of bits allocated to each and every spectral unit is output as S2.7.
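One possible reading of this iterative allocation is a greedy loop that repeatedly gives one bit to the unit with the highest remaining demand. The sketch below is an assumption, not the allocation procedure of the embodiment: the reduction of about 6 dB in demand per allocated bit, the per-unit cap, and the names `iterative_bit_allocation`, `demand_db`, and `db_per_bit` are all illustrative. The demand values can stand for either signal energy (S2.5) or signal-to-masking threshold ratio (S2.6).

```python
def iterative_bit_allocation(demand_db, total_bits, max_bits_per_unit=15, db_per_bit=6.0):
    """Greedy sketch: hand out bits one at a time to the unit with the highest demand.

    Assumption: each allocated bit lowers that unit's remaining demand by db_per_bit.
    """
    remaining = list(demand_db)
    bits = [0] * len(demand_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_unit]
        if not candidates:
            break  # every unit has reached its cap
        best = max(candidates, key=lambda i: remaining[i])  # ties go to the lowest index
        bits[best] += 1
        remaining[best] -= db_per_bit
    return bits


# Hypothetical demand per unit in dB and a budget of 8 bits.
bits = iterative_bit_allocation([30, 12, 3, 18], total_bits=8)
print(bits)
```

The same loop serves both branches of the coder; only the source of the demand values changes with the decision of the dynamic bit allocation means.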

[0031] Before quantization, the normalizing means 2.2 normalizes the values of the spectral components belonging to the same unit to within the range ±1. The normalized spectral components are called the normalized spectral components S2.3. The normalization is performed by dividing the values of all spectral components of the unit by the value of the largest-magnitude spectral component, or by a scale value equal to or only slightly larger than that largest magnitude.
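The normalization step can be sketched as a division by the largest magnitude in the unit. The example values and the function name `normalize_unit` are hypothetical, and the sketch uses the exact maximum rather than a slightly larger scale value taken from a table.

```python
def normalize_unit(components):
    """Divide a unit's spectral components by its largest magnitude.

    Returns the normalized components (within -1..+1) and the scale that was used.
    """
    scale = max(abs(c) for c in components)
    if scale == 0.0:
        return list(components), 0.0  # silent unit: nothing to scale
    return [c / scale for c in components], scale


unit = [0.5, -2.0, 1.0, 0.25]
normalized, scale_factor = normalize_unit(unit)
print(normalized, scale_factor)
```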

[0032] The quantizing means 2.3 performs ordinary uniform linear quantization, although non-linear quantization may also be used. With uniform linear quantization, a reduction in quantization noise, or equivalently an improvement in signal-to-noise ratio, is obtained straightforwardly when additional bits are used for quantization. The more bits are used to quantize the normalized spectral components, the lower the quantization noise.
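The effect of the bit count on quantization noise can be seen in a small sketch of a uniform quantizer for values in the range ±1. The symmetric level layout with 2^bits − 1 levels and the helper names are assumptions made for illustration; the text does not specify the quantizer design.

```python
def quantize_uniform(value, bits):
    """Quantize a value in [-1, 1] onto 2**bits - 1 uniformly spaced levels.

    Assumes bits >= 2 so that there is at least one level on each side of zero.
    """
    levels = 2 ** bits - 1          # odd count, symmetric around zero
    half = (levels - 1) // 2
    index = round(value * half)     # nearest level index
    return index / half


def max_error(bits, samples=1001):
    """Largest absolute quantization error over an evenly spaced sweep of [-1, 1]."""
    values = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(v - quantize_uniform(v, bits)) for v in values)


for bits in (2, 4, 8):
    print(bits, max_error(bits))
```

Running the sweep for 2, 4, and 8 bits shows the maximum error shrinking as bits are added, which is the behaviour the paragraph describes.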

[0033] The operation of the dynamic bit allocation means 2.4 is now described in more concrete terms, step by step, with reference to FIGS. 1 and 2. Although the spectral components may be grouped into units or bandwidths of different sizes, for the purpose of explanation they are assumed to be grouped into units of equal size, with magnitudes expressed in dB.

[0034] Classifying an audio frame as a non-critical or a critical audio signal involves analysis of the peak energy distribution and of tonality. The unit size is generally either uniform or proportional to the size of the critical bands. When it follows the critical band size, lower-numbered units contain fewer spectral components than higher-numbered units.

[0035] The peak, or maximum magnitude, within each and every frequency unit is extracted. In the peak energy distribution analysis, the peak or maximum magnitude of every unit is extracted first (1.1). These peak values are then compared with a predetermined threshold, for example the threshold in quiet, in order to identify "audible units". For every audible unit detected, either a unit counter is incremented or a spectral component counter is increased by the number of spectral components in that unit. The unit counter is incremented when units of uniform size are used (1.4).

[0036] The tonality index is calculated at step 1.5 by the tonality determining means. The index is calculated to give either a binary or a fractional value. For the binary tonality index, the window size is 7 spectral components. This means that the units immediately before and after the unit being processed are needed for the calculation. If a spectral component X(k) can be found such that X(k) − X(k+j) ≥ 6 dB, the tonality index is set to 1. Here k is the index of a spectral component within the unit, and j = −3, −2, +2, +3. Alternatively, a fractional tonality index may be obtained using the formula (1 − SFM). Here SFM is the spectral flatness measure, defined as the ratio of the geometric mean of the power spectrum to the arithmetic mean of the power spectrum. The window size for the SFM calculation is at least 16 spectral components. For units smaller than this, spectral lines from adjacent units can be used to make up the required number. The spectral lines of the unit under examination should form the central part of the window used for the SFM calculation.
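Both forms of the tonality index can be sketched in a few lines. Two assumptions are made and should be read as such: the binary test requires the 6 dB margin to hold for all four offsets j, and SFM is computed as geometric mean over arithmetic mean, the usual convention that keeps 1 − SFM between 0 and 1. The function names and the example spectra are hypothetical.

```python
import math


def binary_tonality(spectrum_db, start, end, offsets=(-3, -2, 2, 3), margin_db=6.0):
    """Return 1 if some component k in [start, end) exceeds its neighbours at every
    offset j by at least margin_db, else 0. Neighbours outside the spectrum are skipped."""
    for k in range(start, end):
        neighbours = [spectrum_db[k + j] for j in offsets if 0 <= k + j < len(spectrum_db)]
        if neighbours and all(spectrum_db[k] - x >= margin_db for x in neighbours):
            return 1
    return 0


def fractional_tonality(power):
    """Return 1 - SFM, where SFM = geometric mean / arithmetic mean of the power values.

    All power values must be strictly positive.
    """
    arithmetic = sum(power) / len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    return 1.0 - geometric / arithmetic


# Hypothetical spectra: one isolated 40 dB line, and a 16-component power window.
tonal_db = [0.0] * 4 + [40.0] + [0.0] * 4
print(binary_tonality(tonal_db, 3, 6))

peaky_power = [1e-6] * 15 + [1.0]
flat_power = [1.0] * 16
print(fractional_tonality(peaky_power), fractional_tonality(flat_power))
```

A flat window gives a fractional index near 0, while a window dominated by a single line gives a value near 1, so the ξ = 0.5 comparison used later separates the two cases cleanly.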

[0037] From the peak energy distribution analysis, if the percentage of spectral components exceeding the threshold in quiet is below 10%, the audio signal is regarded as entirely non-critical. If it is above 20%, the signal is regarded as entirely critical. If the percentage lies between 10% and 20%, the number of units with a tonality index above 0.5 is examined, as shown at step 1.10. If more than two units have a high tonality index, the masking threshold calculation is likely to be complex, which means that bit allocation based on signal energy alone is not feasible.

【００３８】[0038] Next, FIGS. 1 and 2 are described step by step. A frame of the digital audio signal is transformed into spectral components using an FFT, an MDCT, or another transform. The spectral components are grouped into units of equal size. The peak, or maximum magnitude, of the spectral components in each and every unit is identified (1.1). The peak counter p is initialized to 0 (1.2). For each unit, the peak value is compared with the minimum Quiet-mode threshold η dB (1.3), and if the peak value exceeds η, the peak counter p is incremented (1.4). The tonality index is then calculated (1.5); either a binary or a fractional tonality index calculation may be used.

When the peak value check and the tonality index calculation have been performed for all N units, the value of the peak counter p is checked (1.6). If p is less than or equal to χ, where χ = 0.1 × N, processing proceeds to the non-perceptual bit allocation means, that is, allocation based on the signal energy of each and every unit (1.13). If p is greater than χ, p is checked against γ (1.7), where γ = 0.2 × N. If p is not less than or equal to γ, processing proceeds to the perceptual bit allocation means, that is, allocation based on the signal-to-masking threshold of each and every unit (1.14).

If p is less than or equal to γ, the tonality counter q is initialized to 0 (1.9), and the tonality indices of the p units whose peak values exceed η are checked in turn to see whether they are greater than ξ (1.10), where ξ = 0.5. Each time this is true, the tonality counter q is incremented (1.11). If q is less than or equal to M, processing proceeds to the non-perceptual bit allocation means (1.13). Here, M = 2 when the original audio bandwidth exceeds 4 kHz, and M = 1 otherwise. In all other cases, processing proceeds to the perceptual bit allocation means (1.14).

Finally, the THD+N measurement results of the conventional digital audio signal encoding device and of the encoding device of the present invention are compared.
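The decision flow of FIGS. 1 and 2, as described in paragraph [0038], can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function name `choose_bit_allocation`, its parameter names, and the returned strings are hypothetical, the per-unit peak values (in dB) and tonality indices are assumed to have been computed already, and a single threshold η is assumed for all units. Integer comparisons stand in for χ = 0.1 × N and γ = 0.2 × N to avoid floating-point rounding at the boundaries.

```python
from typing import Sequence

XI = 0.5  # tonality threshold xi from the description


def choose_bit_allocation(
    peaks_db: Sequence[float],   # peak magnitude of each unit, in dB (step 1.1)
    tonality: Sequence[float],   # tonality index of each unit (step 1.5)
    eta_db: float,               # minimum Quiet-mode threshold eta (assumed scalar)
    bandwidth_hz: float,         # original audio bandwidth
) -> str:
    """Return "non-perceptual" or "perceptual" for one frame."""
    n = len(peaks_db)
    if n == 0 or len(tonality) != n:
        raise ValueError("peaks_db and tonality must be non-empty and equal length")

    # Steps 1.2-1.4: count the units whose peak exceeds eta.
    loud_units = [i for i, peak in enumerate(peaks_db) if peak > eta_db]
    p = len(loud_units)

    # Step 1.6: p <= chi, with chi = 0.1 * N (written as 10 * p <= N).
    if 10 * p <= n:
        return "non-perceptual"  # step 1.13

    # Step 1.7: p > gamma, with gamma = 0.2 * N (written as 5 * p > N).
    if 5 * p > n:
        return "perceptual"      # step 1.14

    # Steps 1.9-1.11: among the loud units, count those with tonality > xi.
    q = sum(1 for i in loud_units if tonality[i] > XI)

    # M = 2 when the original bandwidth exceeds 4 kHz, otherwise M = 1.
    m = 2 if bandwidth_hz > 4000 else 1
    return "non-perceptual" if q <= m else "perceptual"
```

With N = 20 units, for example, a frame in which only two units exceed η is routed to energy-based allocation, while a frame in which five units exceed η is routed to masking-threshold-based allocation; the tonality count only matters for the cases in between.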

【００３９】[0039] First, for the conventional encoding device, a Video CD encoder operating at 224 kbits/second for stereo L/R is taken as the basis for comparison. When its structure is the digital audio encoding device with perceptual bit allocation means shown in FIG. 4, the THD+N measurement results are those given in Table 1 above. These results are undesirable at various frequencies, in particular at 2018 Hz, 15005 Hz, 16004 Hz, and 17003 Hz. The main cause is that the perceptual bit allocation means is always in operation, regardless of the signal type.

【００４０】[0040] In contrast, Table 2 shows the THD+N measurement results obtained with the digital audio signal encoding device of the present invention, which is provided with the dynamic bit allocation means as shown in FIG. 3.

【００４１】[0041]

【表２】 [Table 2]

【００４２】[0042] These results show that the THD+N measurements are greatly improved at the frequencies for which the results in Table 1 were undesirable. At the same time, no noticeable degradation in sound quality was found in listening tests.

【００４３】[0043]

【発明の効果】[Effects of the Invention] As is clear from the above description, the dynamic bit allocation means enables the present invention to identify non-critical audio signals while operating at a high compression ratio or a low bit rate. By applying the non-perceptual bit allocation means to such signals, the digital audio signal can be encoded without impairing its quality, and the THD+N measurement results can be maximized at all frequencies.

[Brief description of drawings]

【図１】FIG. 1 is a flow chart showing the operation of the dynamic bit allocation means according to an embodiment of the present invention.

【図２】FIG. 2 is a flow chart showing the operation of the dynamic bit allocation means according to an embodiment of the present invention.

【図３】FIG. 3 is a configuration diagram of the digital audio signal encoding device according to the present embodiment.

【図４】FIG. 4 is a configuration diagram of a conventional digital audio signal encoding device.

【図５】FIG. 5 is a graph of the frequency coefficients of a 1 kHz sinusoidal test input.

【図６】FIG. 6 is a graph of the frequency coefficients of an arbitrary audio signal.

[Explanation of symbols]

2.1 Frequency conversion means
2.2 Normalization means
2.3 Quantization means
2.4 Dynamic bit allocation means
2.5 Non-perceptual bit allocation means
2.6 Perceptual bit allocation means
2.7 Iterative bit allocation means

Continuation of front page
(51) Int.Cl.⁶: H03M 7/30 A 9382-5K; H04L 29/02
(72) Inventor: Kee Nung Phat, Truro Road 28, Singapore

Claims

[Claims]

1. A digital audio signal encoding device, comprising: frequency conversion means for frequency-converting each frame of a digital audio signal into spectral components; grouping means for grouping the spectral components into predetermined bandwidths or units; perceptual bit allocation means for performing bit allocation on the bandwidths or units based on human auditory characteristics; non-perceptual bit allocation means for performing dynamic bit allocation on the bandwidths or units based on signal energy; and dynamic bit allocation means for determining, according to a predetermined criterion, which of the perceptual bit allocation means and the non-perceptual bit allocation means is to be used.

2. The digital audio signal encoding device according to claim 1, wherein the determination is made by: extracting the peak spectral component value contained in each bandwidth or unit; calculating the ratio of the number of bandwidths or units whose peak spectral component value is smaller than a predetermined threshold value to the total number of units; determining the tonality of the signal contained in each and every one of the bandwidths or units; calculating the total tonality of the digital audio signal; and deciding based on said ratio and said total tonality.