JP3254953B2

JP3254953B2 - Highly efficient speech coding system

Info

Publication number: JP3254953B2
Application number: JP05331795A
Authority: JP
Inventors: 徳彦渕上; 昭治植野
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1995-02-17
Filing date: 1995-02-17
Publication date: 2002-02-12
Anticipated expiration: 2017-02-12
Also published as: JPH08223052A

Abstract

PURPOSE: To reduce the amount of computation needed to calculate the offset of the masking reference curve, and to improve sound quality by better satisfying psychoacoustic requirements. CONSTITUTION: A psychoacoustic analysis part 3 calculates the power spectrum of an audio signal from orthogonal transform coefficients. It then calculates the autocorrelation of the power spectrum for each predetermined band, and derives the offset of the psychoacoustic masking effect from the ratio between the maximum and minimum autocorrelation values. Based on this offset, the number of quantization bits is determined for each subband of a quantizing/coding means 4. In addition, a first required S/N ratio is calculated for each subband by psychoacoustic analysis. A second required S/N ratio is calculated from the signal power of each subband by minimum mean-square-error theory that includes auditory control. The final required S/N ratio is obtained by weighting the first and second S/N ratios, and the number of quantization bits for each subband of the means 4 is determined from this final S/N ratio.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a high-efficiency audio coding apparatus that divides an audio signal into a plurality of frequency bands (subbands) and quantizes and encodes the divided signal subband by subband. More particularly, it relates to a high-efficiency audio coding apparatus that determines the number of quantization bits for each subband based on psychoacoustic analysis.

[0002]

2. Description of the Related Art

High-efficiency audio coding, as used in MiniDisc (MD), Digital Compact Cassette (DCC), karaoke CDs and the like, compresses the amount of audio data and is therefore also called music compression. In such coding schemes, the audio signal is divided into a plurality of subbands by a digital filter or an orthogonal transform. The number of quantization bits for each subband is then determined by psychoacoustic analysis in the frequency domain. In the following description, the term "encode" is sometimes used to mean compression as well as coding.

[0003] FIGS. 22(a) to 22(d) show an example in which such a coding scheme divides the frequency band by an orthogonal transform. FIG. 22(a) shows 512 samples cut out of the 16-bit PCM audio signal to be encoded. In this description, the total amount of information enclosed by the rectangle in the figure is taken to be 16 bits × 512 = 8192 bits. Of course, neither the number of samples cut out nor the PCM word length is limited to these values.

[0004] FIG. 22(b) shows the signal of FIG. 22(a) after it has been transformed into the frequency domain by an orthogonal transform such as the DCT (discrete cosine transform) or the FFT (fast Fourier transform). The curve in the figure is the envelope of the frequency spectrum. If the orthogonal transform is assumed to preserve the amount of information, the total information can again be represented by the rectangular area in the figure. According to a psychoacoustic model, when the signal of FIG. 22(b) is present, it masks other signals below a certain level so that they become inaudible. This level can be specified as a curve, and the phenomenon is generally called the masking effect (described in detail later).

[0005] Drawing the masking curve for FIG. 22(b) gives the curve shown in FIG. 22(c). Now consider requantizing the signal of FIG. 22(b). If the quantization noise produced by the requantization stays at or below the level defined by the masking curve, that noise is inaudible to the human ear. Therefore, as shown in FIG. 22(d), the spectrum is divided into subbands of several coefficients each. Let S be the maximum signal level in each subband and N the allowable noise level read from FIG. 22(c). If each subband is requantized with enough bits to satisfy this S/N, the resulting quantization noise is masked and cannot be heard.
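The last step can be illustrated with a rough sketch that converts a subband's S/N requirement into a bit count. The sketch assumes the common rule of thumb that each additional bit of a uniform quantizer improves S/N by 20·log10(2) ≈ 6.02 dB, and it assumes S and N are power levels. The helper name `bits_for_snr` is hypothetical and is not taken from the patent.

```python
import math

def bits_for_snr(snr_db: float, db_per_bit: float = 20 * math.log10(2)) -> int:
    """Smallest bit count whose quantization S/N reaches snr_db.

    Assumes (rule of thumb, not from the patent) that each extra bit of a
    uniform quantizer adds about 6.02 dB of S/N.
    """
    if snr_db <= 0:
        return 0
    return math.ceil(snr_db / db_per_bit)

# S = maximum level in the subband, N = allowed noise level (both as power)
S, N = 1.0, 1e-4
snr = 10 * math.log10(S / N)   # about 40 dB
print(bits_for_snr(snr))       # 40 / 6.02 = 6.64, rounded up -> 7
```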

[0006] The rectangles in FIG. 22(d) indicate the amount of information needed for compression and decompression. The irregular rectangle in the middle of the figure represents the main information, and the thin rectangle at the bottom represents the auxiliary information. The auxiliary information is the data the decoder needs, such as the maximum value (scale value) of each subband and its number of quantization bits. The total information in FIG. 22(d) is therefore the sum of the main and auxiliary information, and it is only a fraction of the total information in FIGS. 22(a) and 22(b). By repeating this processing for each predetermined section (512 samples in this example), the signal can be encoded with almost no loss of sound quality.

[0007] FIG. 23 shows a typical encoding process. First, 512 samples of, for example, a 16-bit PCM audio signal are cut out. They are orthogonally transformed by the DCT, FFT or the like and divided into a plurality of subbands s (step S1). Psychoacoustic analysis then determines the maximum value (scale value) of each subband s (step S2) and the allowable noise level N[s] of each subband (step S3). Next, the required S/N ratio of each subband is determined (step S4), and the number of quantization bits of each subband is derived from this S/N ratio (step S5). Finally, each subband is quantized and output together with the auxiliary information (step S6).
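Steps S5 and S6 can be pictured with the minimal sketch below. It assumes a uniform mid-tread quantizer whose scale value is the subband's largest coefficient magnitude. The helper names and the quantizer design are illustrative assumptions; the patent does not specify them here. The returned `side_info` plays the role of the auxiliary information (scale value and bit count) described for FIG. 22(d).

```python
import math

def quantize_subband(coeffs, bits):
    """Quantize one subband with `bits` bits (illustrative mid-tread quantizer)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per coefficient")
    scale = max(abs(c) for c in coeffs) or 1.0   # scale value (auxiliary info)
    levels = (1 << (bits - 1)) - 1                # largest code magnitude
    codes = [int(math.floor(c / scale * levels + 0.5)) for c in coeffs]
    return codes, {"scale": scale, "bits": bits}

def dequantize_subband(codes, side_info):
    """Decoder side: rebuild coefficients from the codes plus auxiliary info."""
    levels = (1 << (side_info["bits"] - 1)) - 1
    return [q * side_info["scale"] / levels for q in codes]

codes, side = quantize_subband([0.5, -1.0, 0.25], bits=3)
print(codes)                            # [2, -3, 1]
print(dequantize_subband(codes, side))
```

The reconstruction error of each coefficient stays within half a quantization step, scale / (2 · levels). This is why more bits, and therefore more auxiliary-plus-main information, buy a higher S/N.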

[0008] Next, a method of calculating a masking curve based on psychoacoustics is described. The masking effect means that, when a spectral component is present, nearby sounds below a certain level can no longer be detected. FIG. 24 shows the masking curves for spectral components at various frequencies. The slopes of the curves are steeper toward low frequencies and gentler toward high frequencies.

[0009] When the horizontal axis (frequency) of FIG. 24 is converted to a scale proportional to the critical bandwidth of hearing, these curves are known to take substantially the same shape and slope, as shown in FIG. 25. As shown in FIG. 26, this critical-band scale divides the range from DC to 20 kHz into 25 bands. Auditory characteristics, including masking, often behave in proportion to this critical bandwidth.
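The critical-band scale can be approximated in closed form. The sketch below uses the Zwicker and Terhardt approximation of critical-band rate in Bark. This formula is an outside assumption, not something given in the patent, but it shows why the range from DC to 20 kHz maps onto roughly 25 bands.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_band_index(f_hz: float) -> int:
    """0-based index of the critical band that contains f_hz."""
    return int(hz_to_bark(f_hz))

# 20 kHz lands a little below 25 Bark, so DC to 20 kHz spans about 25 bands.
print(round(hz_to_bark(20000.0), 1))
```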

[0010] For a general signal such as that of FIG. 22(b), the masking curve is the sum (superposition) of the masking curves of the individual spectral components shown in FIG. 24 or FIG. 25, which gives a curve like that of FIG. 22(c). In practice, however, computing the masking curve as a smooth curve like FIG. 22(c) requires an enormous amount of computation and is impractical. As an approximation, the spectrum is therefore replaced by the power in each analysis band, and the masking curve is evaluated as a polyline (piecewise) curve over the analysis bands.

[0011] Next, a conventional example is described with reference to FIG. 27. It calculates the masking curve and takes the minimum of that curve within each subband of FIG. 22(d) as the noise level N allowed in that subband. FIG. 27 performs processes (1) to (5). (1) First, the band total power P[i] is calculated for each of the m analysis bands i (i = 0 to m−1) from the q spectral lines j (j = 0 to q−1) obtained by the orthogonal transform.
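Process (1) amounts to summing spectral energy over contiguous groups of lines. Equation 1 is not reproduced in this text, so the sketch below makes two assumptions: P[i] is the sum of squared transform coefficients in band i, and the band boundaries are given as a list of line indices. `band_edges` is a hypothetical parameter.

```python
def band_total_power(coeffs, band_edges):
    """Band total power P[i] for analysis bands defined by band_edges.

    Band i covers spectral lines band_edges[i] .. band_edges[i + 1] - 1,
    so m bands need m + 1 edges (0 first, q last).
    """
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

# q = 5 spectral lines split into m = 2 analysis bands: lines 0-1 and 2-4
print(band_total_power([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 5]))  # [5.0, 50.0]
```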

[0012]

(Equation 1)

[0013] (2) Next, the masking level M[i] in each analysis band i is calculated by convolving the masking reference curve B for analysis band i with the band total power P[i], as in Equation 2 below. If the masking reference curve has the same shape regardless of the analysis band i, it can be expressed as B[k] (k an integer), as shown in FIG. 28.
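A minimal sketch of the convolution in process (2) follows. Equation 2 is not reproduced here, so the sketch assumes the reference curve is stored as a map from an integer band offset k to a level in dB. The peak is B[0] = −F, where F is the offset of FIG. 28. Each level is converted to a power ratio before it is applied to P[i−k]. The names `masking_levels` and `curve_db` are hypothetical.

```python
def masking_levels(P, curve_db):
    """M[i] = sum over k of B[k] * P[i - k], with B[k] given in dB.

    curve_db maps an integer band offset k to the reference-curve level in dB;
    positive k spreads masking from band i - k up to band i.
    """
    m = len(P)
    M = []
    for i in range(m):
        acc = 0.0
        for k, b_db in curve_db.items():
            if 0 <= i - k < m:                    # stay inside the band range
                acc += 10.0 ** (b_db / 10.0) * P[i - k]
        M.append(acc)
    return M

# Offset F = 10 dB at the peak, 20 dB down one band higher (illustrative values)
print(masking_levels([1.0, 0.0, 0.0], {0: -10.0, 1: -20.0}))
```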

[0014]

(Equation 2)

[0015] (3)-1: Next, if the analysis bands i differ from the subbands s, the minimum masking level M[i] within subband s is taken as the allowable noise level N[s] of that subband. Here n is the number of subbands.

[0016]

$$N[s] = \min_{i \in s} M[i]$$

[0017] Here i ranges over the analysis bands contained in subband s, and s = 0 to n−1.

[0018] (3)-2: If the analysis bands i and the subbands s are identical:

[0019]

$$N[s] = M[s], \quad s = 0, \ldots, n-1$$

[0020] (4) The signal level S[s] of each subband s is obtained.

[0021]

(Equation 5)

[0022] (5) The required S/N ratio SNreq[s] of each subband s (an average S/N ratio) is obtained from the signal level S[s] and the allowable noise level N[s].

[0023]

$$\mathrm{SN}_{\mathrm{req}}[s] = 10.0 \cdot \log_{10}\left(\frac{S[s]}{N[s]}\right)$$

[0024] Processes (1) to (3) give the allowable noise level N[s] of each subband s, and processes (4) and (5) give its required S/N ratio. The number of quantization bits (and of inverse-quantization bits) for each subband s is then determined from this required S/N ratio.
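Processes (3)-1 to (5) can be sketched as follows. Equation 5 is not reproduced in this text, so the subband signal levels S[s] are simply passed in. `subband_bands`, which lists the analysis bands belonging to each subband, is a hypothetical input layout, and all levels are assumed to be powers.

```python
import math

def required_snr(M, subband_bands, S):
    """Return (N, SNreq) per subband from masking levels M[i] and levels S[s].

    N[s] = min of M[i] over the analysis bands i in subband s  (Equation 3)
    SNreq[s] = 10 * log10(S[s] / N[s])                          (Equation 6)
    """
    N = [min(M[i] for i in bands) for bands in subband_bands]
    snr = [10.0 * math.log10(s / n) for s, n in zip(S, N)]
    return N, snr

# Four analysis bands grouped into two subbands
N, snr = required_snr([1e-3, 1e-4, 1e-2, 1e-2], [[0, 1], [2, 3]], [1.0, 1.0])
print(N, [round(x, 1) for x in snr])
```

When the analysis bands and subbands coincide (case (3)-2), each inner list holds a single index, and the minimum reduces to N[s] = M[s].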

[0025] The setting of the masking reference curve B[k] described with reference to FIG. 28 plays an important role in this series of processes. The masking effect is generally said to behave differently depending on the nature of the masking signal and of the masked signal. Specifically, as shown in FIG. 28, the "offset F" (the difference between the peak value of the curve and 0.0) is affected by the nature of the signal.

[0026] In high-efficiency coding the masked signal is "noise", so the offset F varies with what the masking signal is. Experiments report F ≈ 25 dB when the masking signal is a sine wave and F ≈ 5 dB when it is noise. Actual music and speech signals fed to a high-efficiency coder have offsets F lying between these upper and lower limits. Measuring F appropriately and using it in the psychoacoustic analysis is therefore necessary to achieve high sound quality.

[0027] The offset F is preferably measured for each processing section and for each frequency band. The conventional way to measure F is to determine the tonality. Tonality is an index of how close a signal is to a pure tone, and it ranges from 1.0 (sine wave) to 0.0 (noise). As shown in FIG. 29, the tonality is calculated by linear prediction from the FFT spectra A, B and C of three consecutive sections. There may be gaps or overlaps between the sections. Obtaining a q-point spectrum requires a 2q-point FFT.

[0028] FIG. 30 shows a conventional method of obtaining the tonality and calculating the offset F, which consists of processes ① to ⑥. ① Obtain the amplitudes R₁[j], R₂[j], R₃[j] (j = 0 to q−1) and the phases Φ₁[j], Φ₂[j], Φ₃[j] of the FFT coefficients of the three sections. In general, (R₃, Φ₃) is the spectrum of the current section, (R₂, Φ₂) that of the previous section, and (R₁, Φ₁) that of the section two before. The amplitude R[j] and phase Φ[j] are obtained from the real part Real[j] and imaginary part Imag[j] of the FFT coefficients as follows.

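[0029]

$$R[j] = \sqrt{\mathrm{Real}[j]^2 + \mathrm{Imag}[j]^2}, \qquad \Phi[j] = \tan^{-1}\left(\frac{\mathrm{Imag}[j]}{\mathrm{Real}[j]}\right)$$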

[0030] ② The spectrum R_X[j], Φ_X[j] of the third section is predicted from R₁, R₂, Φ₁ and Φ₂ by linear prediction as follows.

[0031]

$$R_X[j] = 2 R_2[j] - R_1[j], \qquad \Phi_X[j] = 2 \Phi_2[j] - \Phi_1[j]$$

[0032] ③ The distance c[j] on the (R, Φ) plane between the predicted value (R_X, Φ_X) and the measured value (R₃, Φ₃) is evaluated. This distance is called the unpredictability.

[0033]

(Equation 9)

[0034] ④ For each analysis band i, the unpredictability c[j] is weighted by the power spectrum and averaged to give the weighted unpredictability c2[i].

[0035]

(Equation 10)

[0036] ⑤ The weighted unpredictability c2[i] is converted into the tonality t[i].

[0037]

$$t[i] = a + b \cdot \ln\left(c_2[i]\right)$$

[0038] Here a and b are constants chosen so that 0.0 ≤ t[i] ≤ 1.0. ⑥ The offset F[i] is calculated from the tonality t[i].

[0039]

$$F[i] = \alpha \cdot t[i] + \beta \cdot \{1.0 - t[i]\} \quad \text{[dB]}$$

where α and β are constants, for example α = 25.0 and β = 5.0.
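For reference, the conventional processes ① to ⑥ can be sketched for one analysis band as below. Equations 9 and 10 are not reproduced in this text, so the sketch borrows several details from the ISO/IEC 11172-3 (MPEG-1) psychoacoustic model 2 as assumptions: the normalized Euclidean distance, the energy weighting, and the constants a = −0.299 and b = −0.43. α and β are the example values given in the patent. Note the per-line square roots (inside `abs` and `math.hypot`) and arctangents, which are the cost discussed under problem (1).

```python
import math

def tonality_offset(X1, X2, X3, a=-0.299, b=-0.43, alpha=25.0, beta=5.0):
    """Processes 1 to 6 for one analysis band.

    X1, X2, X3: complex FFT coefficients of the band for the section two
    before, the previous section and the current section. Returns (t, F_dB).
    """
    num = den = 0.0
    for x1, x2, x3 in zip(X1, X2, X3):
        # ① amplitude (square root) and phase (arctangent) of each line
        r1, r2, r3 = abs(x1), abs(x2), abs(x3)
        p1, p2, p3 = (math.atan2(x.imag, x.real) for x in (x1, x2, x3))
        # ② linear prediction of the current section
        rx, px = 2.0 * r2 - r1, 2.0 * p2 - p1
        # ③ normalized distance between predicted and measured points (assumed form)
        dist = math.hypot(r3 * math.cos(p3) - rx * math.cos(px),
                          r3 * math.sin(p3) - rx * math.sin(px))
        norm = r3 + abs(rx)
        c = dist / norm if norm > 0.0 else 0.0
        # ④ power-weighted average over the band (assumed weighting)
        num += r3 * r3 * c
        den += r3 * r3
    c2 = max(num / den if den > 0.0 else 1.0, 1e-12)   # avoid log(0)
    # ⑤ tonality t = a + b * ln(c2), clipped to [0, 1]
    t = min(1.0, max(0.0, a + b * math.log(c2)))
    # ⑥ offset F = alpha * t + beta * (1 - t) in dB
    return t, alpha * t + beta * (1.0 - t)

# A stationary sinusoid: constant amplitude, phase advancing linearly
steady = [complex(math.cos(0.5 * n), math.sin(0.5 * n)) for n in range(3)]
print(tonality_offset([steady[0]], [steady[1]], [steady[2]]))  # t = 1.0, F = 25.0 dB
```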

[0040]

Problems to be Solved by the Invention

However, the method of calculating the tonality t[i] has the following problems. Problem (1): the amount of computation is large. The processing of FIG. 30 computes square roots and arctangents for every sample, so the computational load is considerable. The distance calculation in process ③ also uses a square root.

[0041] Suppose the system is implemented on a DSP (digital signal processor) or similar device, and an ordinary multiply-accumulate operation takes one instruction. A function evaluation is then estimated to take 100 instructions or more. Processes ① and ③ compute two square roots and one arctangent for each of the q = 512 samples (1024-point FFT), so they consume at least 100 × 512 × 3 = 153,600 instructions.

[0042] For example, take a DSP with a capacity of 20 MIPS (million instructions per second) and a sampling frequency fs = 44.1 kHz. The number of instructions available per section is 20 · 10⁶ · 512 / 44100.0 ≈ 232,200, so this DSP would spend about 66% of its capacity on this computation.
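The load estimate in [0041] and [0042] is easy to re-check. The sketch below reproduces that arithmetic using only the figures stated in the text: 100 instructions per function call, three calls per spectral line, q = 512, a 20 MIPS DSP, fs = 44.1 kHz and 512-sample sections.

```python
def tonality_dsp_load(q=512, instr_per_func=100, funcs_per_line=3,
                      instr_per_sec=20e6, fs=44100.0, section_len=512):
    """Instructions needed vs. available per section for the tonality method."""
    needed = instr_per_func * q * funcs_per_line       # 100 * 512 * 3
    available = instr_per_sec * section_len / fs       # 20e6 * 512 / 44100
    return needed, available, needed / available

needed, available, share = tonality_dsp_load()
print(needed, round(available), f"{share:.0%}")   # 153600 232200 66%
```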

[0043] Some high-efficiency coding schemes use, instead of the FFT, an orthogonal transform such as the MDCT (modified DCT) whose transform coefficients cannot be expressed as amplitude and phase. In that case, a separate FFT must be computed for the tonality calculation, which increases the computational load accordingly.

[0044] Problem (2): when the audio signal has vibrato, the tonality calculation itself is unreliable. For example, when the input is a vocal or a single instrument with vibrato, its spectrum drifts over time at a rate of several Hz to a dozen or so Hz, as shown in FIG. 31. Suppose the section length is 512 samples and the sections are contiguous. The center then moves by 1024 samples (about 23 ms) over the three sections, which is almost exactly a quarter period (25 ms) of a 10 Hz vibrato.

[0045] The conventional tonality calculation performs linear prediction on each spectral line, so vibrato degrades the prediction accuracy. As a result, the calculated tonality becomes very low even though the signal is perceptually highly tonal, and the measurement departs from what is actually heard.

[0046] Problem (3): calculating the required S/N ratio SNreq[s] by psychoacoustic analysis, as in processes (1) to (5) of FIG. 27, generally gives good results. A problem arises, however, when the data compression ratio is high and the S/N ratio of each subband s after quantization and inverse quantization falls below the required S/N ratio. In the conventional method, when the required S/N ratios from psychoacoustic analysis cannot be satisfied, the S/N ratios of all subbands s degrade evenly. As the S/N ratio degrades, noise gradually becomes audible in proportion to the degradation, and degradation is more noticeable in bands with large signal power. The conventional method is therefore not optimal in sound quality once the S/N degradation becomes audible.

[0047] To mitigate this problem, the conventional method removes information from low-power bands and reallocates it to higher-power bands when the required S/N ratio cannot be satisfied. This approach is very coarse, however. Moving one bit of information for one band, for example, degrades the S/N ratio of the source band by about 6 dB and improves that of the destination band by about 6 dB. Moreover, because the correction is based on the band power itself, it creates a new problem: high-power bands (for example, the low and mid range) are given too much weight.

[0048] Problem (4): the description so far has assumed that a standalone audio signal is high-efficiency coded. In other applications, depending on the system, a high-efficiency-coded signal and a signal that is not high-efficiency coded may both be transmitted. The playback side then mixes them and reproduces them as a single audio signal.

[0049] In the simplest example, shown in FIG. 32, the audio signal of channel A (CH-A) is high-efficiency coded by an audio encoder 20, while the audio signal of CH-B is not. The two are multiplexed by a multiplexer 21 and transmitted. On the playback side, a demultiplexer 22 separates the channels, and a mixer 24 mixes the signal CH-A′ decoded by an audio decoder 23 with the CH-B audio signal.

[0050] In another example, shown in FIG. 33(a), the CH-A audio signal is high-efficiency coded, while the CH-B audio signal is converted into MIDI code by a MIDI (Musical Instrument Digital Interface) sequencer 25. MIDI is the international standard for digital signals that control electronic musical instruments. The two signals are multiplexed by the multiplexer 21 and transmitted. On the playback side, the demultiplexer 22 separates the channels. The mixer 24 then mixes the signal CH-A′ decoded by the audio decoder 23 with the signal CH-B′ played by a MIDI sound source 26 from the MIDI code.

[0051] In a variation, shown in FIG. 33(b), the CH-A audio signal is high-efficiency coded and two channels, CH-B1 and CH-B2, are MIDI-coded. On the playback side, mixers 24-1 and 24-2 mix the signal CH-A′ with the two channels CH-B1′ and CH-B2′ played from the MIDI code, and the result is output on two channels. The system of FIG. 33(b) has recently been used for online karaoke based on MIDI code, where the high-efficiency-coded signal is often a live vocal chorus.

[0052] However, bits for the high-efficiency-coded CH-A audio signal are allocated by psychoacoustic analysis in the audio encoder 10 alone. This analysis does not take into account the effect of CH-B, which is mixed in on the playback side. When mixing is performed on the playback side, the CH-A audio signal is subject to masking by the CH-B signal. CH-A is therefore encoded optimally for listening to it alone, but the sound quality is not optimal once other signals are mixed in.

[0053] FIG. 34 shows an example of the signal spectrum of CH-A. It also shows the masking level M1[i] obtained by psychoacoustic analysis of CH-A and the masking level M2[i] due to the other channel to be mixed. M1 < M2 in the low and high ranges, and M1 > M2 in the mid range. In this case, the optimal masking level M[i] for the CH-A signal after mixing, used in process (3) of FIG. 27, is considered to be:

[0054]

$$M[i] = \max\left(M_1[i], M_2[i]\right), \quad i = 0, \ldots, m-1$$

[0055] When mixing is performed as in FIGS. 32, 33(a) and 33(b), the masking level deviates from this optimal value M[i], so the result is not perceptually optimal. This matters most when the compression ratio is so high that the actual noise level reaches or exceeds the masking level. In that case, noise is heard as emphasized in the regions of FIG. 34 where M1[i] > M2[i].
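Equation 13 is a per-band maximum. The minimal sketch below applies it. Assuming all levels are powers, it also reports how many dB of required S/N each band of CH-A could give up once CH-B's masking is taken into account. The function names are hypothetical.

```python
import math

def mixed_masking(M1, M2):
    """Equation 13: M[i] = max(M1[i], M2[i]) for each analysis band i."""
    return [max(a, b) for a, b in zip(M1, M2)]

def snr_relief_db(M1, M2):
    """dB by which SNreq can drop in each band when M replaces M1."""
    return [10.0 * math.log10(m / a) for m, a in zip(mixed_masking(M1, M2), M1)]

# Low/high bands masked more by CH-B (M1 < M2), mid band by CH-A itself (M1 > M2)
M1 = [1e-4, 1e-2, 1e-4]
M2 = [1e-3, 1e-3, 1e-2]
print(mixed_masking(M1, M2))
```

In the mid band, where M1 > M2, the relief is 0 dB. Bits freed in the other bands could therefore be spent there, which is where [0055] says noise becomes audible.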

[0056] In view of problems (1) and (2) above, an object of the present invention is to provide a high-efficiency audio coding apparatus that reduces the amount of computation needed to calculate the offset of the masking reference curve and improves sound quality by better satisfying psychoacoustic requirements. In view of problem (3) above, another object is to provide a high-efficiency audio coding apparatus that improves sound quality when the data compression ratio is high and the required S/N ratios from psychoacoustic analysis cannot be satisfied. In view of problem (4) above, a further object is to provide a high-efficiency audio coding apparatus for the case where a high-efficiency-coded signal and a non-coded signal are mixed on the playback side. This apparatus performs psychoacoustic analysis that takes the influence of the non-coded signal into account, thereby better satisfying psychoacoustic requirements and improving sound quality.

[0057]

Means for Solving the Problems

To achieve the above objects, the present invention calculates the power spectrum of the audio signal from the orthogonal transform coefficients and computes the autocorrelation of this power spectrum for each predetermined band. It then calculates the offset of the psychoacoustic masking effect from the ratio between the maximum and minimum values of the autocorrelation. Finally, it determines the number of quantization bits of each subband based on this offset.
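The idea can be illustrated with a hypothetical sketch. The patent text here states only that F is derived from the ratio of the maximum to the minimum autocorrelation value. Everything else in the sketch is an assumption: the lag range, the unbiased normalization, the logarithmic mapping through `span_decades`, and the reuse of the α/β interpolation of Equation 12. The intuition is that a peaky (tonal) power spectrum has an autocorrelation that falls sharply away from lag 0, which gives a large max/min ratio. A flat (noise-like) spectrum gives a ratio near 1. The sketch needs only multiply-adds plus one logarithm per band.

```python
import math

def autocorr_offset(power, max_lag=None, span_decades=2.0,
                    alpha=25.0, beta=5.0, eps=1e-12):
    """Hypothetical masking offset F [dB] for one band of a power spectrum.

    Uses the max/min ratio of the power spectrum's autocorrelation; no
    square roots or arctangents are needed per spectral line.
    """
    n = len(power)
    if max_lag is None:
        max_lag = n // 2
    # unbiased autocorrelation over lags 0..max_lag (assumed lag range)
    r = [sum(power[j] * power[j + k] for j in range(n - k)) / (n - k)
         for k in range(max_lag + 1)]
    ratio = (max(r) + eps) / (min(r) + eps)
    # map the log-ratio to a 0..1 "tonal-likeness" (hypothetical mapping)
    t = min(1.0, max(0.0, math.log10(ratio) / span_decades))
    # assumed: same interpolation as Equation 12
    return alpha * t + beta * (1.0 - t)

print(autocorr_offset([1.0] * 8))                                      # flat -> 5.0
print(autocorr_offset([0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]))      # peaky -> 25.0
```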

[0058] That is, the present invention provides a high-efficiency audio coding apparatus comprising the following means:

- dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
- quantizing/coding means for quantizing and coding the audio signal of each subband produced by the dividing means with a variable number of quantization bits;
- psychoacoustic analysis means that calculates the power spectrum of the audio signal from orthogonal transform coefficients obtained by the dividing means or by separate orthogonal transform means, and calculates the autocorrelation of that power spectrum for each predetermined band. The psychoacoustic analysis means then derives the offset of the psychoacoustic masking effect from the ratio between the maximum and minimum values of the autocorrelation, and uses this offset to determine the number of quantization bits of each subband of the quantizing/coding means.

[0059] The present invention also calculates a first required S/N ratio for each subband based on psychoacoustic analysis of the audio signal in the frequency domain. It calculates a second required S/N ratio from the signal power of each subband by minimum mean-square-error theory that includes auditory control. It then weights the first and second required S/N ratios to obtain a final required S/N ratio, and determines the number of quantization bits of each subband from this final ratio.
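The formulas behind [0059] are not given in this part of the text, so the sketch below is only one plausible reading, and every detail in it is assumed. The second S/N ratio comes from the classical minimum-MSE bit allocation b[s] = b̄ + ½·log2(σ²[s] / geometric mean of σ²). The "auditory control" is modeled by compressing band power with a hypothetical exponent `gamma`, which softens the bias toward high-power bands noted in problem (3). Bits are converted to S/N at 6.02 dB per bit, and the final ratio is a linear blend with a hypothetical weight `w`.

```python
import math

def final_required_snr(sn1_db, band_power, mean_bits, gamma=0.5, w=0.5):
    """Hypothetical blend of a psychoacoustic SN1 with an MMSE-based SN2 (dB)."""
    n = len(band_power)
    # assumed "auditory control": compress band power with exponent gamma
    weighted = [max(p, 1e-12) ** gamma for p in band_power]
    log_geo_mean = sum(math.log2(v) for v in weighted) / n
    # classical minimum-MSE allocation relative to the geometric mean
    bits2 = [mean_bits + 0.5 * (math.log2(v) - log_geo_mean) for v in weighted]
    sn2_db = [6.02 * bits for bits in bits2]          # assumed 6.02 dB per bit
    # final required S/N: weighted combination of SN1 and SN2
    return [w * a + (1.0 - w) * b for a, b in zip(sn1_db, sn2_db)]

print(final_required_snr([20.0, 10.0], [1.0, 1.0], mean_bits=3.0))
```

With `gamma` below 1, a 16:1 power ratio between two bands shifts the MMSE allocation by only one bit (about 6 dB) instead of two.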

[0060] That is, the present invention provides a high-efficiency audio coding apparatus comprising the following means:

- dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
- quantizing/coding means for quantizing and coding the audio signal of each subband produced by the dividing means with a variable number of quantization bits;
- psychoacoustic analysis means that calculates a first required S/N ratio for each subband based on psychoacoustic analysis of the audio signal in the frequency domain, and a second required S/N ratio from the signal power of each subband by minimum mean-square-error theory that includes auditory control. The psychoacoustic analysis means then weights the first and second required S/N ratios to obtain a final required S/N ratio, and uses it to determine the number of quantization bits of each subband of the quantizing/coding means.

[0061] The present invention also performs a frequency-domain psychoacoustic analysis of a first audio signal, which is to be high-efficiency encoded, and of a second audio signal, which is not high-efficiency encoded but is mixed with the first audio signal on the reproduction side, to calculate first and second masking levels, respectively; calculates a final masking level from the first and second masking levels; and determines the number of quantization bits of each subband on the basis of the final masking level.

[0062] That is, according to the present invention, there is provided a high-efficiency audio coding apparatus comprising: dividing means for dividing a first audio signal, which is to be high-efficiency encoded, into a plurality of frequency subbands; quantizing/encoding means for quantizing and encoding the audio signal of each subband produced by the dividing means with a variable number of quantization bits; and psychoacoustic analysis means that performs a frequency-domain psychoacoustic analysis of the first audio signal and of a second audio signal, which is not high-efficiency encoded but is mixed with the first audio signal on the reproduction side, to calculate first and second masking levels, calculates a final masking level from them, and determines the number of quantization bits for each subband of the quantizing/encoding means on the basis of the final masking level.

[0063]

[Operation] In the present invention, the power spectrum of the audio signal is calculated from the orthogonal transform coefficients, the autocorrelation of the power spectrum is calculated for each predetermined band, the offset of the psychoacoustic masking effect is calculated from the ratio of the maximum to the minimum autocorrelation value, and the number of quantization bits of each subband is determined from this offset. This reduces the amount of computation needed to obtain the offset of the masking reference curve and, even when the audio signal contains vibrato, meets psychoacoustic requirements more closely and improves sound quality.

[0064] Further, in the present invention, a first required S/N ratio is calculated for each subband from a frequency-domain psychoacoustic analysis of the audio signal, a second required S/N ratio is calculated from the signal power of each subband by minimum mean-square-error theory with perceptual control, the two are weighted to obtain a final required S/N ratio, and the number of quantization bits of each subband is determined from the final required S/N ratio. Sound quality can therefore be improved when the data compression ratio is high and the required S/N ratio given by the psychoacoustic analysis cannot be met.

[0065] Further, in the present invention, a first audio signal to be high-efficiency encoded and a second audio signal that is not high-efficiency encoded but is mixed with the first on the reproduction side are each analyzed psychoacoustically in the frequency domain to calculate first and second masking levels, a final masking level is calculated from them, and the number of quantization bits of each subband is determined from the final masking level. Thus, when a high-efficiency encoded signal and a signal that is not high-efficiency encoded are mixed on the reproduction side, the psychoacoustic analysis takes the influence of the latter into account, so psychoacoustic requirements are met more closely and sound quality is improved.

[0066]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram of a first embodiment of the high-efficiency audio coding apparatus according to the present invention, FIG. 2 is a block diagram of a modification of FIG. 1, FIG. 3 is a block diagram of another modification of FIG. 1, FIG. 4 illustrates the calculation of the autocorrelation of the power spectrum with the preceding and following subbands, FIG. 5 is a flowchart of the processing for calculating the offset, FIG. 6 shows an example of the spectrum of an audio signal containing vibrato, and FIG. 7 compares the offsets obtained by the conventional tonality calculation method and by the autocorrelation method of the first embodiment.

[0067] The first embodiment shown in FIG. 1 performs the band division of the audio signal by an orthogonal transform. In FIG. 1, a 16-bit PCM audio signal, for example, is windowed and cut out in blocks of 512 samples by the windowing/cutout unit 1, and each block is orthogonally transformed by the orthogonal transform unit 2 using a DCT, FFT, or similar transform and divided into a plurality of subbands s.
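As a rough illustration of the cutout step, the sketch below splits 16-bit PCM samples into 512-sample blocks and applies a window. The window shape (sine) and the 50 % overlap are assumptions made for illustration; the text states only that 512 samples are cut out and windowed, and `windowed_blocks` is a hypothetical name.

```python
import math

BLOCK_LEN = 512  # samples per cut-out block (from the first embodiment)


def windowed_blocks(pcm, block_len=BLOCK_LEN, hop=None):
    """Cut `pcm` into windowed blocks of `block_len` samples.

    Assumptions (not stated in the source): a sine window and a hop of half
    a block (50 % overlap). 16-bit integer samples are scaled to [-1, 1).
    """
    hop = hop or block_len // 2
    window = [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]
    scaled = [x / 32768.0 for x in pcm]
    blocks = []
    for start in range(0, len(scaled) - block_len + 1, hop):
        segment = scaled[start:start + block_len]
        blocks.append([w * x for w, x in zip(window, segment)])
    return blocks


# Usage: 2048 PCM samples give (2048 - 512) // 256 + 1 = 7 blocks of 512.
pcm = [int(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(2048)]
blocks = windowed_blocks(pcm)
print(len(blocks), len(blocks[0]))  # 7 512
```

Each block would then go to the orthogonal transform; a coder using an MDCT would pick the window to satisfy its reconstruction conditions, which this sketch does not attempt.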

[0068] The psychoacoustic analysis unit 3 then calculates the offset F of the masking reference curve and determines the number of quantization bits, and the quantizing/encoding unit 4 quantizes and encodes the audio signal of each subband s produced by the orthogonal transform unit 2 with that number of bits. The data compressed by the quantizing/encoding unit 4 and the numbers of quantization bits determined by the psychoacoustic analysis unit 3 are multiplexed by the multiplex unit 5 and output to an MD (MiniDisc), DCC (Digital Compact Cassette), or similar medium. On decompression, the compressed data is dequantized and decoded according to the number of quantization bits of each subband s.
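The per-subband quantization can be pictured with the minimal sketch below. It assumes a peak-normalized scale factor per subband and a symmetric uniform (mid-tread) quantizer; the patent text does not specify the quantizer, so the function names and the scale-factor scheme are hypothetical.

```python
def quantize_subband(coeffs, bits):
    """Quantize one subband with `bits` bits; return (scale, integer codes).

    Hypothetical scheme: a peak-magnitude scale factor (sent as side
    information) and a symmetric uniform quantizer with codes in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1]. Fewer than 2 bits drops the band.
    """
    scale = max((abs(c) for c in coeffs), default=0.0)
    if bits < 2 or scale == 0.0:
        return 0.0, [0] * len(coeffs)
    max_code = (1 << (bits - 1)) - 1
    return scale, [round(c / scale * max_code) for c in coeffs]


def dequantize_subband(scale, codes, bits):
    """Inverse of quantize_subband, as used on decompression."""
    if bits < 2 or scale == 0.0:
        return [0.0] * len(codes)
    max_code = (1 << (bits - 1)) - 1
    return [code / max_code * scale for code in codes]


# Usage: with 4 bits the codes lie in [-7, 7] and the error is at most scale / 14.
coeffs = [0.9, -0.3, 0.05, -1.2]
scale, codes = quantize_subband(coeffs, 4)
restored = dequantize_subband(scale, codes, 4)
print(codes)  # [5, -2, 0, -7]
```

Because the decoder needs the bit count of every subband to interpret the codes, the bit allocation is multiplexed with the data, as the paragraph above describes.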

[0069] In the modification shown in FIG. 2, the input audio signal is divided into subbands s by a digital filter 6, and the data quantized, encoded, and compressed by the quantizing/encoding unit 4 and the numbers of quantization bits determined by the psychoacoustic analysis unit 3 are multiplexed by the multiplex unit 5. Because subband division by a filter bank cannot provide the low-frequency band resolution that the present invention requires, the bit-allocation path is the same as in FIG. 1: the audio signal cut out by the cutout unit 1 is divided into a plurality of subbands s by the orthogonal transform unit 2, and the psychoacoustic analysis unit 3 calculates the offset F of the masking reference curve and determines the number of quantization bits for the quantizing/encoding unit 4.

[0070] In the modification shown in FIG. 3, separate windowing/cutout units 1a and 1b and orthogonal transform units 2a and 2b (and an offset amount calculation unit 7) are provided for the audio signal path and for the path that determines the number of quantization bits. With this two-path configuration, the orthogonal transform units 2a and 2b can use different numbers of points, for example 1024 points for unit 2a and 2048 points for unit 2b.

[0071] Next, the calculation of the autocorrelation of the power spectrum with the preceding and following subbands will be described with reference to FIG. 4. When the power spectra of a predetermined subband s and of the adjacent subbands s−1 and s+1 are, for example, as shown in FIG. 4(a), the autocorrelation is calculated between the spectrum within subband s and the power spectrum spanning subbands s−1 and s+1. When the result is as shown in FIG. 4(b), the ratio of the maximum to the minimum autocorrelation value is converted logarithmically to obtain the offset F. F is therefore large for a tone-like signal with distinct harmonic components and small for a noise-like signal. As shown in FIG. 4(b), a slide amount (lag) of 0 and the positions around it are excluded from the maximum search, since the spectrum always correlates strongly with itself there.

[0072] Next, the processing for calculating the offset F of the masking reference curve will be described with reference to FIG. 5. FIG. 5 shows, as an example, the case where a 2q-point FFT is used as the orthogonal transform; the number of points 2q is preferably about 1024 to 2048. In FIG. 5, the power spectrum p[j] is first calculated from the real part Real[j] and the imaginary part Imag[j] of the orthogonal transform coefficients:

[0073]

[Equation 14] $p[j] = \mathrm{Real}[j]^2 + \mathrm{Imag}[j]^2$, where $j = 0, 1, \ldots, q-1$
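Equation 14 can be checked with the short sketch below, which computes p[j] for j = 0 … q − 1 from a 2q-point transform. A direct O(N²) DFT from the standard library is used purely for clarity; an encoder would use an FFT, and `power_spectrum` is an illustrative name.

```python
import cmath
import math


def power_spectrum(frame):
    """Return p[j] = Real[j]**2 + Imag[j]**2 for j = 0 .. q-1 (Equation 14).

    `frame` holds 2q real samples; a naive DFT stands in for the FFT.
    """
    n = len(frame)  # 2q points
    if n % 2:
        raise ValueError("expected an even number (2q) of samples")
    q = n // 2
    p = []
    for j in range(q):
        coeff = sum(
            x * cmath.exp(complex(0.0, -2.0 * math.pi * j * k / n))
            for k, x in enumerate(frame)
        )
        p.append(coeff.real ** 2 + coeff.imag ** 2)
    return p


# Usage: a cosine at bin 5 of a 64-point frame puts its power in p[5].
frame = [math.cos(2 * math.pi * 5 * k / 64) for k in range(64)]
p = power_spectrum(frame)
print(max(range(len(p)), key=p.__getitem__))  # 5
```

Only the first q bins are kept because, for a real input, the upper half of the 2q-point spectrum mirrors the lower half.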

[0074] Next, the autocorrelation Sc[s][i] is calculated for each predetermined band. When the q spectral samples are divided into n bands:

[0075]

[Equation 15]

[0076] Finally, the offset F[s] of each band is calculated from the maximum and minimum values of the autocorrelation Sc[s][i]:
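Equations 15 and 16 are not reproduced in this text, so the following is only a sketch of steps [0074] to [0076] under stated assumptions: the q spectral lines are split into n equal-width bands, Sc[s][i] is taken as the unnormalized correlation of band s with the spectrum shifted by lag i across the neighbouring bands s − 1 and s + 1, lags |i| ≤ `guard` are excluded from the maximum search (per FIG. 4(b)), and the logarithmic conversion is assumed to be 10·log10. The function and parameter names are hypothetical.

```python
import math


def band_offsets(p, n_bands, guard=1, eps=1e-12):
    """Sketch of F[s] = 10*log10(max Sc / min Sc) for each band s (assumed form)."""
    q = len(p)
    width = q // n_bands
    offsets = []
    for s in range(n_bands):
        lo = s * width
        band = p[lo:lo + width]
        sc = {}  # lag i -> Sc[s][i]
        for lag in range(-width, width + 1):
            start = lo + lag
            if start < 0 or start + width > q:
                continue  # no neighbouring band beyond the spectrum edges
            sc[lag] = sum(band[k] * p[start + k] for k in range(width))
        # Exclude lag 0 and its immediate neighbours from the maximum search.
        peak = max(v for lag, v in sc.items() if abs(lag) > guard)
        floor = max(min(sc.values()), eps)
        offsets.append(10.0 * math.log10(peak / floor))
    return offsets


# Usage: a harmonic comb (tone-like) versus a flat spectrum (noise-like).
comb = [101.0 if j % 8 == 0 else 1.0 for j in range(64)]
flat = [1.0] * 64
print([round(f, 1) for f in band_offsets(comb, 4)])  # [16.9, 16.9, 16.9, 16.9]
print(band_offsets(flat, 4))  # [0.0, 0.0, 0.0, 0.0]
```

The comb produces a strong correlation peak whenever the lag equals the harmonic spacing, so F is large; the flat spectrum correlates equally at every lag, so F is 0 dB, matching the tone-like versus noise-like behaviour described in [0071].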

[0077]

[Equation 16]

[0078] FIGS. 6(a) and 6(b) show, for the case where the orthogonal transform units 2 and 2a of FIGS. 1 to 3 use 1024 points, the spectra of an audio signal containing vibrato at two instants 1024 samples (23 ms) apart; as the figures make clear, the spectral peaks have shifted. FIG. 7 shows the offset F obtained for each of 32 bands by the conventional tonality calculation method and by the autocorrelation method of this embodiment. As the figure shows, this audio signal is perceptually tone-like in the low and middle range and carries much information there, and the offset F given by the autocorrelation method of this embodiment agrees with perception.

[0079] Taking as an example the case that includes the 2048-point FFT of the orthogonal transform unit 2b shown in FIG. 3, and counting, for the processing shown in FIG. 4, a function evaluation as 100 operations, a multiplication as 1, and a division as 20, the amount of computation needed to obtain the offset F is about 90,000 operations, half of the roughly 180,000 required by the conventional tonality calculation method.

[0080] Next, a second embodiment of the present invention will be described with reference to FIGS. 8 to 14. FIG. 8 is a block diagram of the high-efficiency audio coding apparatus of the second embodiment, FIG. 9 shows the relationship between the noise shaping factor and the quantization noise, FIG. 10 is a flowchart of the processing for calculating the final required S/N from the first and second required S/N, FIG. 11 shows the weighting function used to calculate the final required S/N, FIG. 12 shows the spectrum of a source in which S/N degradation is easily detected, FIG. 13 shows the S/N ratios for the source of FIG. 12, and FIG. 14 compares the sound quality of the conventional example and the second embodiment.

[0081] The second embodiment shown in FIG. 8 has the psychoacoustic analysis unit 3, which calculates the first required S/N ratio among other quantities; a second required S/N calculation (and final required S/N calculation) unit 8, which calculates the second required S/N ratio from the signal power of each subband s by minimum mean-square-error theory; and a bit allocation unit 9. As in the conventional example, the first required S/N is obtained purely from a psychoacoustic model centered on the masking effect, whereas the second required S/N ratio is obtained by minimum mean-square-error theory applied to the signal power of each subband s, with an added parameter that shapes the quantization noise perceptually.

[0082] Compared with the first, the second required S/N ratio tends to emphasize bands with high power somewhat more. Therefore, the second required S/N ratio is first normalized so that its average over all subbands matches that of the first required S/N ratio. The reason is that the first required S/N ratio is the quantity consistent with psychoacoustics and the second is used only to supplement it; if the averages of the two differed, the combination would not work correctly.

[0083] Finally, the first required S/N ratio and the normalized second required S/N ratio are weighted and added to obtain the final required S/N ratio, from which the number of quantization bits of each subband s is determined. The weighting emphasizes the first required S/N ratio, for example first : second = 0.7 : 0.3. With this method, perceptual degradation is kept to a minimum even when the compression ratio is high and S/N degradation is audible, and bands with high power are not overemphasized.

[0084] Next, the method of allocating bits by minimum mean-square-error theory will be described. Audio waveforms are generally said to be approximable by a Gaussian process; in that case, rate-distortion theory gives the bit allocation bit[s] (the number of bits for each band s) that minimizes the mean-square error after quantization as Expression 1 (Equation 17) below.

[0085]

[Equation 17]

[0086] In practice, the coefficients a and b are adjusted so that the sum of bit[s] equals the number of available bits. Expression 1 (Equation 17) represents allocation without perceptual control: the resulting bit[s] strongly reflects the band power, and the resulting quantization noise is white noise, as in PCM coding. In this embodiment, therefore, a weight factor w[s] is added to Expression 1 (Equation 17) for perceptual control, giving Expression 2 (Equation 18) below.
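Equation 17 is not reproduced here. The sketch below assumes the textbook rate-distortion form bit[s] = a + 0.5·log2(P[s]/b), which is what Equation 21 reduces to when w[s] = 1. With b fixed, a is solved so that the bits sum to the available total, and the last line illustrates the property stated above: the per-band noise power, modelled as P[s]·2^(−2·bit[s]), comes out the same in every band, i.e. white. Rounding to integers and clamping negative allocations, which a real coder needs, are omitted, and the function names are hypothetical.

```python
import math


def mmse_bits(band_power, total_bits, b=1.0):
    """bit[s] = a + 0.5*log2(P[s]/b), with a chosen so that sum(bit) == total_bits."""
    shape = [0.5 * math.log2(p / b) for p in band_power]
    a = (total_bits - sum(shape)) / len(band_power)
    return [a + x for x in shape]


def noise_power(band_power, bits):
    """Quantization noise power per band, modelled as P[s] * 2**(-2*bit[s])."""
    return [p * 2.0 ** (-2.0 * n) for p, n in zip(band_power, bits)]


power = [4.0, 1.0, 1.0, 16.0]
bits = mmse_bits(power, total_bits=12)
print(bits)                      # [3.25, 2.25, 2.25, 4.25]
print(noise_power(power, bits))  # equal in every band: white noise
```

Each fourfold increase in band power (6 dB) buys exactly one more bit, which is why the noise floor ends up flat rather than following the spectrum.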

[0087]

[Equation 18]

[0088] The noise shaping factor γ in Expression 2 (Equation 18) takes values from −1.0 to 0.0; with γ = 0.0 the expression reduces to Expression 1 (Equation 17). Conversely, with γ = −1.0 the bit allocation bit[s] of Expression 2 (Equation 18) becomes a constant, so every band receives the same number of quantization bits. FIG. 9 shows the quantization noise for γ from −1.0 to 0.0; values of about γ = −0.2 to −0.1 are generally considered to match perception well.
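Equation 18 is likewise not reproduced. One weight that reproduces both limiting cases described above is w[s] = P[s]^γ: γ = 0 gives w[s] = 1 (Expression 1), and γ = −1 makes w[s]·P[s] = 1, so every band receives the same number of bits. The sketch below assumes that form, which is an inference rather than the patent's stated definition, and prints the resulting allocations for a few values of γ; `shaped_bits` is a hypothetical name.

```python
import math


def shaped_bits(band_power, total_bits, gamma, b=1.0):
    """bit[s] = a + 0.5*log2(w[s]*P[s]/b) with the assumed weight w[s] = P[s]**gamma."""
    if not -1.0 <= gamma <= 0.0:
        raise ValueError("noise shaping factor must lie in [-1.0, 0.0]")
    shape = [0.5 * math.log2((p ** gamma) * p / b) for p in band_power]
    a = (total_bits - sum(shape)) / len(band_power)  # keeps sum(bit) == total_bits
    return [a + x for x in shape]


power = [4.0, 1.0, 1.0, 16.0]
for gamma in (0.0, -0.15, -1.0):
    print(gamma, [round(n, 3) for n in shaped_bits(power, 12, gamma)])
# gamma = 0.0 reproduces Expression 1; gamma = -1.0 gives 3 bits in every band.
```

Intermediate values such as γ = −0.15 tilt the allocation partway toward equal bits, so the quantization noise partly follows the signal spectrum instead of staying white.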

[0089] Next, the steps for calculating the final required S/N from the first and second required S/N will be described with reference to FIG. 10. First, the total power P[s] of each band is calculated from the orthogonal transform coefficients. For example, when q spectral lines are divided into n bands:

[0090]

[Equation 19]
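Equation 19 is not reproduced either; in the natural reading, P[s] is the sum of the spectral powers p[j] that fall in band s. The helper below assumes that reading and takes explicit band edges, so that both equal-width bands and the non-uniform partitions common in audio coders can be used; the `edges` input and the function name are hypothetical.

```python
def band_total_power(p, edges):
    """P[s] = sum of p[j] for edges[s] <= j < edges[s+1] (assumed form of Equation 19)."""
    if edges[0] != 0 or edges[-1] != len(p) or sorted(edges) != list(edges):
        raise ValueError("edges must rise from 0 to len(p)")
    return [sum(p[lo:hi]) for lo, hi in zip(edges, edges[1:])]


# Usage: q = 8 spectral lines split into n = 4 equal bands.
p = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(band_total_power(p, [0, 2, 4, 6, 8]))  # [3.0, 7.0, 11.0, 15.0]
```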

[0091] Next, the all-band average error power b (a constant) is obtained from a predetermined band-average S/N ratio SNavr:

[0092]

[Equation 20]

[0093] The bit allocation bit[s] of each band s is calculated by Expression 2 (Equation 18):

[0094]

[Equation 21] $\mathrm{bit}[s] = a + 0.5\,\log_2\!\left(w[s]\,P[s]/b\right)$

[0095] A provisional second required S/N ratio SNreq'[s] is calculated from the bit allocation bit[s]:

[0096]

[Equation 22] $\mathrm{SNreq}'[s] = 6.02\,\mathrm{bit}[s]$ [dB]
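The factor 6.02 in Equation 22 is 20·log10 2: each additional bit halves the quantization step and so raises the S/N ratio by about 6.02 dB. A minimal sketch of this step (`provisional_snr` is an illustrative name):

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.0206 dB per bit


def provisional_snr(bits):
    """SNreq'[s] = 6.02 * bit[s] in dB (Equation 22)."""
    return [6.02 * b for b in bits]


print(round(DB_PER_BIT, 2))  # 6.02
print(provisional_snr([3.25, 2.25, 4.0]))
```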

[0097] The averages SNreq_avr and SNreq'_avr of the first required S/N ratio SNreq[s] and of the provisional second required S/N ratio SNreq'[s] are calculated:

[0098]

[Equation 23]

[0099] The provisional second required S/N ratio SNreq'[s] is normalized so that its average SNreq'_avr matches SNreq_avr, giving the second required S/N ratio SNreq2[s]:

[0100]

[Equation 24] $\mathrm{SNreq2}[s] = \mathrm{SNreq}'[s] \cdot \left(\mathrm{SNreq\_avr} / \mathrm{SNreq'\_avr}\right)$ [dB]

[0101] Using the average of the first required S/N ratio, SNreq_avr, as a parameter, the final required S/N ratio SNreq_fin[s] is obtained from the first required S/N ratio SNreq[s] and the second required S/N ratio SNreq2[s]:

[0102]

[Equation 25] $\mathrm{SNreq\_fin}[s] = f[\mathrm{SNreq\_avr}] \cdot \mathrm{SNreq}[s] + \left(1.0 - f[\mathrm{SNreq\_avr}]\right) \cdot \mathrm{SNreq2}[s]$ [dB]

[0103] Here f[x] is a weighting function taking values from 0.0 to 1.0, as shown in FIG. 11, and is set so that the contribution of the second required S/N ratio SNreq2[s] increases when the average of the first required S/N ratio, SNreq_avr, is large.
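Steps [0097] to [0103] can be combined into one function, sketched below under assumptions: Equation 23 is taken to be the plain arithmetic mean over the n bands, and because FIG. 11 is not reproduced, `f_weight` is a hypothetical piecewise-linear stand-in that falls from 1.0 to 0.5 as SNreq_avr rises from 10 dB to 40 dB (breakpoints invented for illustration). Since SNreq2 is scaled to share the mean of SNreq, the final S/N ratio keeps that same mean whatever f returns.

```python
def f_weight(x, lo=10.0, hi=40.0, f_lo=1.0, f_hi=0.5):
    """Hypothetical stand-in for FIG. 11: the weight on SNreq falls as x rises."""
    if x <= lo:
        return f_lo
    if x >= hi:
        return f_hi
    return f_lo + (f_hi - f_lo) * (x - lo) / (hi - lo)


def final_required_snr(sn_req, sn_req_prov):
    """Return SNreq_fin[s] from SNreq[s] and the provisional SNreq'[s] (all in dB)."""
    n = len(sn_req)
    sn_req_avr = sum(sn_req) / n              # Equation 23 (assumed arithmetic mean)
    sn_req_prov_avr = sum(sn_req_prov) / n
    if sn_req_prov_avr == 0.0:
        raise ValueError("provisional S/N ratios average to zero")
    scale = sn_req_avr / sn_req_prov_avr
    sn_req2 = [v * scale for v in sn_req_prov]  # Equation 24
    f = f_weight(sn_req_avr)
    return [f * v1 + (1.0 - f) * v2 for v1, v2 in zip(sn_req, sn_req2)]  # Equation 25


sn_req = [20.0, 30.0, 40.0, 10.0]       # first required S/N (mean 25 dB)
sn_req_prov = [12.0, 18.0, 30.0, 12.0]  # provisional second required S/N
fin = final_required_snr(sn_req, sn_req_prov)
print([round(v, 2) for v in fin], sum(fin) / len(fin))
```

Keeping the mean fixed means the weighting only redistributes the required S/N between bands; it never changes the overall bit budget implied by the first, psychoacoustic, estimate.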

[0104] For a source whose spectrum makes S/N degradation easy to detect, as in FIG. 12, the first and second required S/N ratios take the values shown by the thick and thin lines, respectively, in FIG. 13(a), and the first and final required S/N ratios take the values shown by the thick and thin lines, respectively, in FIG. 13(b). For such a source, the S/N ratio around 2 to 4 kHz, where the power is large, is corrected, so the perceived S/N ratio also improves. Further, FIG. 14 compares the conventional example and this embodiment for three cases differing in the first required S/N ratio, and shows that the improvement of this embodiment is greater when the average of the first required S/N ratio (SNreq_avr) is large.

[0105] Next, a third embodiment of the present invention will be described. FIG. 15 is a block diagram of the high-efficiency audio coding apparatus of the third embodiment, FIG. 16 is a block diagram of a modification of the apparatus of FIG. 15, FIG. 17 is a detailed block diagram of an example of the audio encoder of FIGS. 15 and 16, FIG. 18 is a flowchart of the required S/N ratio calculation when synchronization between the two channels is guaranteed with sufficient accuracy, FIG. 19 is a flowchart of the required S/N ratio calculation when synchronization between the two channels is poor, FIG. 20 compares the MNR after mixing for the prior art and this embodiment, and FIG. 21 compares sound-quality evaluations after mixing for the prior art and this embodiment.

[0106] The third embodiment shown in FIG. 15 applies the invention to the audio encoder 20 shown in FIG. 32. Here, the encoder 20 performs a frequency-domain psychoacoustic analysis of the first audio signal on CH-A, which it high-efficiency encodes, and of the second audio signal on CH-B, which is not high-efficiency encoded and is mixed with the first audio signal on the reproduction side, to calculate first and second masking levels. It then calculates a final masking level from the first and second masking levels, determines the number of quantization bits of each subband from the final masking level, and quantizes and encodes the first audio signal, outputting it as a bit stream. This bit stream and the CH-B signal, which is not high-efficiency encoded, are multiplexed by the multiplex unit 21.

[0107] The third embodiment shown in FIG. 16 applies the invention to the audio encoder 20 shown in FIG. 33(a). In this case, a MIDI sequencer 25 converts the CH-B audio signal into MIDI codes, and a MIDI sound source 26 plays those codes to generate the signal CH-B'. The encoder 20 calculates the first and second masking levels of the CH-A first audio signal and of the signal CH-B', calculates the final masking level from them, determines the number of quantization bits of each subband from the final masking level, and quantizes and encodes the first audio signal, outputting it as a bit stream. The bit stream and the MIDI codes are multiplexed by the multiplex unit 21.

[0108] On the reproduction side, as shown in FIG. 33(a), the demultiplex unit 22 separates the channels, and the mixer 24 mixes the signal CH-A' decoded by the audio decoder 23 with the signal CH-B' played by the MIDI sound source 26 from the MIDI codes.

[0109] The encoder 20 shown in FIG. 17 performs the band division of the audio signal by an orthogonal transform as an example; it can of course also be applied when the band division is performed by the digital filter 6, as shown in FIG. 2. In FIG. 17, the signal of channel CH-A, which is high-efficiency encoded, and the signal of CH-B, which is mixed on the reproduction side without high-efficiency encoding, are divided into subbands by the windowing/cutout units 1A and 1B and the orthogonal transform units 2A and 2B, respectively, and fed to the psychoacoustic analysis units 3A and 3B. If the reproduction-side mixing ratio of CH-A and CH-B is not 1:1, the relative level of CH-A and CH-B is adjusted to reflect that ratio (level adjusting unit 11).

[0110] Next, the processing for the case where CH-A and CH-B have been synchronized in advance (for example, to within ±1 ms) will be described with reference to FIG. 18. FIG. 18 corresponds to the processing described with reference to FIG. 27 for the conventional example: processes (1) to (5) are identical, and processes (1)', (2)', and (x) are added. In processes (1)' and (2)', the psychoacoustic analysis unit 3B calculates the masking level M2 obtained by a frequency-domain psychoacoustic analysis of the CH-B signal.

[0111] The psychoacoustic analysis unit 3A, in turn, calculates in processes (1) and (2) the masking level M1 obtained by a frequency-domain psychoacoustic analysis of the CH-A signal. In the following process (x), it combines M1 with the masking level M2 calculated by unit 3B according to M[i] = max(M1[i], M2[i]) of Equation 13, giving a final masking level M that accounts for the influence of the signal that is not high-efficiency encoded. In processes (3) to (5), it then calculates the required S/N ratio from the final masking level M. The bit allocation unit 12 allocates the number of quantization bits of each subband on the basis of this required S/N ratio, and the quantizing/encoding unit 4 quantizes and encodes the CH-A signal with that number of bits.

[0112] Next, the processing when the synchronization accuracy between CH-A and CH-B is poor will be described with reference to FIG. 19. When the synchronization error during mixing is perceptually acceptable but matters for the psychoacoustic analysis, for example an error of ±5 to 10 ms, performing processes (1)' and (2)' of FIG. 18 in unit 3B could make the change in the masking level M counterproductive because of the actual misalignment during mixing.

[0113] Therefore, in process (1)′ of FIG. 19, the psychoacoustic analyzer 3B on the CH-B side sets the orthogonal transform length for CH-B to about twice that for CH-A when calculating the total power P2[i] of each analysis band. This smooths out and reduces the error caused by the synchronization offset. In the subsequent process (2)′, it calculates the masking level M2 from this total power P2[i] and the masking reference curve B(k).

When determining M[i] from M1[i] and M2[i], the psychoacoustic analyzer 3A on the CH-A side does not take the maximum as in Equation 13. Instead, in process (x), it uses a weighting coefficient a, for example a = 0.6, so that

[0114]

[Equation 26] M[i] = M1[i] · 0.6 + M2[i] · 0.4

[0115] In this way, M[i] is determined with emphasis on M1[i]. This resolves the psychoacoustic-analysis problem that arises when the synchronization accuracy between CH-A and CH-B is poor.

Therefore, according to the third embodiment, a high-efficiency coded signal and a signal that is not high-efficiency coded may be mixed on the reproduction side. In that case, high-efficiency coding can be performed so that the sound quality after mixing is optimal.
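Equation 26 is the special case a = 0.6 of the rule M[i] = a·M1[i] + (1 − a)·M2[i]. The sketch below is a minimal illustration of that rule. The function name is hypothetical.

It assumes M1 and M2 are dB values on the same band grid. The text does not say whether the weighting is applied in the dB or the linear power domain.

```python
def weighted_masking_level(m1_db, m2_db, a=0.6):
    """Process (x), poorly synchronized case: M[i] = a*M1[i] + (1-a)*M2[i].

    a weights CH-A's own masking level M1; a = 0.6 is the example value
    given in the text. Larger a trusts the (time-misaligned) CH-B mask less.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("weighting coefficient a must lie in [0, 1]")
    if len(m1_db) != len(m2_db):
        raise ValueError("M1 and M2 must cover the same analysis bands")
    return [a * x1 + (1.0 - a) * x2 for x1, x2 in zip(m1_db, m2_db)]


# Illustrative values: the result stays closer to M1 than a plain max would.
print(weighted_masking_level([40.0, 35.0], [30.0, 45.0]))
```

Unlike the max rule, this blend never lets M2 fully override M1. That limits the damage when the CH-B mask has drifted in time relative to CH-A.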

[0116] In general, the reproduction quality of an audio signal is often evaluated objectively by measuring the MNR (mask-to-noise ratio). Specifically, as shown in FIG. 34, it is the ratio of the masking level M in the frequency domain to the (quantization) noise N actually present in the signal.

Where the MNR is positive, the psychoacoustic masking condition is satisfied and the noise is not perceived. Conversely, where the MNR is negative, the masking condition is not satisfied and the noise is perceived.

Moreover, even when the MNR is positive, a frequency characteristic that is as flat as possible is considered psychoacoustically preferable. Differences in MNR between bands make the sound seem slightly unnatural in its balance.
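The MNR check described above can be sketched per band in dB as MNR[i] = M[i] − N[i]. Negative bands are flagged as audible noise.

The text gives no numeric flatness criterion. The population standard deviation of the MNR across bands is used here purely as an illustrative, hypothetical measure of flatness.

```python
import statistics


def mnr_db(mask_db, noise_db):
    """Per-band mask-to-noise ratio in dB: MNR[i] = M[i] - N[i]."""
    if len(mask_db) != len(noise_db):
        raise ValueError("mask and noise must cover the same bands")
    return [m - n for m, n in zip(mask_db, noise_db)]


def audible_bands(mnr):
    """Indices of bands where MNR < 0, i.e. the noise is not masked."""
    return [i for i, v in enumerate(mnr) if v < 0.0]


def mnr_spread_db(mnr):
    """Hypothetical flatness measure: spread of MNR across bands (dB)."""
    return statistics.pstdev(mnr)


mask = [50.0, 40.0, 30.0, 20.0]
noise = [40.0, 38.0, 33.0, 15.0]
mnr = mnr_db(mask, noise)
print(mnr)                  # [10.0, 2.0, -3.0, 5.0]
print(audible_bands(mnr))   # [2]
```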

[0117] FIG. 20 shows an example of the MNR [dB] measured after mixing, for a given audio signal processed by the prior art and by this embodiment. With this embodiment (solid line), the characteristic is flat over almost the entire frequency range. With the prior art (broken line), it undulates and becomes negative in some regions (around 10 kHz in the figure). The effect is particularly large when, as here, the average MNR is close to 0 dB.

[0118] FIG. 21 shows an example of subjective sound-quality evaluation (on a five-grade scale) after mixing with many sources, using the prior art and this embodiment. With this embodiment, the mean score improves and, in particular, the spread of the scores decreases.

[0119]

[Effects of the Invention] As described above, the present invention proceeds in four steps:
(1) the power spectrum of the audio signal is calculated from the orthogonal transform coefficients;
(2) the autocorrelation of this power spectrum is calculated for each predetermined band;
(3) the offset amount of the psychoacoustic masking effect is calculated from the ratio of the maximum to the minimum of the autocorrelation;
(4) the number of quantization bits of each subband is determined from this offset amount.

This reduces the computation needed to calculate the offset of the masking reference curve. Even when the audio signal contains vibrato, it also better satisfies psychoacoustic requirements and improves sound quality.
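The autocorrelation step can be sketched as follows, under stated assumptions. This section does not define the band layout or the lag range (FIG. 4 refers to correlation with the preceding and following subbands).

Here the autocorrelation of the power spectrum P[n] inside one band is taken over lags 0..`max_lag`. Each lag is normalized by its number of overlapping terms, so a perfectly flat band gives a ratio of 1. A peaky (tonal) band gives a larger ratio.

The mapping from this ratio to the masking offset is not given in this section and is not implemented. The function name and parameters are hypothetical.

```python
def band_autocorr_ratio(power, lo, hi, max_lag, eps=1e-12):
    """Ratio of the maximum to the minimum autocorrelation of power[lo:hi].

    power:   non-negative power spectrum values (one per frequency bin).
    lo, hi:  bin range of the band (hi exclusive).
    max_lag: largest lag in bins; must satisfy 1 <= max_lag < band width.
    """
    band = power[lo:hi]
    n = len(band)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and band width - 1")
    r = []
    for k in range(max_lag + 1):
        # Average over the n - k overlapping products so that a flat
        # spectrum yields the same value at every lag.
        s = sum(band[j] * band[j + k] for j in range(n - k))
        r.append(s / (n - k))
    # eps guards against division by zero for an all-zero band.
    return max(r) / max(min(r), eps)


flat = [1.0] * 8
peaky = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0, 1.0]
print(band_autocorr_ratio(flat, 0, 8, 2))   # 1.0 (noise-like band)
print(band_autocorr_ratio(peaky, 0, 8, 2))  # > 1 (tonal band)
```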

[0120] Further, according to the present invention, two required S/N ratios are calculated for each subband:
(1) a first required S/N ratio, from psychoacoustic analysis of the audio signal in the frequency domain;
(2) a second required S/N ratio, from the signal power of each subband by minimum mean-square-error theory including auditory control.

The first and second required S/N ratios are weighted to obtain the final required S/N ratio, and the number of quantization bits of each subband is determined from it. As a result, sound quality can be improved when the data compression ratio is high and the required S/N ratio given by the psychoacoustic analysis cannot be satisfied.
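The second required S/N and the weighting can be sketched as follows. This is a minimal sketch that swaps in the textbook minimum-MSE bit allocation, b[i] = b̄ + (γ/2)·log2(P[i] / geometric mean of P).

Everything beyond the text is an assumption:
- γ is a hypothetical noise-shaping exponent standing in for the "auditory control" (compare FIG. 9).
- 6.02 dB per bit is the usual uniform-quantizer rule of thumb.
- A constant weight w replaces the weighting function of FIG. 11, which is not given here.

```python
import math

DB_PER_BIT = 6.02  # approximate S/N gain per quantization bit


def mmse_required_snr_db(band_power, mean_bits, gamma=1.0):
    """Second required S/N per subband from a classic MMSE bit allocation.

    band_power: positive signal power of each subband (linear units).
    mean_bits:  average bits per subband available at the target bit rate.
    gamma:      hypothetical noise-shaping exponent; 1.0 gives the plain
                MMSE (white quantization noise) allocation, 0.0 gives
                equal bits in every subband.
    A real allocator would also clip negative or fractional bit counts.
    """
    logs = [math.log2(p) for p in band_power]  # requires p > 0
    mean_log = sum(logs) / len(logs)
    bits = [mean_bits + 0.5 * gamma * (lg - mean_log) for lg in logs]
    return [DB_PER_BIT * b for b in bits]


def final_required_snr_db(snr1_db, snr2_db, w):
    """Final S/N = w * (psychoacoustic S/N) + (1 - w) * (MMSE S/N)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight w must lie in [0, 1]")
    return [w * s1 + (1.0 - w) * s2 for s1, s2 in zip(snr1_db, snr2_db)]


snr2 = mmse_required_snr_db([1.0, 4.0], mean_bits=4.0)
print(final_required_snr_db([20.0, 30.0], snr2, w=0.5))
```

With γ = 1, a subband with four times the power receives one extra bit, about 6.02 dB. The average bit count is unchanged, so the MMSE term redistributes bits rather than adding them.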

[0121] Further, according to the present invention, psychoacoustic analysis is performed in the frequency domain on two signals:
(1) a first audio signal, which is to be high-efficiency coded;
(2) a second audio signal, which is not high-efficiency coded but is mixed with the first audio signal on the reproduction side.

This yields first and second masking levels. A final masking level is calculated from them, and the number of quantization bits of each subband is determined from this final masking level.

Therefore, when the high-efficiency coded signal and the uncoded signal are mixed on the reproduction side, the psychoacoustic analysis takes the influence of the uncoded signal into account. This better satisfies psychoacoustic requirements and improves sound quality.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of the high-efficiency audio coding apparatus according to the present invention.

FIG. 2 is a block diagram showing a modification of FIG. 1.

FIG. 3 is a block diagram showing another modification of FIG. 1.

FIG. 4 is an explanatory diagram showing the calculation of the autocorrelation of the power spectrum with the preceding and following subbands.

FIG. 5 is a flowchart illustrating the process of calculating the offset amount.

FIG. 6 is an explanatory diagram showing an example of the spectrum of an audio signal containing vibrato.

FIG. 7 is an explanatory diagram comparing the offset amounts obtained by the prior-art tonality calculation method and by the autocorrelation method of the first embodiment.

FIG. 8 is a block diagram showing the high-efficiency audio coding apparatus of a second embodiment.

FIG. 9 is an explanatory diagram showing the relationship between the noise shaping factor and the quantization noise.

FIG. 10 is a flowchart illustrating the process of calculating the final required S/N from the first and second required S/N.

FIG. 11 is an explanatory diagram showing the weighting function used to calculate the final required S/N.

FIG. 12 is an explanatory diagram showing the spectrum of a source in which S/N degradation is easily perceived.

FIG. 13 is an explanatory diagram showing the S/N ratio of the source shown in FIG. 12.

FIG. 14 is an explanatory diagram showing the results of a sound-quality comparison between the conventional example and the second embodiment.

FIG. 15 is a block diagram showing the high-efficiency audio coding apparatus of a third embodiment.

FIG. 16 is a block diagram showing a modification of the high-efficiency audio coding apparatus of FIG. 15.

FIG. 17 is a block diagram showing in detail an example of the audio encoder of FIGS. 15 and 16.

FIG. 18 is a flowchart illustrating the required S/N ratio calculation process when synchronization between the two channels is guaranteed with sufficient accuracy.

FIG. 19 is a flowchart illustrating the required S/N ratio calculation process when the synchronization accuracy between the two channels is poor.

FIG. 20 is an explanatory diagram comparing the MNR after mixing with the prior art and with the third embodiment.

FIG. 21 is an explanatory diagram comparing sound-quality evaluations after mixing with the prior art and with the third embodiment.

FIG. 22 is an explanatory diagram schematically showing a high-efficiency audio coding method.

FIG. 23 is a flowchart illustrating the high-efficiency audio coding process of FIG. 22.

FIG. 24 is an explanatory diagram showing examples of masking curves for various frequency spectra.

FIG. 25 is an explanatory diagram showing the masking curves of FIG. 24 with the frequency on the horizontal axis replaced by critical bands.

FIG. 26 is an explanatory diagram showing the critical bandwidths of 25 bands.

FIG. 27 is a flowchart illustrating a conventional required S/N ratio calculation process.

FIG. 28 is an explanatory diagram showing an example of a masking reference curve.

FIG. 29 is an explanatory diagram showing a method of linearly predicting the spectrum over three intervals.

FIG. 30 is a flowchart illustrating a conventional offset calculation process.

FIG. 31 is an explanatory diagram showing an example of the spectrum of a signal containing vibrato.

FIG. 32 is a block diagram showing a conventional mixing circuit.

FIG. 33 is a block diagram showing another conventional mixing circuit.

FIG. 34 is an explanatory diagram showing a signal to be high-efficiency coded and its masking level, together with the masking level of a signal that is not high-efficiency coded.

[Explanation of symbols]

1, 1a, 1b, 1A, 1B: windowing/extraction unit
2, 2a, 2b, 2A, 2B: orthogonal transform unit (dividing means)
3, 3A, 3B: psychoacoustic analysis unit (psychoacoustic analysis means)
4: quantization/encoding unit (quantization/encoding means)
5: multiplex unit
6: subband filter unit (dividing means)
7: offset amount calculation unit (psychoacoustic analysis means)
8: second required S/N calculation unit (psychoacoustic analysis means)
9: bit allocation unit (psychoacoustic analysis means)

Continuation of front page. (56) References cited: JP-A-7-46137 (JP, A); JP-A-3-250923 (JP, A); JP-A-6-232761 (JP, A); JP-A-7-66733 (JP, A). (58) Fields searched (Int. Cl.⁷, DB name): H03M 7/30

Claims

(57) [Claims]

1. A high-efficiency audio coding apparatus comprising:
dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
quantization/encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and
psychoacoustic analysis means for:
calculating the power spectrum of the audio signal from orthogonal transform coefficients obtained by the dividing means or by separate orthogonal transform means;
calculating the autocorrelation of this power spectrum for each predetermined band;
calculating an offset amount of the psychoacoustic masking effect from the ratio of the maximum value to the minimum value of the autocorrelation; and
determining the number of quantization bits of each subband of the quantization/encoding means based on the offset amount.

2. A high-efficiency audio coding apparatus comprising:
dividing means for dividing an audio signal into subbands of a plurality of frequency bands;
quantization/encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and
psychoacoustic analysis means for:
calculating a first required S/N ratio for each subband based on psychoacoustic analysis of the audio signal in the frequency domain;
calculating a second required S/N ratio from the signal power of each subband by minimum mean-square-error theory including auditory control;
weighting the first and second required S/N ratios to calculate a final required S/N ratio; and
determining the number of quantization bits of each subband of the quantization/encoding means based on the final required S/N ratio.

3. A high-efficiency audio coding apparatus comprising:
dividing means for dividing a first audio signal to be high-efficiency coded into subbands of a plurality of frequency bands;
quantization/encoding means for quantizing and encoding the audio signal of each subband divided by the dividing means with a variable number of quantization bits; and
psychoacoustic analysis means for:
performing psychoacoustic analysis in the frequency domain on the first audio signal and on a second audio signal, which is not high-efficiency coded and is mixed with the first audio signal on the reproduction side, to calculate first and second masking levels;
calculating a final masking level based on the first and second masking levels; and
determining the number of quantization bits of each subband of the quantization/encoding means based on the final masking level.