JPH0455899A - Voice signal coding system

Info

Publication number: JPH0455899A
Application number: JP2166239A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-06-25
Filing date: 1990-06-25
Publication date: 1992-02-24

Abstract

PURPOSE: To enable highly efficient quantization, and thereby provide a speech coding system with good sound quality even at low bit rates, by applying a nonlinear transformation matched to auditory characteristics to the spectral parameters of a speech signal and then vector quantizing them. CONSTITUTION: The system comprises an LPC analysis circuit 130, an LSP quantization circuit 140, an impulse response calculation circuit 170, a weighting circuit 200, an adaptive codebook 210, a sound source codebook search circuit 230, a sound source codebook 235, and so on. Spectral parameters are obtained by approximating the input discrete speech signal with an all-pole model. Each frame is divided into small sections, and a pitch parameter is determined so that the signal reproduced from the past sound source signal approximates the speech signal. In a speech coding system that represents the sound source signal of the speech signal with a codebook and outputs it, the spectral parameters are nonlinearly transformed to match auditory characteristics, represented by the previously constructed codebook 210, and output. A speech coding system with good sound quality at 4.8 kb/s or below is thus obtained with a relatively small amount of computation and memory.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a speech signal coding system for encoding speech signals with high quality at low bit rates, in particular at about 8 to 4.8 kb/s.

[Prior art]

Known methods for encoding a speech signal at a bit rate of about 8 to 4.8 kb/s include the CELP (Code Excited LPC Coding) method described, for example, in the paper by M. Schroeder and B. Atal entitled "Code-excited linear prediction: High quality speech at very low bit rates" (Proc. ICASSP, pp. 937-940, 1985) (Reference 1), and the multi-pulse coding method described, for example, in the paper by B. Atal et al. entitled "A new model of LPC excitation producing natural-sounding speech at low bit rates" (Reference 2).

In the former method, the transmitting side approximates the spectral characteristics of the speech signal with an all-pole model for each frame (for example, 20 ms) and extracts spectral parameters representing those characteristics (for example, PARCOR coefficients or LSP coefficients). The frame is then further divided into subframes (for example, 5 ms), and for each subframe a pitch parameter (also called an adaptive codebook) representing the long-term correlation (pitch correlation) is extracted from the past sound source signal. The speech signal of the subframe is long-term predicted using the pitch parameter. For the residual signal obtained by this long-term prediction, one noise signal is selected from a codebook consisting of a predetermined number of noise signals so as to minimize the error power between the speech signal and the signal synthesized from the selected noise signal, and the optimal gain is calculated. The index identifying the selected noise signal and the gain are then transmitted together with the spectral parameters and the pitch parameter.
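The codebook selection step described above can be sketched as follows. This is a minimal illustration, not the method of Reference 1 itself: the function names, the toy impulse response, the 16-entry codebook, and the 8 kHz sampling rate implied by a 40-sample subframe are all assumptions made for the example.

```python
import random

def convolve(x, h):
    """Zero-state filtering of x by impulse response h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def search_codebook(target, codebook, h):
    """Return (index, gain) minimizing the error power ||target - gain * (c_j * h)||^2."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for j, code in enumerate(codebook):
        y = convolve(code, h)                      # synthesized candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        err = target_energy - corr * corr / energy  # error power at the optimal gain
        if err < best_err:
            best_index, best_gain, best_err = j, corr / energy, err
    return best_index, best_gain

random.seed(0)
N = 40                                              # 5 ms subframe at an assumed 8 kHz
codebook = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(16)]
h = [0.9 ** n for n in range(10)]                   # toy impulse response (assumption)
target = [2.5 * v for v in convolve(codebook[5], h)]
index, gain = search_codebook(target, codebook, h)
```

Only `index` and the quantized gain would need to be transmitted, which is what makes the scheme economical in bits.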

[Problem to be solved by the invention]

In the conventional method described in Reference 1 above, obtaining high sound quality requires that the spectral characteristics of the speech signal be represented well by the spectral parameters. The spectral parameters are quantized for transmission from the transmitting side. One known efficient quantization method is vector-scalar quantization of LSP coefficients. For a specific method, see, for example, the paper by T. Moriya et al. entitled "Transform Coding of Speech using a Weighted Vector Quantizer" (IEEE J. Sel. Areas Commun., pp. 425-431, 1988) (Reference 3).

In this method, the LSP coefficients obtained for each frame as spectral parameters are first quantized and decoded using a vector quantization codebook constructed in advance, and the error signal between the original LSP and the quantized and decoded LSP is then scalar quantized. The vector quantization codebook, consisting of 2^B code vectors (B is the number of bits for spectral parameter quantization), is constructed in advance by training on a large database of spectral parameters. For the codebook training method, see, for example, the paper by Linde et al. entitled "An Algorithm for Vector Quantization Design" (IEEE Trans. COM-28, pp. 84-95, 1980) (Reference 4).
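Codebook training of this kind can be sketched roughly as below. This is a simplified splitting-plus-iteration procedure in the spirit of Reference 4, not a reproduction of it; the function names, the perturbation factor `eps`, the iteration count, and the synthetic two-cluster training data are assumptions. Note that the multiplicative split used here would not separate a centroid whose components are all zero.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train_codebook(data, bits, iterations=20, eps=0.01):
    """Grow a codebook to 2**bits vectors by splitting, refining after each split."""
    codebook = [centroid(data)]
    for _ in range(bits):
        # Split every code vector into two slightly perturbed copies.
        codebook = [[c * (1.0 + sign * eps) for c in vec]
                    for vec in codebook for sign in (1, -1)]
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for vec in data:
                j = min(range(len(codebook)),
                        key=lambda k: squared_distance(vec, codebook[k]))
                cells[j].append(vec)
            # Move each code vector to the centroid of its cell (keep it if the cell is empty).
            codebook = [centroid(cell) if cell else old
                        for cell, old in zip(cells, codebook)]
    return codebook

random.seed(1)
centers = [[0.2, 0.4], [0.6, 0.9]]                  # synthetic training clusters
data = [[c + random.gauss(0.0, 0.01) for c in centers[i % 2]] for i in range(200)]
codebook = sorted(train_codebook(data, bits=1))
```

With B = 1 the result is a two-entry codebook whose vectors settle near the two cluster centers of the training data.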

To realize a speech coding system operating at 4.8 kb/s or below, the number of bits used to quantize the spectral parameters must be reduced to 20 bits or fewer per frame while keeping the distortion caused by spectral parameter quantization below the threshold of auditory perception. Conventional methods are insufficient for this purpose: when the number of quantization bits is reduced to 20 or fewer, the sound quality deteriorates significantly.
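The bit budget implied by these figures can be checked with simple arithmetic. The 20 ms frame length is the example value used elsewhere in this document; the variable names are only illustrative.

```python
BIT_RATE = 4800        # bits per second (4.8 kb/s)
FRAME_MS = 20          # example frame length used in this document
SPECTRAL_BITS = 20     # upper limit for spectral parameter quantization stated above

bits_per_frame = BIT_RATE * FRAME_MS // 1000        # total bits available per frame
remaining_bits = bits_per_frame - SPECTRAL_BITS     # left for pitch, gains, and codebook indices
```

Under these assumptions a frame carries 96 bits in total, so a 20-bit spectral budget leaves 76 bits for all remaining parameters.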

An object of the present invention is to solve the above problems and to provide a speech signal coding system that achieves good sound quality at 4.8 kb/s or below with a relatively small amount of computation and memory.

[Means to solve the problem]

The first invention is a speech coding system that divides an input discrete speech signal into frames of a predetermined time length, approximates the speech signal with an all-pole model to obtain spectral parameters representing the spectral envelope of the speech signal, divides each frame into small sections of a predetermined time length, obtains a pitch parameter so that the signal reproduced from the past sound source signal is close to the speech signal, and represents the sound source signal of the speech signal by a previously constructed codebook or by multi-pulses and outputs it. The system is characterized in that the spectral parameters are nonlinearly transformed so as to correspond to auditory characteristics, represented by a previously constructed codebook, and output.

The second invention is a speech coding system that divides an input discrete speech signal into frames of a predetermined time length, approximates the speech signal with an all-pole model to obtain spectral parameters representing the spectral envelope of the speech signal, divides each frame into small sections of a predetermined time length, obtains a pitch parameter so that the signal reproduced from the past sound source signal is close to the speech signal, and represents the sound source signal of the speech signal by a previously constructed codebook or by multi-pulses and outputs it. The system is characterized in that the spectral parameters are nonlinearly transformed so as to correspond to auditory characteristics, and that, for the nonlinearly transformed spectral parameters, the difference or prediction error between frames, or the difference or prediction error between parameters within the same frame, is represented by a previously constructed codebook and output.

[Operation]

The operation of the speech signal coding system according to the present invention is explained below.

The following explanation uses LSP as the spectral parameter, but other well-known parameters such as PARCOR, cepstrum, and mel cepstrum can be used in the same way. For how to obtain LSP, see, for example, the paper by Sugamura et al. entitled "Quantizer design in LSP speech analysis-synthesis" (IEEE J. Sel. Areas Commun., pp. 432-440, 1988) (Reference 5).

In the first invention, LSP parameters that have been nonlinearly transformed so as to correspond to auditory characteristics are obtained. It is known that the frequency axis of hearing is nonlinear: resolution is higher at low frequencies and lower at high frequencies. The mel transformation is a known nonlinear transformation that matches these characteristics. Known methods for mel transformation of spectral parameters include transformation from the power spectrum and transformation from the autocorrelation function. For details of these methods, see, for example, the paper by Strube entitled "Linear prediction on a warped frequency scale" (J. Acoust. Soc. Am., pp. 1071-1076, 1980) (Reference 6).
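The effect of such a warped frequency axis can be illustrated with the frequency mapping of a first-order all-pass (bilinear) transform, one common way of approximating a mel-like scale. This is only a sketch of the general idea, not necessarily the transformation used in this patent; the function name `warp` and the warping factor `ALPHA = 0.31` are assumptions.

```python
import math

def warp(omega, alpha):
    """Warped frequency (radians) produced by a first-order all-pass transform."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

ALPHA = 0.31  # assumed warping factor, chosen only for illustration

# Width on the warped axis of equal-width bands at the bottom and top of the spectrum.
low_band = warp(0.1 * math.pi, ALPHA) - warp(0.0, ALPHA)
high_band = warp(math.pi, ALPHA) - warp(0.9 * math.pi, ALPHA)
```

The lowest tenth of the band occupies more of the warped axis than the highest tenth, which corresponds to the higher resolution at low frequencies described above.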

A vector quantization codebook is constructed in advance by training on LSP that have been nonlinearly transformed by the method described above. For the method of constructing the vector quantization codebook, see Reference 4 above. The LSP is vector quantized using the following equation:

$$D_j = \sum_{i=1}^{P} \left( LSP_{mi} - LSP'_{mi,j} \right)^2 \quad (j = 1, \ldots, 2^B) \qquad (1)$$

That is, the 2^B code vectors are searched and the code vector that minimizes the distance of equation (1) is selected. Here, $LSP_{mi}$ is the i-th nonlinearly transformed LSP, $LSP'_{mi,j}$ is the j-th code vector of the vector quantization codebook, and P is the order of the LSP.

In equation (1), the code vector is searched using the squared distance on the nonlinearly transformed LSP. Alternatively, the nonlinearly transformed LSP may be converted to, for example, a mel cepstrum $c_m(n)$, and the code vector may be searched using the squared distance on the mel cepstrum as in equation (2):

$$D_j = \sum_{n} \left( c_m(n) - c'_{mj}(n) \right)^2 \quad (j = 1, \ldots, 2^B) \qquad (2)$$

Because the mel cepstrum corresponds to a nonlinearly transformed logarithmic spectrum, this approach gives better sound quality.

Instead of the squared distance, a weighted squared distance, equation (3), evaluated for j = 1 to 2^B, can also be used. Here, $w_m(n)$ is a mel cepstrum used for weighting, and δ is a weighting coefficient that determines the degree of perceptual weighting and takes a value in the range 0 < δ < 1.

For the method of converting from the power spectrum to the mel cepstrum, see, for example, the paper by Kitamura et al. entitled "Information compression of speech using the mel cepstrum" (Transactions of the Institute of Electronics and Communication Engineers, pp. 1092-1093, 1984) (Reference 7).

The mel cepstrum corresponding to each mel LSP code vector may be calculated at the time the mel LSP is vector quantized, or it may be calculated in advance and held as a mel cepstrum codebook. The latter greatly reduces the amount of computation.
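A search over such paired codebooks might look like the sketch below: the distance is evaluated on the precomputed mel cepstrum entries, and the mel LSP vector stored at the same index is returned. The function names and all numeric values are hypothetical, and the plain squared distance is used rather than the weighted variant.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(cep_in, cep_codebook, lsp_codebook):
    """Pick the index minimizing the mel cepstrum distance; return it with the paired LSP vector."""
    j = min(range(len(cep_codebook)),
            key=lambda k: squared_distance(cep_in, cep_codebook[k]))
    return j, lsp_codebook[j]

# Paired codebooks: entry k of cep_codebook is the (precomputed) mel cepstrum of
# entry k of lsp_codebook. The numbers are made up for illustration.
lsp_codebook = [[0.3, 0.7, 1.2], [0.4, 0.9, 1.5], [0.2, 0.6, 1.0]]
cep_codebook = [[0.9, -0.2], [0.5, 0.1], [1.1, -0.4]]

cep_in = [0.55, 0.05]                      # mel cepstrum of the input frame (hypothetical)
index, lsp_q = search(cep_in, cep_codebook, lsp_codebook)
```

Holding both tables trades memory for computation, since no LSP-to-cepstrum conversion is needed inside the search loop.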

In the second invention, the difference between frames of the nonlinearly transformed LSP is obtained as in the following equation, and a vector quantization codebook is created for this difference.

$$DLSP_{mi}^{L} = LSP_{mi}^{L} - QLSP_{mi}^{L-1} \qquad (4)$$

Here, $LSP_{mi}^{L}$ and $QLSP_{mi}^{L-1}$ denote the i-th mel LSP in the L-th frame and the i-th quantized and decoded mel LSP in the (L-1)-th frame, respectively.

Alternatively, instead of the inter-frame difference, the difference between orders within the same frame may be obtained, and a vector quantization codebook may be created for it.

$$DLSP_{mi}^{L} = LSP_{mi}^{L} - LSP_{m,i-1}^{L} \quad (\text{where } LSP_{m0}^{L} = 0) \qquad (5)$$

In equations (4) and (5), a prediction error may be obtained instead of the difference. The prediction coefficients may be fixed, or a codebook of prediction coefficients may be constructed in advance and searched for the optimal one.
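The two kinds of difference can be sketched as below. The function names and the sample values are illustrative only; the prediction-error variant is not shown.

```python
def interframe_difference(lsp_current, qlsp_previous):
    """Difference between the current frame and the decoded previous frame, order by order."""
    return [cur - prev for cur, prev in zip(lsp_current, qlsp_previous)]

def intraframe_difference(lsp):
    """Difference between adjacent orders within one frame, with the 0th value taken as 0."""
    previous = [0.0] + lsp[:-1]
    return [cur - prev for cur, prev in zip(lsp, previous)]

lsp_frame = [0.25, 0.75, 1.5, 2.0]            # mel LSP of the current frame (made-up values)
qlsp_prev = [0.25, 0.5, 1.25, 2.25]           # decoded mel LSP of the previous frame (made up)

d_inter = interframe_difference(lsp_frame, qlsp_prev)
d_intra = intraframe_difference(lsp_frame)
```

Either difference vector typically has a smaller dynamic range than the LSP values themselves, which is what makes it attractive as the quantity to vector quantize.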

[Examples]

FIG. 1 is a block diagram showing a speech signal encoding device that implements the speech coding system according to the first invention.

A speech signal is input from the input terminal 100, and one frame (for example, 20 ms) of the speech signal is stored in the buffer memory 110.

The LPC analysis circuit 130 performs well-known LPC analysis on the speech signal of the frame and calculates linear prediction coefficients $\alpha_i$ up to a predetermined order P as parameters representing the spectral characteristics of the speech signal of the frame.

The LSP quantization circuit 140 performs vector-scalar quantization of the mel LSP. As shown in the processing flow of FIG. 2, the linear prediction coefficients $\alpha_i$ are first converted into an LPC cepstrum (step 511), and the LPC cepstrum is then converted into a mel cepstrum $c_m(n)$ (step 512). For the conversion from the linear prediction coefficients $\alpha_i$ to the LPC cepstrum, see the paper by Atal entitled "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification" (J. Acoust. Soc. Am., pp. 1304-1312, 1974) (Reference 8). For the conversion from the cepstrum to the mel cepstrum $c_m(n)$, see Reference 7 above. The mel cepstrum is then inversely converted into a mel LPC cepstrum and further into mel linear prediction coefficients (step 513). These are in turn converted into LSP to obtain the mel LSP ($LSP_{mi}$) (step 514).

The mel LSP is first vector quantized using the previously constructed codebook 145 (step 515). As described in the explanation of the operation above, the vector quantization codebook search uses the squared distance or the weighted distance on the mel cepstrum. That is, the distance between the mel cepstrum $c_m(n)$ corresponding to the mel LSP obtained from the input speech and each mel cepstrum $c'_{mj}(n)$ in the mel cepstrum codebook 146, which holds the mel cepstrum corresponding to each code vector of the mel LSP codebook 145, is calculated by equation (2) or equation (3), and the code vector j that minimizes this distance is selected. The mel LSP code vector $LSP'_{mi,j}$ corresponding to the selected code vector $c'_{mj}(n)$ is then selected from the LSP codebook 145.
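The first conversion in this chain (step 511) can be sketched with the widely used recursion between predictor coefficients and cepstral coefficients. The sign convention assumed here is H(z) = 1 / (1 - Σ α_k z^-k); the patent does not state its convention, and the function name and the omission of the gain term are choices made for this example. The later steps (mel conversion and LSP computation) are not shown.

```python
def lpc_to_cepstrum(alpha, n_cep):
    """Cepstrum c(1..n_cep) of the all-pole filter H(z) = 1 / (1 - sum_k alpha_k z^-k).

    alpha[k - 1] holds the predictor coefficient alpha_k (assumed sign convention).
    """
    p = len(alpha)
    c = [0.0] * (n_cep + 1)                  # c[0] (gain term) is not computed here
    for n in range(1, n_cep + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):    # only terms with 1 <= n - k <= p contribute
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]

# Single-pole check: for H(z) = 1 / (1 - a z^-1) the cepstrum is a**n / n.
cep = lpc_to_cepstrum([0.5], 4)
```

The single-pole case is convenient for checking an implementation because its cepstrum has a closed form.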

Next, the error between the input mel LSP and the vector quantized mel LSP is calculated by the following equation (step 516).

$$\Delta LSP_{mi} = LSP_{mi} - LSP'_{mi,j} \qquad (6)$$

The error $\Delta LSP_{mi}$ is scalar quantized by a scalar quantizer with a number of bits determined for each order i (step 517). The quantization range (minimum and maximum values) of the scalar quantizer is determined in advance using a large number of training error signals $\Delta LSP_{mi}$.

Furthermore, the range of scalar quantization is limited by exploiting the fact that the mel LSP coefficients satisfy the following relationship.

$$LSP_{m1} < \cdots < LSP_{mP} \qquad (7)$$

For a specific method, see the specification of Japanese Patent Application No. 2-42955 (Reference 9).

The codes obtained by vector quantization and scalar quantization are output to the multiplexer 260, and the decoded mel LSP ($QLSP_{mi}$) is obtained by the following equation.

$$QLSP_{mi} = LSP'_{mi,j} + \Delta LSP'_{mi} \qquad (8)$$

Here, $\Delta LSP'_{mi}$ is the error signal decoded after scalar quantization. $QLSP_{mi}$ is then inversely converted into decoded linear prediction coefficients $\alpha'_i$ (i = 1 to P). The inverse conversion follows the conversion from linear prediction coefficients to mel LSP described above in reverse. The decoded linear prediction coefficients $\alpha'_i$ are output to the impulse response calculation circuit 170.
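The vector-scalar quantization of steps 515 to 517 and the decoding of equation (8) can be sketched as follows. For brevity the code vector is selected by squared distance on the LSP values themselves rather than on the mel cepstrum, and a uniform scalar quantizer is used; the function names, the two-entry codebook, the quantization ranges, and the bit allocation are all assumptions. The range limiting based on equation (7) is not shown.

```python
def uniform_quantize(value, lo, hi, bits):
    """Uniform scalar quantizer on [lo, hi]; returns (index, reconstructed value)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = min(levels - 1, max(0, int((value - lo) / step)))
    return index, lo + (index + 0.5) * step

def vector_scalar_quantize(lsp, codebook, ranges, bits):
    # Stage 1: nearest code vector (searched on the LSP values in this sketch).
    j = min(range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(lsp, codebook[k])))
    # Stage 2: residual per order, as in equation (6), quantized order by order.
    residual = [a - b for a, b in zip(lsp, codebook[j])]
    scalar = [uniform_quantize(r, lo, hi, b)
              for r, (lo, hi), b in zip(residual, ranges, bits)]
    # Decoded value = code vector + decoded residual, as in equation (8).
    decoded = [c + q for c, (_, q) in zip(codebook[j], scalar)]
    return j, [i for i, _ in scalar], decoded

codebook = [[0.3, 0.8, 1.4], [0.5, 1.0, 1.8]]        # made-up mel LSP code vectors
lsp = [0.33, 0.78, 1.43]                             # made-up input mel LSP
ranges = [(-0.1, 0.1)] * 3                           # assumed residual ranges per order
bits = [3, 3, 2]                                     # assumed bits per order

j, indices, decoded = vector_scalar_quantize(lsp, codebook, ranges, bits)
```

The code vector index and the per-order scalar indices are what would be sent to the multiplexer; `decoded` mirrors what the receiver reconstructs.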

The subframe division circuit 150 divides the input speech signal of the frame into subframes. Here, for example, the frame length is 20 ms and the subframe length is 5 ms.

The weighting circuit 200 applies well-known perceptual weighting to the signal divided into subframes. For details of the perceptual weighting function, see Reference 1 above.

The subtracter 190 subtracts the output of the synthesis filter 281 from the weighted signal and outputs the result.

The adaptive codebook 210 receives the input signal v(n) of the synthesis filter 281 via the delay circuit 206, the weighted impulse response $h_w(n)$ from the impulse response calculation circuit 170, and the signal from the subtracter 190. It performs pitch prediction based on long-term correlation and calculates the delay M and the gain β as pitch parameters.

In the following explanation the prediction order of the adaptive codebook is 1, but a higher order of 2 or more can also be used.

For the method of calculating the delay M and gain β in a first-order adaptive codebook, see, for example, the paper by Kleijn entitled "Improved speech quality and efficient vector quantization in SELP" (ICASSP, pp. 155-158, 1988) (Reference 10). The obtained gain β is quantized and decoded with a predetermined number of quantization bits to obtain the gain β', which is used to calculate the prediction signal $\hat{x}_w(n)$ by the following equation; the prediction signal is output to the subtracter 205. The gain β' and the delay M are also output to the multiplexer 260.

$$\hat{x}_w(n) = \beta' \, v(n-M) * h_w(n) \qquad (9)$$

In equation (9), v(n-M) is the past sound source signal, which is the input signal of the synthesis filter 281, and $h_w(n)$ is the weighted impulse response obtained by the impulse response calculation circuit 170. The symbol * denotes convolution.
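A first-order search for the delay M and gain β consistent with equation (9) can be sketched as below. This is a generic closed-loop search, not the specific procedure of Reference 10; the function names, the lag range, the toy impulse response, and the restriction to lags no shorter than the subframe are assumptions.

```python
import random

def convolve(x, h):
    """Zero-state filtering of x by impulse response h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def adaptive_codebook_search(target, past, h, lags):
    """Return (M, beta) minimizing ||target - beta * (v(n - M) * h)||^2.

    past holds the past excitation with the most recent sample last; every lag is
    assumed to be at least len(target) so that v(n - M) lies entirely in the past.
    """
    n = len(target)
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in lags:
        start = len(past) - lag
        y = convolve(past[start:start + n], h)   # filtered candidate v(n - M) * h(n)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy             # error power removed at the optimal gain
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain

random.seed(2)
past = [random.gauss(0.0, 1.0) for _ in range(160)]     # made-up past excitation
h = [0.8 ** k for k in range(10)]                       # toy weighted impulse response
n = 40
target = [0.8 * v for v in convolve(past[100:100 + n], h)]   # built with lag 60, gain 0.8
lag, gain = adaptive_codebook_search(target, past, h, range(40, 148))
```

In a real coder the gain would then be quantized to β' before forming the prediction signal.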

The delay circuit 206 delays the synthesis filter input signal v(n) by one subframe and outputs it to the adaptive codebook 210.

The subtracter 205 subtracts the output of the adaptive codebook 210 from the output signal of the subtracter 190 according to the following equation and outputs the residual signal $e_w(n)$ to the sound source codebook search circuit 230.

$$e_w(n) = x_w(n) - \hat{x}_w(n) \qquad (10)$$

The impulse response calculation circuit 170 calculates the impulse response $h_w(n)$ of the perceptually weighted synthesis filter for a predetermined number of samples L. For a specific calculation method, see Reference 1 above.

The sound source codebook search circuit 230 searches the sound source codebook 235, which consists of Gaussian noise signals or is constructed by training on a large amount of speech signals, for the optimal code vector $c_j(n)$. For the method of constructing a codebook of Gaussian noise signals, see Reference 1 above. For the method of constructing a codebook by training, see Reference 4 and Reference 9 above. For the method of searching for the optimal code vector and calculating the optimal gain γ, see Reference 1 and Reference 9 above.

The codebook search circuit 230 further quantizes the obtained gain with a predetermined number of quantization bits, multiplies the selected optimal code vector $c_j(n)$ by the quantized gain γ' according to the following equation to obtain the sound source signal q(n), and outputs it to the adder 290.

$$q(n) = \gamma' c_j(n) \qquad (11)$$

The adder 290 adds the output sound source signal of the adaptive codebook 210 and the output sound source signal of the codebook search circuit according to the following equation and outputs the result to the synthesis filter 281.

$$v(n) = \beta' v(n-M) + \gamma' c_j(n) \qquad (12)$$

The synthesis filter 281 receives the output v(n) of the adder 290 and computes one frame of synthesized speech with the synthesis filter equation. It then feeds a sequence of zeros into the filter for one more frame to obtain the response signal sequence, and outputs one frame of this response signal sequence to the subtracter 190.
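The two-pass use of the synthesis filter (one pass driven by the excitation, then one pass driven by zeros to obtain the filter's ringing) can be sketched as below. This uses a plain, unweighted all-pole filter for simplicity, whereas the document's filter includes perceptual weighting; the function name, the coefficient convention, and the one-tap example are assumptions.

```python
def synthesize(excitation, a, state):
    """All-pole filter s(n) = v(n) + sum_k a_k * s(n - k).

    state holds the most recent past outputs, newest first; the updated state is returned.
    """
    state = list(state)
    out = []
    for v in excitation:
        s = v + sum(ak * sk for ak, sk in zip(a, state))
        out.append(s)
        state = [s] + state[:-1]
    return out, state

a = [0.5]                                         # toy one-tap predictor (assumption)
frame, state = synthesize([1.0, 0.0, 0.0], a, [0.0])   # pass 1: driven by the excitation
ringing, _ = synthesize([0.0, 0.0], a, state)          # pass 2: zero input, filter memory only
```

Subtracting the zero-input response from the next block of weighted speech removes the contribution of the filter memory before the next search.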

In the synthesis filter equation, δ is the perceptual weighting coefficient that determines the degree of perceptual weighting in the weighting circuit 200, and takes a value in the range 0 < δ < 1.

The multiplexer 260 combines the output code sequences of the LSP quantizer 140, the adaptive codebook 210, and the sound source codebook search circuit 230 and outputs the result.

FIG. 3 is a block diagram showing a speech signal encoding device that implements the speech signal coding system of the second invention. In the figure, components bearing the same numbers as in FIG. 1 operate in the same way as in FIG. 1, so their explanation is omitted. FIG. 3 differs from FIG. 1 in the LSP quantization circuit 340, the LSP codebook 345, and the mel cepstrum codebook 346.

FIG. 4 is a diagram showing the processing flow of the LSP quantization circuit 340.

In FIG. 4, the linear prediction coefficients $\alpha_i$ are first converted into an LPC cepstrum (step 521), and the LPC cepstrum is then converted into a mel cepstrum (step 522). The mel cepstrum is inversely converted into a mel LPC cepstrum and further into mel linear prediction coefficients (step 523). These are then converted into LSP to obtain the mel LSP ($LSP_{mi}$) (step 524).

For the mel LSP, the difference between frames or the difference between orders of the mel LSP is calculated according to equation (4) or equation (5) (step 525). The following describes the case in which the difference signal is obtained between orders within the same frame. Vector quantization is performed on the difference signal using the mel LSP codebook 345, which is constructed in advance by training on a large number of difference signals (step 526). As described in the explanation of the operation above, the vector quantization codebook search uses the squared distance or the weighted squared distance on the mel cepstrum. That is, the distance between the mel cepstrum $c_m(n)$ corresponding to the mel LSP obtained from the input speech and each mel cepstrum $c'_{mj}(n)$ in the mel cepstrum codebook 346, which holds the mel cepstrum corresponding to each code vector of the mel LSP codebook 345, is calculated by equation (2) or equation (3), and the code vector j that minimizes this distance is selected. The mel LSP code vector corresponding to the selected code vector $c'_{mj}(n)$ is then selected from the LSP codebook 345, and the mel LSP is decoded by the following equation (step 527).

$$LSP'_{mi} = LSP'_{m,i-1} + \Delta LSP'_{mi,j} \qquad (15)$$

Here, $\Delta LSP'_{mi,j}$ is the code vector selected from the mel LSP codebook 345.
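Decoding according to equation (15) amounts to a running sum over the selected difference code vector, starting from zero. A minimal sketch, with a hypothetical function name and made-up values:

```python
from itertools import accumulate

def decode_from_differences(diffs):
    """Rebuild the mel LSP from order-to-order differences (running sum starting at 0)."""
    return list(accumulate(diffs))

selected_code_vector = [0.25, 0.5, 0.75, 0.5]     # made-up difference code vector
lsp_decoded = decode_from_differences(selected_code_vector)
```

As long as every difference in the code vector is positive, the decoded values are automatically in increasing order.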

Next, the error between the input mel LSP and the vector quantized and decoded mel LSP is obtained (step 528), scalar quantized in the same way as in the LSP quantization circuit 140 of FIG. 1, and output (step 529).

The embodiments above describe application to the CELP method of Reference 1, but the invention can also be applied to other well-known methods. For example, it can be applied to an improved CELP method in which the sound source signal is represented by two different codebooks; for the improved CELP method, see Reference 9 above. It can also be applied to the multi-pulse method described in Reference 2 above.

The embodiments also describe vector-scalar quantization of the spectral parameters in the auditory domain. Alternatively, the spectral parameters may be vector quantized in the auditory domain, and the difference between the original parameters and the vector quantized and decoded parameters may itself be vector quantized, for example using a random number codebook with predetermined statistical properties. This makes it possible to reduce the number of quantization bits further.

As quantization methods for the mel LSP, still more efficient methods such as matrix quantization, trellis quantization, and finite-state vector quantization can also be applied. Details of these quantization methods are described, for example, in the paper by Gray entitled "Vector quantization" (IEEE ASSP Mag., pp. 4-29, 1984) (Reference 11).

To reduce the amount of memory, the mel-cepstrum codebooks 146 and 346 may be removed, and the mel LSP code vectors may instead be converted to mel-cepstra during vector quantization of the mel LSP.

Furthermore, the codebook search may be performed directly on the mel LSP.

[Effect of the invention]

As described above, according to the present invention, the spectral parameters of the audio signal are either subjected to a nonlinear transformation corresponding to auditory characteristics and then vector quantized, or, for the nonlinearly transformed parameters, the differences between frames or between orders within the same frame are determined and vector quantized. This enables extremely efficient quantization and makes it possible to provide an audio encoding method with good sound quality even at a low bit rate.
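
The inter-frame difference variant summarized above can be illustrated with a short Python sketch. It assumes the parameters have already been transformed to the auditory domain, and it takes the difference against the previously decoded frame so that encoder and decoder stay in step; that closed-loop choice, the toy codebook, the initial state, and all names are assumptions made here, not details taken from the patent.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(codebook, x):
    """Index of the code vector closest to x (exhaustive search)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))

def encode_frames(frames, diff_codebook, initial):
    """Vector quantize the difference between each frame and the previous decoded frame."""
    prev = list(initial)
    indices = []
    for x in frames:
        diff = [v - p for v, p in zip(x, prev)]
        j = nearest(diff_codebook, diff)
        indices.append(j)
        # Track what the decoder will reconstruct, not the unquantized input.
        prev = [p + d for p, d in zip(prev, diff_codebook[j])]
    return indices

def decode_frames(indices, diff_codebook, initial):
    """Accumulate the selected difference vectors to rebuild each frame."""
    prev = list(initial)
    decoded = []
    for j in indices:
        prev = [p + d for p, d in zip(prev, diff_codebook[j])]
        decoded.append(prev)
    return decoded

# Toy 2-dimensional example with a three-entry difference codebook.
diff_codebook = [[0.0, 0.0], [0.1, 0.1], [-0.1, -0.1]]
frames = [[0.1, 0.1], [0.2, 0.2], [0.1, 0.1]]
indices = encode_frames(frames, diff_codebook, initial=[0.0, 0.0])
decoded = decode_frames(indices, diff_codebook, initial=[0.0, 0.0])
```

Because successive frames of speech tend to have similar spectral envelopes, the differences occupy a smaller range than the parameters themselves, which is why a small difference codebook can suffice.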

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an audio signal encoding device that implements the audio signal encoding method according to the first invention. FIG. 2 is a block diagram showing the processing flow of the LSP quantization circuit of FIG. 1. FIG. 3 is a block diagram showing an audio signal encoding device that implements the audio signal encoding method according to the second invention. FIG. 4 is a diagram showing the processing flow of the LSP quantization circuit of FIG. 3.

110: buffer memory
130: LPC calculation circuit
140, 340: LSP quantization circuit
145, 345: LSP codebook
146, 346: mel-cepstrum codebook
(numeral illegible): subframe division circuit
170: impulse response calculation circuit
190: subtractor
200: weighting circuit
206: delay circuit
210: adaptive codebook
230: sound source codebook search circuit
235: sound source codebook
281: synthesis filter
260: multiplexer

Claims

[Claims]

(1) An audio signal encoding method in which an input discrete audio signal is divided into frames of a predetermined time length; the audio signal is approximated by an all-pole model to obtain spectral parameters representing the spectral envelope of the audio signal; each frame is divided into small sections of a predetermined time length; a pitch parameter is determined so that a signal reproduced from the past sound source signal approximates the audio signal; and the sound source signal of the audio signal is represented by a codebook or by multipulses and output, characterized in that the spectral parameters are nonlinearly transformed so as to correspond to auditory characteristics, and are represented by a preconfigured codebook and output.

(2) An audio signal encoding method in which an input discrete audio signal is divided into frames of a predetermined time length; the audio signal is approximated by an all-pole model to obtain spectral parameters representing the spectral envelope of the audio signal; each frame is divided into small sections of a predetermined time length; a pitch parameter is determined so that a signal reproduced from the past sound source signal approximates the audio signal; and the sound source signal of the audio signal is represented by a codebook or by multipulses and output, characterized in that the spectral parameters are nonlinearly transformed so as to correspond to auditory characteristics, and, for the nonlinearly transformed spectral parameters, a difference or prediction error between frames, or a difference or prediction error between parameters within the same frame, is represented by a preconfigured codebook and output.