JP3510168B2

JP3510168B2 - Audio encoding method and audio decoding method

Info

Publication number: JP3510168B2
Application number: JP34985799A
Authority: JP
Inventors: 祐介日和▲崎▼; 一則間野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-12-09
Filing date: 1999-12-09
Publication date: 2004-03-22
Anticipated expiration: 2019-12-09
Also published as: JP2001166800A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a high-efficiency speech encoding method and decoding method that digitally encode a speech signal sequence with a small amount of information, specifically in the processing of unvoiced sections. In particular, the invention achieves high-quality speech coding at bit rates of 2.0 kbit/s or less, the range traditionally covered by speech analysis-synthesis systems known as vocoders.

[0002]

[Prior Art] Prior art related to the present invention includes the linear predictive vocoder and code-excited linear prediction (CELP) coding. The linear predictive vocoder has been widely used as a speech coding method in the low bit rate range of 4.8 kbit/s and below; variants include the PARCOR method and the line spectrum pair (LSP) method. These methods are described in detail in, for example, Saito and Nakata, "Fundamentals of Speech Information Processing" (Ohmsha). A linear predictive vocoder consists of an all-pole filter that represents the spectral envelope of speech and an excitation signal that drives it. The excitation signal is a pulse train for voiced sections and white noise for unvoiced sections. However, a white-noise excitation is insufficient to reproduce the features of the speech waveform well, in particular both plosives and fricatives, so it is difficult to obtain highly natural synthesized speech with a linear predictive vocoder.
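As a rough illustration of the vocoder structure described above, the following Python sketch drives an all-pole filter with either a pulse train or white noise. The function names, the fixed pitch period, and the Gaussian noise source are assumptions made for illustration; none of them comes from the patent.

```python
import random


def vocoder_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Pulse train for a voiced frame, white noise for an unvoiced frame.

    pitch_period and seed are illustrative parameters (assumptions).
    """
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_synthesis(excitation, alpha):
    """Filter the excitation by 1/A(z), with A(z) = 1 + sum_j alpha[j] z^-(j+1)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(alpha):
            if n - 1 - j >= 0:
                acc -= a * out[n - 1 - j]  # feedback from past output samples
        out.append(acc)
    return out


# A single pulse through a one-pole filter decays geometrically.
impulse_response = all_pole_synthesis([1.0, 0.0, 0.0], [-0.5])  # [1.0, 0.5, 0.25]
```

With only two excitation types, every unvoiced frame sounds like shaped noise, which is the limitation the paragraph above points out.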

[0003] In code-excited linear prediction coding, on the other hand, speech is synthesized by using a noise sequence as the excitation signal to drive two all-pole filters that represent the short-term correlation and the pitch correlation of speech. The noise sequences are prepared in advance as a set of code patterns, and the code pattern that minimizes the error between the input speech waveform and the synthesized speech waveform is selected from among them. Details are given in Schroeder, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. IEEE ICASSP, pp. 937-940, 1985. In CELP coding, the reproduction accuracy depends on the number of code patterns: preparing many code patterns raises the accuracy with which the speech waveform is reproduced, and quality improves accordingly. However, when the bit rate is reduced to 4 kbit/s or less, the number of code patterns is limited and sufficient speech quality can no longer be obtained. An information rate of about 4.8 kbit/s is considered necessary for good speech quality.

[0004]

[Problems to Be Solved by the Invention] An object of the present invention is to provide a method that quantizes the speech waveform of unvoiced sections more efficiently in linear predictive coding that uses a noise sequence as the excitation signal. A further problem is that, in vocoder systems, the voiced/unvoiced decision inevitably contains errors, and driving a voiced section with white noise that matches it only in power causes severe quality degradation.

[0005]

[Means for Solving the Problems] To solve the above problems, the invention according to claim 1 is a speech encoding method that performs linear prediction analysis on a speech signal frame by frame to obtain linear prediction analysis coefficients, and determines codes that quantize feature quantities of a residual signal obtained by driving a linear prediction synthesis filter whose filter coefficients are based on the linear prediction analysis coefficients, wherein the periodicity of the residual signal is determined as one of the feature quantities and, when the periodicity is lower than a predetermined threshold, the residual signal is divided into a low-band component and a high-band component, the noise code corresponding to the noise code vector with the minimum distance from the low-band component is selected, and, for the high-band component, the average power is calculated for each subframe constituting the frame.

[0006] The invention according to claim 2 is the speech encoding method according to claim 1, wherein the calculated average power is normalized to obtain a normalized average power, and its scale factor is also calculated. The invention according to claim 3 is the speech encoding method according to claim 1 or 2, wherein the sample points of the low-band component waveform of the residual signal are decimated to form the low-band component, and the noise code corresponding to the noise code vector with the minimum distance from this low-band component is selected.

[0007] The invention according to claim 4 is a speech decoding method that decodes a speech signal by feeding an excitation source to a linear prediction synthesis filter, wherein the noise code and the average power generated by the speech encoding method according to claim 1 are input; the low-band waveform is decoded from a codebook on the basis of the noise code; the high-band waveform is synthesized by multiplying white noise passed through a high-pass filter by a gain for each subframe, the gain being derived from the quantized average power; and the waveforms of the two bands are added together to form the excitation source of the linear prediction synthesis filter.

[0008] The invention according to claim 5 is a speech decoding method that decodes a speech signal by feeding an excitation source to a linear prediction synthesis filter, wherein the normalized average power, the scale factor, and the noise code generated by the speech encoding method according to claim 2 are input; the low-band waveform is decoded from a codebook on the basis of the noise code; the high-band waveform is synthesized by multiplying the normalized average power by the scale factor; and the waveforms of the two bands are added together to form the excitation source of the linear prediction synthesis filter.

[0009] The invention according to claim 6 is the speech decoding method according to claim 4 or 5, wherein the selected noise code is input, this being the noise code that, in the speech encoding method, corresponds to the noise code vector with the minimum distance from the low-band component formed by decimating the sample points of the low-band component waveform of the residual signal, and the decimated sample points are recalculated by sampling conversion on the basis of the noise code. With the above configuration, the present invention divides the noise sequence of the unvoiced sections of speech into bands and combines waveform coding for the low band with quantization of the average power for the high band, thereby quantizing more efficiently at a low bit rate and improving quality.

[0010]

[Embodiments of the Invention] Embodiment 1. FIG. 1 shows the functional configuration of an encoding unit to which the quantization method of the invention is applied. The encoder performs the following procedure once for every frame of $N$ samples. In frame $i$, the linear prediction coefficient calculation unit 1 computes the $p$-th order linear prediction coefficients (LPC) $\alpha_j$ ($j = 0, 1, \ldots, p-1$) of the input speech signal $s(t)$ from the input terminal TI1. The linear prediction coefficients $\alpha$ are quantized by the linear prediction coefficient quantization unit 2 and sent out as the linear prediction coefficient code $I_1$. Details of the quantization of $\alpha$ are given in "Linear Prediction Parameter Coding Method for Speech" (Japanese Patent Application No. Hei 3-180819; Japanese Unexamined Patent Publication No. Hei 5-27798). The code $I_1$ from the linear prediction coefficient quantization unit 2 is decoded, the filter coefficients of the linear prediction inverse filter 3 are set from the decoded linear prediction coefficients $\alpha'$, and the input speech signal $s(t)$ is passed through the inverse filter 3 to obtain the residual signal $r(t)$. The linear prediction inverse filter 3 is realized as a digital filter $A(z)$ with the following transfer characteristic.

[0011]

$$A(z) = 1 + \alpha_1 z^{-1} + \cdots + \alpha_p z^{-p} \qquad (1)$$

The correlation calculation unit 4 computes the correlation (partial correlation function) $\rho$ of the residual signal $r$ obtained here, and the maximum value of $\rho$ is denoted $\rho_{\max}$. The periodicity determination unit 5 then decides whether the input speech signal $s(t)$ is voiced or unvoiced, for example by comparison against a threshold $\theta$ (0.5 to 1.0) as follows, and outputs the periodicity code $I_2$.

$$\frac{k_1}{2} + \rho_{\max} > \theta : \text{voiced}, \qquad \frac{k_1}{2} + \rho_{\max} < \theta : \text{unvoiced} \qquad (2)$$

Here $k_1$ is the first-order partial autocorrelation (PARCOR) coefficient obtained by the linear prediction coefficient calculation unit 1.
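The inverse filtering of equation (1) and the decision rule (2) can be sketched in Python as follows. This is a minimal illustration that assumes $\rho$ is a normalized autocorrelation of the residual over a range of candidate lags; the lag range, the default threshold of 0.7, and all function names are hypothetical and are not specified in the text.

```python
def inverse_filter(s, alpha):
    """Residual r(t) = s(t) + sum_j alpha[j] * s(t-1-j), i.e. filtering by A(z)."""
    r = []
    for t in range(len(s)):
        acc = s[t]
        for j, a in enumerate(alpha):
            if t - 1 - j >= 0:
                acc += a * s[t - 1 - j]
        r.append(acc)
    return r


def max_normalized_correlation(r, min_lag, max_lag):
    """Largest autocorrelation of r over the lag range, normalized by the energy.

    Assumed form of rho_max; the patent does not spell out the formula.
    """
    energy = sum(x * x for x in r)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        c = sum(r[t] * r[t - lag] for t in range(lag, len(r)))
        best = max(best, c / energy)
    return best


def is_voiced(k1, rho_max, theta=0.7):
    """Decision rule (2): voiced when k1/2 + rho_max exceeds theta."""
    return k1 / 2.0 + rho_max > theta
```

For a residual that is a pulse train with period 20 over 200 samples, `max_normalized_correlation(r, 16, 40)` evaluates to 0.9, so the frame is classified as voiced for any plausible `k1`.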

[0012] When the periodicity determination unit 5 judges the frame to be a voiced section, switch SW1 is set to the voiced section quantization unit 6, which quantizes the residual signal $r(t)$ and outputs the coded output $I_3$. Details of voiced section quantization are given in "Speech Coding Method" (Japanese Patent Application No. Hei 11-108161). When the periodicity determination unit 5 judges the frame to be an unvoiced section, switch SW1 is set to the unvoiced section quantization unit 7, which quantizes the residual signal $r(t)$ and outputs the coded output $I_4$.

[0013] FIG. 2 shows the details of unvoiced section quantization unit A. From the linear prediction residual $r$ supplied by the linear prediction inverse filter 3, the average power calculation unit 9 computes the average power $p$ of the residual (an element of the matrix representation $P$ of the average power sequence) for each subframe of length $N_{sfr} = N / n_{sfr}$, obtained by dividing the frame into $n_{sfr}$ parts along the time axis. The following equation is used for this calculation.

[0014]

[Equation 1] (equation image not reproduced in this text)

Here $0 < i < n_{sfr}$ ($n_{sfr}$: the number of subframes in one frame). For the average power $P$ obtained in this way, one frame's worth of average powers is vector-quantized by the average power quantization unit 10 and output as the unvoiced code $I_{4-1}$. The selection measure $d$ used to choose the code in vector quantization is calculated with equation (4) below.

[0015]

$$d = \lVert P - c_i \rVert^2 \qquad (4)$$

Here $P$ is the vector of the average power sequence and $c_i$ is the code vector under evaluation. In addition, the linear prediction residual signal $r$ is filtered by the low-pass filter operation unit 11 with cutoff frequency $f_c$ Hz to obtain the low-band-only linear prediction residual $r_l$. A cutoff frequency $f_c$ of 500 Hz to 1000 Hz is used.
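The subframe power calculation and the code selection of equation (4) can be sketched as below. Because the equation for the average power is not reproduced in this text, the sketch assumes the usual definition, the mean of the squared residual samples in each subframe; that definition, the toy codebook, and the function names are assumptions.

```python
def subframe_average_power(r, n_sfr):
    """Mean squared value of r in each of n_sfr equal-length subframes.

    Assumed definition of the average power p_i (not taken from the patent).
    """
    n = len(r) // n_sfr  # subframe length N_sfr = N / n_sfr
    return [sum(x * x for x in r[i * n:(i + 1) * n]) / n for i in range(n_sfr)]


def nearest_code(target, codebook):
    """Return (index, distance) of the code vector minimizing d = ||target - c_i||^2."""
    best_index, best_d = 0, float("inf")
    for index, c in enumerate(codebook):
        d = sum((t - ci) ** 2 for t, ci in zip(target, c))
        if d < best_d:
            best_index, best_d = index, d
    return best_index, best_d


# Toy example: a 4-sample frame split into 2 subframes.
P = subframe_average_power([1.0, 1.0, 2.0, 2.0], 2)            # [1.0, 4.0]
code = nearest_code(P, [[0.0, 0.0], [1.0, 3.0], [5.0, 5.0]])    # (1, 1.0)
```

An exhaustive search like this costs one distance evaluation per code vector, which is affordable here because the power vector has only $n_{sfr}$ elements.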

[0016] Next, the low-band prediction residual is vector-quantized. The distance between the low-band prediction residual signal $r_l$, generated from the residual signal $r$ by the low-pass filter operation unit 11, and a noise code vector $c$ ($= c_i$) from the noise codebook 12 is calculated by the distance calculation unit 13 using equation (5):

$$d = \lVert r_l - c_i \rVert^2 \qquad (5)$$

Here $r_l$ and $c_i$ are the matrix representations of the low-band prediction residual waveform and of the code vector under evaluation, respectively.
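The low-band residual that enters equation (5) comes from a low-pass filter whose design the text does not specify. The sketch below assumes a Hamming-windowed sinc FIR filter with 63 taps at a sampling frequency of 8000 Hz; the filter type, tap count, and function names are all assumptions chosen for illustration.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz=8000.0, num_taps=63):
    """Hamming-windowed sinc low-pass taps, normalized to unit gain at DC."""
    fc = cutoff_hz / fs_hz              # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(x, taps):
    """Causal FIR filtering: y[n] = sum_k taps[k] * x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y
```

With a 1000 Hz cutoff, a 200 Hz tone passes almost unchanged while a 3000 Hz tone is strongly attenuated, so the filtered residual keeps only the band that the noise codebook is asked to match.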

[0017] The noise code corresponding to the vector $c$ that minimizes the distance in the distance calculation unit 13 is then selected, and the code of the selected code vector is output as $I_{4-2}$. Next, FIG. 5 shows decoder A, which applies an embodiment of the decoding method corresponding to the encoding embodiment shown in FIGS. 1 and 2. The codes $I_1$ to $I_4$ (that is, the bitstream) input at the input terminal TI2 are separated by the demultiplexer 29 and all speech parameters are decoded, after which the voiced section sound source synthesis unit 21 and the unvoiced section sound source synthesis unit 22 generate excitation signals from the voiced and unvoiced parameters $I_3$ and $I_4$.

[0018] Switch SW2 is set according to the periodicity signal $I_2$. When $I_2$ indicates an unvoiced section, the synthesized excitation signal from the unvoiced section sound source synthesis unit 22 drives the linear prediction synthesis filter 23; when $I_2$ indicates a voiced section, the synthesized excitation signal $e$ from the voiced section sound source synthesis unit 21 drives it. The output speech is obtained at the output terminal TO2. The linear prediction coefficient code $I_1$ is decoded by the linear prediction coefficient decoding unit 20 and supplied to the linear prediction synthesis filter 23.

[0019] For decoding voiced sections, the method described in, for example, "Speech Coding Method" (Japanese Patent Application No. Hei 11-108161) is used. FIG. 6 shows the details of unvoiced section sound source synthesis unit A. In an unvoiced section, the average power decoding unit 24 first decodes the average power sequence $p_i$ from the unvoiced code $I_{4-1}$. Next, the white noise generated by the white noise generation unit 25 is multiplied by the average power $p$ in the multiplier 26 and processed by the high-pass filter operation unit 27 with cutoff frequency $f_c$ Hz. The gain is then adjusted so that the result follows the sequence $\tilde{p}_i$ obtained by multiplying each coefficient $p_i$ of the average power sequence by $f_c/4000$, which yields the unvoiced high-band driving sound source signal $e_h$. This adjustment is made because the signal in the band up to $f_c$ Hz is generated from the noise codebook 28, so the power of the remaining signal from $f_c$ Hz to 4000 Hz is approximately $f_c/4000$ times $p_i$.
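The gain adjustment step can be sketched as follows. The sketch covers only the per-subframe scaling of a noise signal to a target power sequence and applies the factor $f_c/4000$ exactly as the text states it; the high-pass filter is omitted, and the helper names and the Gaussian noise source are assumptions.

```python
import random


def white_noise(n_samples, seed=0):
    """Gaussian white noise (the noise distribution is an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def high_band_targets(p, fc_hz, nyquist_hz=4000.0):
    """Adjusted power sequence: each p_i multiplied by f_c/4000, as the text states."""
    return [pi * fc_hz / nyquist_hz for pi in p]


def scale_noise_to_power(noise, target_powers):
    """Scale each subframe of noise so its mean square equals the target power."""
    n_sfr = len(target_powers)
    n = len(noise) // n_sfr
    out = []
    for i, target in enumerate(target_powers):
        segment = noise[i * n:(i + 1) * n]
        mean_square = sum(x * x for x in segment) / n
        gain = (target / mean_square) ** 0.5 if mean_square > 0.0 else 0.0
        out.extend(gain * x for x in segment)
    return out
```

In a fuller sketch the noise would be high-pass filtered before `scale_noise_to_power` is applied, so that the measured subframe power reflects only the band above $f_c$.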

[0020] The unvoiced section low-band driving sound source $e_l$ is decoded from the noise codebook 28 on the basis of the code $I_{4-2}$ of the code vector. The unvoiced high-band driving sound source signal $e_h$ and the unvoiced section low-band driving sound source $e_l$ are then summed in the adder 35 and input to the linear prediction synthesis filter as the driving sound source $e$ for the unvoiced section, producing the output speech. Embodiment 2. FIG. 3 shows unvoiced section quantization unit B of an embodiment in which the average power in FIG. 1 is normalized before quantization.

[0021] The average power sequence $p_i$ obtained from the residual signal $r$ by the average power calculation unit 9 is normalized according to the following equation.

$$\tilde{p}_i = p_i / s_p \qquad (6)$$

Here the scale factor $s_p$ is calculated from

[0022]

[Equation 2] (equation image not reproduced in this text).

The scale factor $s_p$ is scalar-quantized by the scale factor quantization unit 16, and the normalized sequence $\tilde{p}_i$ is vector-quantized by the normalized average power quantization unit 15; the codes $I_{4-3}$ and $I_{4-4}$ are output, respectively. The distance measure $d$ used to select the code in the vector quantization of $\tilde{P}$ is calculated with equation (8) below.

[0023]

$$d = \lVert \tilde{P} - c_i \rVert^2 \qquad (8)$$

Here $\tilde{P}$ is the matrix representation of the normalized average power sequence and $c_i$ is the code vector under evaluation. FIG. 7 shows how the average power sequence $p_i$ is decoded in decoder B, which corresponds to this embodiment. $I_{4-3}$ and $I_{4-4}$ are decoded by the scale factor decoding unit 31 and the normalized average power decoding unit 30, respectively, and the multiplier 32 multiplies $\tilde{p}_i$ by $s_p$ to recompute the average power sequence $p_i$.
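The normalization of equation (6) and its inverse in the decoder can be sketched as below. The equation that defines the scale factor $s_p$ is not reproduced in this text, so the sketch assumes, purely for illustration, that $s_p$ is the mean of the $p_i$; the actual definition may differ. Quantization of both quantities is omitted.

```python
def normalize_power(p):
    """Return (normalized sequence, scale factor) with p_norm[i] = p[i] / s_p.

    ASSUMPTION: s_p is taken as the mean of p; the patent's own definition
    of the scale factor is not available in this text.
    """
    s_p = sum(p) / len(p)
    if s_p == 0.0:
        return [0.0] * len(p), 0.0
    return [pi / s_p for pi in p], s_p


def denormalize_power(p_norm, s_p):
    """Decoder side: recompute p_i by multiplying the normalized value by s_p."""
    return [x * s_p for x in p_norm]


p_norm, s_p = normalize_power([1.0, 2.0, 3.0])   # ([0.5, 1.0, 1.5], 2.0)
p_back = denormalize_power(p_norm, s_p)           # [1.0, 2.0, 3.0]
```

Separating the overall level from the shape of the power contour lets the shape codebook be shared across loud and quiet frames, which is the usual motivation for gain-shape quantization of this kind.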

[0024] Embodiment 3. FIG. 4 shows unvoiced section quantization unit C, in which the low-band linear prediction residual $r_l$ output by the low-pass filter operation unit 11 of FIG. 3 is contracted in time to reduce the computational cost of quantization, and the calculated average power is additionally used to improve quantization efficiency. The low-band linear prediction residual $r_l$, obtained as in the embodiments above, is decimated by the time contraction unit 17 so that it is contracted in the time direction. The resulting signal $\tilde{r}_l$ is then vector-quantized: using the normalized noise codebook 18 and the distance calculation unit 19, the noise code corresponding to the noise code vector with the minimum distance is selected and the code of that code vector is generated.

[0025] If the speech signal $s$ is sampled at a sampling frequency of 8000 Hz, the low-band linear prediction residual $r_l$ ($f_c$: 500 Hz to 1000 Hz) carries only 1/8 to 1/4 of the information of the full band. By the sampling theorem, the original $r_l$ can therefore still be recovered even if 3 to 7 samples are discarded and represented by a single sample, so the low-band linear prediction residual can be quantized with little computation. Next, to raise quantization efficiency, a scale $w_i$ is calculated for each subframe from the $p_i$ obtained by the average power calculation unit 9, and the white noise is weighted by multiplying it by $w_i$ in a multiplier for each subframe.
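The arithmetic behind the figures "1/8 to 1/4" and "3 to 7 samples" can be checked with a short Python sketch. It assumes an integer decimation factor equal to the Nyquist frequency divided by the cutoff frequency and a plain keep-every-Mth-sample decimator applied after low-pass filtering; the function names are hypothetical.

```python
def decimation_factor(fc_hz, fs_hz=8000.0):
    """Largest integer factor M such that the decimated Nyquist rate still covers fc."""
    nyquist_hz = fs_hz / 2.0
    return int(nyquist_hz // fc_hz)


def discarded_per_kept(fc_hz, fs_hz=8000.0):
    """Number of samples dropped for every sample that is retained."""
    return decimation_factor(fc_hz, fs_hz) - 1


def decimate(x, factor):
    """Keep every factor-th sample of an already band-limited signal."""
    return x[::factor]


# f_c = 1000 Hz: band fraction 1/4, factor 4, 3 samples dropped per kept sample.
# f_c =  500 Hz: band fraction 1/8, factor 8, 7 samples dropped per kept sample.
factors = [decimation_factor(1000.0), decimation_factor(500.0)]   # [4, 8]
```

A 200-sample frame therefore shrinks to 50 or 25 samples before the codebook search, which is where the reduction in distance-calculation cost comes from.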

[0026] The scale $w_i$ is obtained from equation (9) below.

[0027]

[Equation 3] (equation image not reproduced in this text)

The distance between the low-band prediction residual signal $r_l$ and a codebook vector $c$ ($= c_i$) is calculated with equation (10):

$$d = \lVert \tilde{r}_l - w c_i \rVert^2 \qquad (10)$$

Here $\tilde{r}_l$, $c_i$, and $w$ are the matrix representations of the time-contracted low-band prediction residual, the code vector under evaluation, and the scale sequence $w_i$, respectively. The code $I_{4-5}$ of the code vector selected from the normalized noise codebook 18 is then output.
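The weighted distance of equation (10) can be sketched as follows. Since the equation that derives $w_i$ from $p_i$ is not reproduced in this text, the scales are simply passed in as a parameter, and the sketch assumes that each $w_i$ applies to every sample of subframe $i$; the function name and this expansion rule are assumptions.

```python
def weighted_distance(r_low, code, w, samples_per_subframe):
    """d = || r_low - w * c ||^2 with w[i] applied to all samples of subframe i."""
    d = 0.0
    for t, (r, c) in enumerate(zip(r_low, code)):
        scale = w[t // samples_per_subframe]
        d += (r - scale * c) ** 2
    return d


# A unit-level code vector matches the target exactly once it is scaled per subframe.
target = [2.0, 2.0, 6.0, 6.0]
unit_code = [1.0, 1.0, 1.0, 1.0]
d_scaled = weighted_distance(target, unit_code, [2.0, 6.0], 2)     # 0.0
d_unscaled = weighted_distance(target, unit_code, [1.0, 1.0], 2)   # 52.0
```

Applying the scales inside the distance lets a normalized codebook represent residuals whose level changes from subframe to subframe without spending extra bits on gain.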

[0028] FIG. 8 shows decoder C, which corresponds to this embodiment. The low-band residual signal decoded from the code $I_{4-5}$ of the code vector undergoes sampling conversion in the time expansion unit 34, which computes the intermediate points that were decimated in the encoder. The sampling conversion is performed according to equation (11) below.

[0029]

[Equation 4] (equation image not reproduced in this text)

Here $x(nT_s)$ is the original signal, $x(t)$ is the intermediate point to be found, and $T_s$ is the sampling period. The gain of the signal obtained in this way is adjusted to match the average power sequence $p_i$, producing the unvoiced low-band driving sound source signal $e_l$.
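Equation (11) is not reproduced in this text. One standard way to recompute intermediate points of a band-limited signal from its samples $x(nT_s)$ is Whittaker-Shannon (sinc) interpolation, and the sketch below assumes that form, truncated to the samples of one frame. Whether this matches the patent's equation exactly is an assumption, as are the function name and the integer upsampling factor.

```python
import math


def sinc_interpolate(samples, factor):
    """Upsample by an integer factor with a truncated Whittaker-Shannon sum.

    Time is measured in units of the coarse sampling period T_s, so the
    output point m lies at t = m / factor.
    """
    out = []
    for m in range(len(samples) * factor):
        t = m / factor
        acc = 0.0
        for n, x in enumerate(samples):
            u = t - n
            # sinc(u) = sin(pi*u) / (pi*u), with sinc(0) = 1
            acc += x if u == 0.0 else x * math.sin(math.pi * u) / (math.pi * u)
        out.append(acc)
    return out


coarse = [0.0, 1.0, 0.0, -1.0]
fine = sinc_interpolate(coarse, 4)   # 16 points; every 4th one reproduces coarse
```

The interpolated signal passes through the decoded samples and fills in the points removed by the time contraction unit, after which the per-subframe gain adjustment described above is applied.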

[0030]

[Effects of the Invention] As described above, the unvoiced section speech encoding method of the invention improves quantization efficiency over the method used in conventional vocoders, which simply quantizes only the power information of an unvoiced section excited by white noise. The voiced/unvoiced decision, which had been a problem, also becomes less critical: because the speech is separated into a low band and a high band, the periodicity of the speech waveform is preserved to some extent as long as the low-band waveform is coded as it is, even when the high band is driven by white noise. Consequently, an error in the voiced/unvoiced decision no longer causes severe quality degradation.

[0031] To examine the effect of the speech encoding and decoding of the invention, an analysis-synthesis experiment was carried out under the following conditions. The input speech was speech in the 0 to 4 kHz band sampled at a sampling frequency of 8.0 kHz and then passed through an IRS characteristic filter, which corresponds to the characteristics of a telephone set. The encoder had the configuration of Embodiment 2. Every 25 ms (200 samples), the speech signal was multiplied by a Hamming window with an analysis window length of 30 ms, and linear prediction analysis by the autocorrelation method was performed with an analysis order of 12 to obtain 12 prediction coefficients. The prediction coefficients were vector-quantized using the Euclidean distance between LSP parameters. Seven bits were allocated to $I_1$, $I_2$, and $I_3$ of the above embodiments, and in the configuration of Embodiment 3 seven bits each were allocated to $I_{4-3}$, $I_{4-4}$, and $I_{4-6}$, so that the number of bits for an unvoiced section was 21 in every configuration. In addition, $n_{sfr}$ was set to 10.
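The frame timing implied by these conditions can be worked out directly. The sketch below only does arithmetic on the values stated above; the resulting figure of 840 bit/s is derived here, is not stated in the source, and covers only the 21 bits spent on the unvoiced section codes in each frame.

```python
FS_HZ = 8000            # sampling frequency
FRAME_SAMPLES = 200     # samples per frame
N_SFR = 10              # subframes per frame
UNVOICED_BITS = 21      # bits per frame for the unvoiced section codes

frame_ms = 1000 * FRAME_SAMPLES / FS_HZ                         # 25.0 ms
frames_per_second = FS_HZ // FRAME_SAMPLES                      # 40
subframe_samples = FRAME_SAMPLES // N_SFR                       # 20 samples (2.5 ms)
unvoiced_bits_per_second = UNVOICED_BITS * frames_per_second    # 840
```

This is consistent with the 25 ms frame stated above and leaves room under the 2.0 kbit/s target for the remaining codes.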

[0032] Under these conditions, an improvement in perceived quality was observed compared with coding the unvoiced sections simply with 21 bits of white noise information.

[Brief description of drawings]

[FIG. 1] Block diagram showing the configuration of the encoder.

[FIG. 2] Block diagram showing the configuration of unvoiced section quantization unit A.

[FIG. 3] Block diagram showing the configuration of unvoiced section quantization unit B.

[FIG. 4] Block diagram showing the configuration of unvoiced section quantization unit C.

[FIG. 5] Block diagram showing the configuration of the decoder.

[FIG. 6] Block diagram showing the configuration of unvoiced section sound source synthesis unit A.

[FIG. 7] Block diagram showing the configuration of unvoiced section sound source synthesis unit B.

[FIG. 8] Block diagram showing the configuration of unvoiced section sound source synthesis unit C.

[Explanation of symbols]

1 Linear prediction coefficient calculation unit
2 Linear prediction coefficient quantization unit
3 Linear prediction inverse filter
4 Correlation calculation unit
5 Periodicity determination unit
6 Voiced section quantization unit
7 Unvoiced section quantization unit
8 Multiplexer
9 Average power calculation unit
10 Average power quantization unit
11 Low-pass filter operation unit
12 Noise codebook
13, 19 Distance calculation unit
14 Average power sequence normalization unit
15 Normalized average power quantization unit
16 Scale factor quantization unit
17 Time contraction unit
18 Normalized noise codebook
20 Linear prediction coefficient decoding unit
21 Voiced section sound source synthesis unit
22 Unvoiced section sound source synthesis unit
23 Linear prediction synthesis filter
24 Average power decoding unit
25 White noise generation unit
27 High-pass filter operation unit
28 Noise codebook
29 Demultiplexer
31 Scale factor decoding unit
34 Time expansion unit

Continued from the front page

(56) References
JP-A-10-232697 (JP, A)
JP-A-9-16198 (JP, A)
JP-A-5-150800 (JP, A)
JP-T-2001-525079 (JP, A)
Yusuke Hiwasaki et al., "2 kbit/s speech coding method based on pitch waveform," Proceedings of the 1997 Spring Meeting of the Acoustical Society of Japan, March 17, 1997, 3-7-15, pp. 287-288

(58) Fields searched (Int. Cl.7, DB name)
G10L 11/06
G10L 19/08
G10L 19/12
JICST file (JOIS)

Claims

(57) [Claims]

1. A speech encoding method that performs linear prediction analysis on a speech signal frame by frame to obtain linear prediction analysis coefficients, and determines codes that quantize feature quantities of a residual signal obtained by driving a linear prediction synthesis filter using filter coefficients based on the linear prediction analysis coefficients, characterized in that the periodicity of the residual signal is determined as one of the feature quantities and, when the periodicity is lower than a predetermined threshold, the residual signal is band-divided into a low-band component and a high-band component, the noise code corresponding to the noise code vector with the minimum distance from the low-band component is selected, and, for the high-band component, the average power is calculated for each subframe constituting the frame.

2. The speech encoding method according to claim 1, characterized in that the calculated average power is normalized to obtain a normalized average power, and its scale factor is also calculated.

3. The speech encoding method according to claim 1 or 2, characterized in that the sample points of the low-band component waveform of the residual signal are decimated to form the low-band component, and the noise code corresponding to the noise code vector with the minimum distance from this low-band component is selected.

4. A speech decoding method that decodes a speech signal by driving a linear prediction synthesis filter with an excitation source, characterized in that the noise code and the average power generated by the speech encoding method according to claim 1 are input, the low-band waveform is decoded from a codebook on the basis of the noise code, the high-band waveform is synthesized by multiplying white noise passed through a high-pass filter by a gain for each subframe derived from the quantized average power, and the waveforms of the two bands are added together to form the excitation source of the linear prediction synthesis filter.

5. A speech decoding method that decodes a speech signal by driving a linear prediction synthesis filter with an excitation source, characterized in that the normalized average power, the scale factor, and the noise code generated by the speech encoding method according to claim 2 are input, the low-band waveform is decoded from a codebook on the basis of the noise code, the high-band waveform is synthesized by multiplying the normalized average power by the scale factor, and the waveforms of the two bands are added together to form the excitation source of the linear prediction synthesis filter.

6. The speech decoding method according to claim 4 or 5, characterized in that the selected noise code is input, this being the noise code that, in the speech encoding method, corresponds to the noise code vector with the minimum distance from the low-band component formed by decimating the sample points of the low-band component waveform of the residual signal, and the decimated sample points are recalculated by sampling conversion on the basis of the noise code.