JP3531780B2

JP3531780B2 - Voice encoding method and decoding method

Info

Publication number: JP3531780B2
Application number: JP30520696A
Authority: JP
Inventors: 祐介日和▲崎▼; 一則間野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-11-15
Filing date: 1996-11-15
Publication date: 2004-05-31
Anticipated expiration: 2016-11-15
Also published as: JPH10143199A

Abstract

PROBLEM TO BE SOLVED: To reproduce high-quality speech at a low bit rate. SOLUTION: Input speech is passed through a linear prediction inverse filter 18 to obtain a residual signal r(t). In every frame (25 ms), one pitch period of r(t) is extracted (31), the extracted waveform is normalized to a constant length (32), its peak position is made constant (33), and the normalized waveform is quantized. During decoding, the waveforms lying between the preceding and following normalized waveform vectors are generated by linear interpolation and reproduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a high-efficiency speech coding method that digitally encodes a speech signal sequence with a small amount of information. More specifically, it relates to a coding method, and the corresponding decoding method, that achieves high-quality speech coding at bit rates of 2.4 kbit/s or less, a range conventionally covered by speech analysis-synthesis systems known as vocoders.

[0002]

[Prior Art] Prior art related to the present invention includes the linear predictive vocoder, code-excited linear prediction (CELP) coding, mixed-domain coding, and prototype waveform interpolation coding.

[0003] The linear predictive vocoder has been widely used as a speech coding method in the low-bit-rate region of 4.8 kbit/s and below, and includes schemes such as the PARCOR method and the line spectrum pair (LSP) method. Details of these methods are described, for example, in Saito and Nakata, "Basics of Speech Information Processing" (Ohmsha). A linear predictive vocoder consists of an all-pole filter that represents the spectral envelope of speech and an excitation signal that drives the filter. The excitation is a pitch-period pulse train for voiced sounds and white noise for unvoiced sounds. Because an excitation made of a periodic pulse train or white noise is insufficient to reproduce the characteristics of the speech waveform, it is difficult to obtain highly natural synthesized speech with a linear predictive vocoder.

[0004] In code-excited linear prediction coding, on the other hand, speech is synthesized by using a noise sequence as the excitation to drive two all-pole filters that represent the short-term correlation and the pitch correlation of speech. The noise sequences are prepared in advance as a set of code patterns, and the code pattern that minimizes the error between the input speech waveform and the synthesized speech waveform is selected from among them. Details are given in Schroeder, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. IEEE ICASSP, pp. 937-940, 1985. In CELP coding, the reproduction accuracy depends on the number of code patterns. Preparing a large number of patterns therefore raises the reproduction accuracy of the speech waveform and, with it, the quality. However, when the bit rate is reduced to 4 kbit/s or less, the number of code patterns is limited and sufficient speech quality can no longer be obtained. About 4.8 kbit/s is said to be required for good speech quality.

[0005] In mixed-domain coding, for voiced sounds, a waveform of one pitch period is extracted from the residual waveform in each frame, and its difference from the waveform of the previous pitch period is quantized in the time domain. The decoder generates the excitation signal by linearly interpolating these waveforms in the frequency domain and drives an all-pole filter to synthesize speech. Unvoiced sounds are coded in the same manner as in CELP coding. Details of this scheme are given in De Martin et al., "Mixed-domain Coding of Speech at 3 kb/s," Proc. IEEE ICASSP, pp. II/216-170, 1996. A feature of this method is that, when the difference is computed, the previous pitch-period waveform is length-normalized to the waveform of the current frame. The difference is quantized with a pulse codebook and a noise codebook, but about 3.5 kbit/s is required.

[0006] In prototype waveform interpolation coding, speech is synthesized by driving an all-pole filter with an excitation signal obtained by linear interpolation of prototype waveforms. Details are given in Kleijn W. B., "Encoding Speech Using Prototype Waveforms," IEEE Trans. on Speech and Audio Processing, Vol. 1, pp. 386-399, 1993. The prototype waveforms are extracted from the residual waveform at fixed intervals and encoded after a Fourier transform. This scheme is said to require about 3.4 kbit/s for good quality.

[0007]

[Problem to be Solved by the Invention] The object of the present invention is to provide, for linear predictive coding that uses a noise sequence or a pitch pulse train as the excitation signal, a method that achieves more efficient coding when the frequency band of the input signal is limited, as with telephone speech, together with the corresponding decoding method.

[0008]

[Means for Solving the Problem] The coding method of the present invention estimates the pitch period of the input speech, extracts a waveform of the estimated pitch period from a periodic portion of the excitation signal, and determines the excitation signal so that the distortion relative to the length-normalized waveform is minimized. It differs from conventional methods in that the input pitch-period waveform is normalized so that its length matches that of the fixed-length code vectors, and the code is determined by convolving with an impulse response of the synthesis filter that has been normalized in the same way. When speech is synthesized, the excitation interpolated between the preceding and following frames is restored to the pitch-period length and concatenated.

[0009]

[Embodiments of the Invention]

Embodiment 1

FIG. 1 shows the functional configuration of an encoder to which the coding method of the present invention is applied. The encoder performs the following procedure once per frame of N samples. In frame i, the LPC calculation unit 12 computes the p-th order linear prediction coefficients (LPC) $a_j$ (j = 0, 1, …, p−1) of the input speech signal s(t) from the input terminal 11. These coefficients are quantized by the LPC quantization unit 13 and sent out as the linear prediction coefficient code $I_1$. Details of the quantization of the linear prediction coefficients are described in "Speech Linear Prediction Parameter Coding Method" (Japanese Patent Application No. Hei 3-180819). The filter coefficients of the linear prediction inverse filter 14 are set from the coefficients obtained by the LPC calculation unit 12, and the input speech signal s(t) is passed through the inverse filter 14 to compute the residual signal r(t). The inverse filter 14 is realized as a digital filter with the following transfer characteristic.

[0010]

$$A(z)^{-1} = 1 + a_1 z^{-1} + \dots + a_p z^{-p} \tag{1}$$

The correlation (modified correlation function) ρ of the residual signal obtained here is computed by the correlation calculation unit 15, and the pitch period extraction unit 16 takes the lag (interval) at which ρ reaches its maximum value $\rho_{max}$ as the estimated pitch period $p_i$. The periodicity determination unit 17 then decides whether the input speech signal s(t) is voiced or unvoiced, for example by comparison with a threshold θ (0.5 to 1.0) as follows.
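The inverse filtering of equation (1) and the pitch estimate from the correlation peak can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the energy normalization of ρ, the lag search range, and the zero initial filter state are all assumptions introduced here.

```python
def lpc_residual(s, a):
    """Inverse filter of equation (1): r(t) = s(t) + sum_k a[k-1] * s(t-k)."""
    r = []
    for t in range(len(s)):
        acc = s[t]
        for k, a_k in enumerate(a, start=1):
            if t - k >= 0:  # samples before the start are taken as zero (assumption)
                acc += a_k * s[t - k]
        r.append(acc)
    return r


def estimate_pitch(r, min_lag, max_lag):
    """Return (lag, rho_max) maximising the energy-normalised autocorrelation."""
    energy = sum(v * v for v in r)
    if energy == 0.0:
        return min_lag, 0.0
    best_lag, best_rho = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        rho = sum(r[t] * r[t - lag] for t in range(lag, len(r))) / energy
        if rho > best_rho:
            best_lag, best_rho = lag, rho
    return best_lag, best_rho


# First-order check: s(t) = 0.5 * s(t-1) is whitened by a = [-0.5].
residual = lpc_residual([1.0, 0.5, 0.25, 0.125], [-0.5])

# A pulse train with period 5 should give an estimated pitch lag of 5.
pulses = [1.0 if t % 5 == 0 else 0.0 for t in range(40)]
lag, rho_max = estimate_pitch(pulses, min_lag=2, max_lag=20)
```

Restricting the search to a plausible lag range keeps the trivial peak at lag 0 out of the maximum; the bounds used above are illustrative only.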

[0011]

$$k_1/2 + \rho_{max} > \theta:\ \text{voiced}, \qquad k_1/2 + \rho_{max} < \theta:\ \text{unvoiced} \tag{2}$$

Here, $k_1$ is the first-order partial autocorrelation (PARCOR) coefficient obtained by the LPC calculation unit 12. The linear prediction coefficient code $I_1$ from the LPC quantization unit 13 is decoded, the filter coefficients of the linear prediction inverse filter 18 are set from the dequantized linear prediction coefficients, and the input speech signal s(t) is passed through the inverse filter 18 to obtain the residual signal r′(t).

When the determination unit 17 decides that the frame is unvoiced, the unvoiced quantization unit 19 performs quantization as shown in FIG. 2A. In the quantization unit 19, the frame is divided into S parts to give subframes of $N_{sub}$ (= N/S) samples. The power calculation unit 21 computes the average power of the residual waveform r′(t) from the inverse filter 18 within each subframe, and the vector quantization unit 22 vector-quantizes the powers of one frame and outputs them as the unvoiced code $I_2$.

The unvoiced part may instead be quantized with the configuration shown in FIG. 2B. In that case, the filter coefficients of the linear prediction synthesis filters 23 and 24 are set from the quantized linear prediction coefficients supplied by the LPC quantization unit 13, and the residual signal r′(t) from the inverse filter 18 is passed through the synthesis filter 23 to reproduce the input speech signal s(t). Meanwhile, a noise code selected from the noise codebook 25 is given, in the gain unit 26, a gain selected from the gain codebook 27, and the scaled noise code is passed through the synthesis filter 24 to synthesize speech. The subtraction unit 28 takes the difference between this synthesized unvoiced speech and the synthesized speech from the synthesis filter 23, and the distortion calculation unit 29 selects the noise code in the noise codebook 25 and the gain in the gain codebook 27 so that the square of the difference (error) is minimized. The noise code of the noise codebook 25 and the gain of the gain codebook 27 selected in this way form the unvoiced code $I_2$.
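The decision rule of equation (2) and the subframe average powers of the FIG. 2A unvoiced path can be sketched as below. The function names and the default threshold of 0.7 are hypothetical (the text only gives the range 0.5 to 1.0), and the vector quantization of the powers is omitted.

```python
def is_voiced(k1, rho_max, theta=0.7):
    """Equation (2): voiced when k1/2 + rho_max exceeds the threshold theta."""
    return k1 / 2.0 + rho_max > theta


def subframe_powers(residual, num_subframes):
    """Average power of the residual in each of S equal-length subframes."""
    n_sub = len(residual) // num_subframes  # N_sub = N / S
    powers = []
    for i in range(num_subframes):
        chunk = residual[i * n_sub:(i + 1) * n_sub]
        powers.append(sum(v * v for v in chunk) / n_sub)
    return powers


voiced_frame = is_voiced(k1=0.8, rho_max=0.6)     # 0.4 + 0.6 = 1.0 > 0.7
unvoiced_frame = is_voiced(k1=-0.2, rho_max=0.3)  # -0.1 + 0.3 = 0.2 < 0.7
powers = subframe_powers([1.0, 1.0, 2.0, 2.0], num_subframes=2)
```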

[0012] When the periodicity determination unit 17 decides that the frame is voiced, the residual extraction unit 31 uses the estimated pitch period $p_i$ to cut out a waveform of length $p_i$ from near the center of the frame of the residual signal r′(t) supplied by the inverse filter 18, and this waveform is stretched or compressed so that its length is normalized to the vector length n. The normalization is carried out by the sampling conversion unit 32, which performs sampling conversion based on the sampling function of equation (3).

$$x(t_i) = \sum_{n=-q}^{q} x(nT)\,\frac{\sin\{(\pi/T)(t_i - nT)\}}{(\pi/T)(t_i - nT)} \tag{3}$$

Here, T is the sampling period, and q, which is infinite in theory, is truncated to a finite value.
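The sampling conversion of equation (3) can be illustrated with a truncated sinc interpolator that maps a pitch cycle of arbitrary length onto n samples. This is a sketch under stated assumptions: T is taken as 1, the truncation q = 8 is arbitrary, and the pitch cycle is treated as periodic (indices wrap around), which the source does not specify.

```python
import math


def sinc(x):
    """Sampling function sin(pi x) / (pi x), with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def resample(x, n_out, q=8):
    """Resample one cycle x to n_out samples using equation (3) with T = 1."""
    p = len(x)
    out = []
    for i in range(n_out):
        t = i * p / n_out          # position of output sample i on the input grid
        centre = math.floor(t)
        acc = 0.0
        for k in range(centre - q, centre + q + 1):
            acc += x[k % p] * sinc(t - k)  # periodic extension (assumption)
        out.append(acc)
    return out


cycle = [1.0, 2.0, 3.0, 4.0]
same_length = resample(cycle, 4)   # should reproduce the input
stretched = resample(cycle, 8)     # even-indexed outputs fall on input samples
```

The same routine can be used in the decoder direction, converting an n-sample normalized waveform back to the length of the decoded pitch period.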

[0013] Next, the alignment unit 33 rotates the normalized residual waveform until its correlation with a pulse signal becomes large. The residual waveform $r_{np}$, whose length of one estimated pitch period has been normalized to n and whose phase has been normalized by the rotation, is called the NPW (normalized pitch-period waveform). The estimated pitch period $p_i$ is quantized to an integer by rounding in the pitch period quantization unit 34 and output as the pitch period code $I_3$.
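The phase normalization can be sketched as a circular rotation that maximizes the correlation with a unit pulse at a fixed position. For a single unit pulse this correlation is simply the sample that lands on the pulse position, so the sketch reduces to moving the largest sample there. The function name and the use of a single unit pulse at index 0 are assumptions for illustration.

```python
def align_to_pulse(w, pulse_pos=0):
    """Rotate w circularly so its correlation with a unit pulse is maximal."""
    n = len(w)
    # After a left rotation by s, the sample at pulse_pos is w[(pulse_pos + s) % n].
    best_shift = max(range(n), key=lambda s: w[(pulse_pos + s) % n])
    rotated = [w[(j + best_shift) % n] for j in range(n)]
    return rotated, best_shift


npw, shift = align_to_pulse([0.1, -0.2, 0.9, 0.3])
```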

[0014] The NPW from the alignment unit 33 is vector-quantized by the NPW quantization unit 35. In the NPW quantization unit 35, as shown for example in FIG. 3A, an impulse signal is passed through the linear prediction synthesis filter 37, whose filter coefficients are set from the quantized linear prediction coefficients supplied by the LPC quantization unit 13 in FIG. 1, to obtain the impulse response $h_j$. The impulse response $h_j$ is normalized to length n by the sampling conversion unit 38, and the convolution filter 39 convolves the impulse response matrix H, built from the normalized impulse response $h'_j$, with the NPW from the alignment unit 33 in FIG. 1 to synthesize the speech signal x. Meanwhile, the code vector $c_0^i$ selected from the NPW codebook 41 is given, in the gain unit 42, a gain $g_0^k$ taken from the gain codebook 43, and the convolution filter 44 convolves the impulse response matrix H with the result to synthesize speech. The subtraction unit 45 takes the error between this synthesized speech and the reproduced speech x, and the distortion calculation unit 46 selects the code vector $c_0^i$ of the NPW codebook 41 and the gain $g_0^k$ of the gain codebook 43 so that the square of the error is minimized.

[0015] Each code vector in the NPW codebook 41 has length n, and the phases of the peaks are made uniform. The pulse signal used by the NPW alignment unit 33 in FIG. 1 has period n, and its phase is matched to the phase of the peaks of the code vectors in the codebook 41. As described with reference to FIG. 3A, the NPW code is determined so that the perceptually weighted mean squared error between the waveform synthesized with the code vector as the excitation and the waveform synthesized with the NPW waveform as the excitation is minimized. The distortion measure D is computed with the following equation (4).

[0016]

$$D = \| x - g_0^k H c_0^i \|^2 \tag{4}$$

Here, x is the target (the waveform synthesized with the NPW waveform as the excitation), H is the matrix representing the normalized impulse response of the synthesis filter 37 that uses the quantized linear prediction coefficients $a'_j$, $c_0$ is the code vector, and $g_0$ is the gain of the code vector.

[0017] The target x is computed in advance by the convolution operation in the filter 39 using the following equation (5).

$$x = H r_{np} \tag{5}$$

Here, $r_{np}$ is the original NPW waveform before quantization, expressed as a vector. Conventional CELP coding normally uses a lower-triangular (n × n) square matrix for H. Here, in order to obtain the free response produced by passing the NPW waveform through the synthesis filter, an (m × n) non-square matrix extended downward by (m − n) rows is used, where m > n. H is built from the impulse response $h_j$ (j = 0, 1, …, $p_i$−1) of the perceptually weighted linear prediction filter, which is sampling-converted by the sampling conversion unit 38 and normalized in the same way as $r_{np}$ to give $h'_j$ (j = 0, 1, …, n−1).
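The extended matrix and the target of equation (5) can be sketched as follows. The exact layout of H is given by an equation image that is not reproduced in this text, so the sketch assumes the usual convolution (Toeplitz) structure, in which entry (i, j) holds $h'_{i-j}$ and the rows beyond n carry the tail of the response. The function names are hypothetical.

```python
def impulse_matrix(h, n, m):
    """Build an m x n convolution matrix with H[i][j] = h[i - j] (zero elsewhere)."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
            for i in range(m)]


def matvec(H, v):
    """Matrix-vector product H v."""
    return [sum(a * b for a, b in zip(row, v)) for row in H]


h_norm = [1.0, 0.5]                        # toy normalised impulse response h'_j
H = impulse_matrix(h_norm, n=2, m=3)       # one extra row for the free response
target = matvec(H, [2.0, 4.0])             # x = H r_np, equation (5)
```

With n = 2 and m = 3, the third component of the target is the part of the filter output that continues after the excitation has ended, which a square lower-triangular matrix would discard.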

[0018]

[Equation 1] (equation (6); the equation image is not reproduced in this text)

The linear prediction synthesis filter 37 used to compute $h_j$ (j = 0, 1, …, $p_i$−1) is realized as a digital filter with the following transfer characteristic.

$$A(z) = \frac{1}{1 + a_1 z^{-1} + \dots + a_p z^{-p}} \tag{7}$$

The transfer characteristic of the perceptual weighting is expressed as follows.

[0019]

$$W(z) = \frac{A(\gamma_1 z)}{A(\gamma_2 z)} \tag{8}$$

Here, $\gamma_1$ and $\gamma_2$ are parameters that control the degree of perceptual weighting and take values satisfying $0 < \gamma_2 < \gamma_1 < 1$. The matrix H used for the convolution operations in the convolution filters 39 and 44 in FIG. 3A is the extended m × n matrix described above, built from the normalized impulse response $h'_j$. Because H is extended in this way, the target x obtained by equation (5) also

[0020]

[Equation 2] (equation (9); the equation image is not reproduced in this text)

has order m. Here, the components $x_i$ (n < i < m−1) correspond to the free response of the linear prediction filter 37, that is, the zero-input response of the synthesis filter 37. In selecting the NPW code $c_0$, the code vector $c_0^i$ is selected from the code vector codebook 41 so that equation (4) is minimized, and its ideal gain $g_0^i$ is computed. First, the code vector $c_0^i$ that maximizes the value $D'_0$ of equation (10) is selected in a closed loop.

[0021]

$$D'_0 = \frac{(x^T H c_0^i)^2}{\| H c_0^i \|^2} \tag{10}$$

The ideal gain $g_0^i$ of the selected code vector $c_0^i$ is computed with equation (11).

$$g_0^i = \frac{x^T H c_0^i}{\| H c_0^i \|^2} \tag{11}$$

Next, the gain $g_0^i$ is scalar-quantized. Since the selection of the code vector has already been completed by the above procedure, the gain $g_0^k$ that minimizes equation (4) is selected. The code of the selected code vector and the code of the selected gain are output as the NPW code $I_4$, and the periodicity determination unit 17 outputs the periodicity code $I_5$ indicating whether the frame is voiced or unvoiced. The codes $I_1$ to $I_4$ are combined by the multiplexer 47 and output to a transmission path or a storage unit.
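The closed-loop search of equations (10) and (11), followed by the scalar gain selection against equation (4), can be sketched as below. The codebook contents, gain levels, and function names are toy assumptions for illustration; real codebooks would be trained and far larger.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def search_codebook(x, H, codebook):
    """Return (index, ideal_gain) maximising (x^T H c)^2 / ||H c||^2."""
    best = None
    for idx, c in enumerate(codebook):
        y = matvec(H, c)
        energy = dot(y, y)
        if energy == 0.0:
            continue                      # skip code vectors with no output
        corr = dot(x, y)
        score = corr * corr / energy      # equation (10)
        if best is None or score > best[0]:
            best = (score, idx, corr / energy)  # gain from equation (11)
    return best[1], best[2]


def quantize_gain(x, y, gain_levels):
    """Index k of the gain level minimising ||x - g_k y||^2, as in equation (4)."""
    def err(g):
        return sum((a - g * b) ** 2 for a, b in zip(x, y))
    return min(range(len(gain_levels)), key=lambda k: err(gain_levels[k]))


H = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]   # toy extended 3 x 2 matrix
codebook = [[1.0, 0.0], [0.0, 1.0]]
x = matvec(H, [0.0, 2.0])                  # target equals 2 * H * codebook[1]
index, gain = search_codebook(x, H, codebook)
gain_index = quantize_gain(x, matvec(H, codebook[index]), [0.5, 1.0, 2.5])
```

Maximizing equation (10) is equivalent to minimizing equation (4) once the gain is set to its ideal value, which is why the code vector can be chosen before the gain is quantized.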

[0022] As described above, one frame is, for example, 25 milliseconds long, and a residual waveform (signal) of only one pitch period is extracted from it, that is, only a fraction of the frame. The synthesis filter 37, on the other hand, produces an output that depends on its immediately preceding state even when driven with zero input, the so-called zero-input response. In CELP coding, the target is therefore the input waveform minus the zero-input response. In the present invention, because only part of a frame is used for coding, the impulse response matrix of the synthesis filter 37 is extended, relative to CELP coding, by the amount corresponding to the zero-input response, and a code vector is selected that is close to the one-pitch-period waveform including its zero-input (free) response. Since the waveform information is coded for only one pitch period per frame, correspondingly fewer bits are needed. In addition, because the pitch period is normalized and the peak position is normalized (constant phase), the number of coding bits can be reduced in this respect as well.

[0023] Next, FIG. 4 shows a decoder to which an embodiment of the decoding method of the present invention is applied, corresponding to the embodiment of the coding method shown in FIG. 1. The codes $I_1$ to $I_5$ supplied to the input terminal 51 are separated by the demultiplexer 52 and all speech parameters are decoded, after which the excitation is generated from the voiced and unvoiced parameters $I_2$ and $I_4$. When the periodicity code $I_5$ indicates an unvoiced frame, the unvoiced decoding unit 53 reproduces the excitation signal from the unvoiced code $I_2$. For example, as shown in FIG. 5A, the gain calculation unit 55 applies the decoded power code of the unvoiced code $I_2$ to the white noise from the white noise generation unit 54 to generate the synthesized residual waveform of the unvoiced part. That is, N samples of white noise are generated, and for each subframe (of length $N_{sub}$) a gain is computed and applied so that the average power in the subframe matches the decoded average power of the corresponding subframe. The result is used as the excitation signal.
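The unvoiced excitation of FIG. 5A can be sketched as white noise rescaled subframe by subframe to the decoded average power. The Gaussian noise source, the fixed seed, and the function name are assumptions made here for a reproducible illustration.

```python
import math
import random


def unvoiced_excitation(decoded_powers, n_sub, seed=0):
    """Generate noise whose average power in each subframe matches the target."""
    rng = random.Random(seed)             # seeded for repeatability (assumption)
    out = []
    for target in decoded_powers:
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_sub)]
        power = sum(v * v for v in noise) / n_sub
        gain = math.sqrt(target / power) if power > 0.0 else 0.0
        out.extend(gain * v for v in noise)
    return out


excitation = unvoiced_excitation([1.0, 4.0], n_sub=50)
first_power = sum(v * v for v in excitation[:50]) / 50
second_power = sum(v * v for v in excitation[50:]) / 50
```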

[0024] When the periodicity code $I_5$ indicates a voiced frame, the NPW decoding unit 56 in FIG. 4 decodes the NPW waveform $r^i$ from the NPW code $I_4$ by multiplying the code vector $c^i$ by the gain $g^i$, as shown in equation (12). Although not shown in the figure, the decoder has codebooks identical to the NPW codebook 41 and the gain codebook 43 in FIG. 3A.

$$r^i = g_0^i c_0^i \tag{12}$$

Next, the linear interpolation unit 58 linearly interpolates between this decoded NPW waveform $r^i$ and the content $r^{i-1}$ of the previous-NPW buffer 57 to obtain the intermediate NPW waveforms $r^{in}$. Equation (13), for example, is used for this linear interpolation.

[0025]

$$S^{in}(j) = (1 - \alpha)\,S^{i-1}(j) + \alpha\,S^{i}(j) \quad (j = 0, 1, \dots, n-1;\ 0 < \alpha < 1) \tag{13}$$

Here, α is a value indicating where the waveform lies within the frame of N samples, $S^{i-1}$ is the vector immediately preceding $S^i$, and $S^{in}$ is the vector produced by interpolation. On the encoder side, only one pitch period of the residual waveform is cut out in each frame. Between the waveform cut out in the current frame and the waveform cut out in the previous frame there are therefore, in the original signal, roughly one to several pitch periods of waveform. These waveforms are reconstructed by linear interpolation between the decoded NPW waveform $r^{i-1}$ of the previous frame and the decoded NPW waveform $r^i$ of the current frame. α is determined according to the position that the waveform being interpolated occupies among the waveforms to be inserted between the cut-out waveform of the previous frame and that of the current frame. The pitch period code $I_3$ is decoded by the pitch decoding unit 59, and the number of waveforms to be interpolated is determined from the decoded pitch period and the frame length.
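Equation (13) can be sketched as below for a given number of intermediate waveforms. The evenly spaced choice α = k / (count + 1) is an assumption that satisfies 0 < α < 1; the source only states that α reflects the position of the waveform within the frame.

```python
def interpolate_waveforms(prev_npw, cur_npw, count):
    """Return `count` waveforms between prev_npw and cur_npw via equation (13)."""
    waveforms = []
    for k in range(1, count + 1):
        alpha = k / (count + 1)           # evenly spaced positions (assumption)
        waveforms.append([(1.0 - alpha) * a + alpha * b
                          for a, b in zip(prev_npw, cur_npw)])
    return waveforms


intermediate = interpolate_waveforms([0.0, 0.0], [4.0, 8.0], count=3)
```

In a decoder following the text, `count` would be derived from the decoded pitch period and the frame length, and each intermediate waveform would then be converted back from n samples to its own pitch-period length.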

[0026] Using the decoded pitch period and the content of the previous-pitch buffer 61, the pitch interpolation unit 62 interpolates a pitch period for each intermediate pitch cycle between the pitch period of the waveform cut out of the previous frame and that of the waveform cut out of the current frame. Using these interpolated pitch periods and the decoded pitch period, the sampling conversion unit 63 sampling-converts the corresponding intermediate NPW waveforms from the linear interpolation unit 58, that is, restores them to the pitch periods of the original speech, and the residual signal synthesis unit 64 concatenates them in order to form the excitation signal.

[0027] Although not shown in the figures, if an NPW of half or double the true pitch period is extracted by mistake during coding while the pitch of the other NPW is correct, interpolating by the above method and using the interpolated waveform from the linear interpolation unit 58 degrades the quality of the output speech. Therefore, when the decoded pitch periods of the preceding and following frames differ greatly, for example by a ratio of about 2:1, the shorter of the two waveforms is repeated twice and re-normalized by sampling conversion to a vector of n samples, and linear interpolation is performed between this re-normalized vector and the longer waveform. Likewise, the shorter pitch period is doubled, and pitch interpolation is performed between it and the other pitch period.
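The 2:1 safeguard can be sketched as follows. The tolerance used to decide that two pitch periods are "about 2:1", the function names, and the periodic truncated-sinc resampler standing in for the sampling conversion are all assumptions introduced for illustration.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def resample(x, n_out, q=8):
    """Truncated-sinc resampling of one cycle x to n_out samples (periodic)."""
    p = len(x)
    out = []
    for i in range(n_out):
        t = i * p / n_out
        centre = math.floor(t)
        out.append(sum(x[k % p] * sinc(t - k)
                       for k in range(centre - q, centre + q + 1)))
    return out


def reconcile_pitch(w_a, p_a, w_b, p_b, tol=0.1):
    """Double the shorter waveform and pitch when the ratio is close to 2:1."""
    if p_a > p_b:                          # make the first argument the shorter one
        w_b, p_b, w_a, p_a = reconcile_pitch(w_b, p_b, w_a, p_a, tol)
        return w_a, p_a, w_b, p_b
    if abs(p_b / p_a - 2.0) > tol:
        return w_a, p_a, w_b, p_b          # periods are compatible: no change
    doubled = resample(w_a + w_a, len(w_a))  # repeat twice, re-normalise to n
    return doubled, 2 * p_a, w_b, p_b


w_prev, p_prev, w_cur, p_cur = reconcile_pitch(
    [1.0, 0.0, 0.0, 0.0], 40, [0.5, 0.5, 0.5, 0.5], 81)
```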

[0028] The linear prediction synthesis filter 65 is driven by the synthesized excitation signal from the unvoiced decoding unit 53 when the periodicity code $I_5$ indicates an unvoiced frame, and by the synthesized excitation signal from the residual signal synthesis unit 64 when $I_5$ indicates a voiced frame, and the output speech is obtained at the output terminal 66. Here, the linear prediction coefficient code $I_1$ is decoded by the linear prediction coefficient decoding unit 67. For these coefficients as well, the linear prediction coefficient interpolation unit 69 uses the content of the previous-coefficient buffer 68 to interpolate linearly, by equation (13), between the linear prediction coefficients for the one pitch period in the previous frame and those for the one pitch period in the current frame, and thereby determines the coefficients of the synthesis filter 65. The linear prediction coefficients may also be interpolated by the methods used in conventional CELP schemes.

Embodiment 2

FIG. 6 shows the NPW quantization unit of an embodiment in which the NPW quantization unit 35 in FIG. 1 performs multistage quantization. In FIG. 6, parts corresponding to those in FIG. 3A bear the same reference numerals. This example is a two-stage quantizer. An NPW codebook 71 is provided, and the code vector $c_1^j$ selected from the NPW codebook 71 is given, in the gain unit 72, a gain $g_1^k$ selected from the gain codebook 43 and supplied to the convolution filter 73, where the normalized impulse response H is convolved with it. The resulting synthesized waveform is subtracted, in the subtraction unit 74, from the error signal supplied by the subtraction unit 45, and the remainder is supplied to the distortion calculation unit 75. The distortion calculation unit 75 selects the code vector $c_1^j$ of the NPW codebook 71 and the gain $g_1^k$ of the gain codebook 43 so that the square of the output of the subtraction unit 74 is minimized. In this case too, the code vectors $c_0^i$ and $c_1^j$ and the gains $g_0^k$ and $g_1^k$ are determined so that, overall, the perceptually weighted mean squared error between the waveform synthesized with the code vectors as the excitation and the waveform synthesized with the NPW waveform as the excitation is minimized. Equation (14) is used to compute this Euclidean-distance distortion measure.

[0029]

$$D = \left\| x - g_0^k H c_0^i - g_1^k H c_1^j \right\|^2 \tag{14}$$

Here, $x$ is the target obtained from equation (5), $H$ is the matrix representing the normalized impulse response of the synthesis filter 37 that uses the quantized linear prediction coefficients $a_i'$, $c_0$ and $c_1$ are the code vectors, and $g_0$ and $g_1$ are the gains of the respective code vectors.
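As a concrete reading of equation (14), the following minimal sketch evaluates the distortion for one candidate pair of code vectors and gains. The helper names `matvec` and `distortion`, and the representation of $H$ as a list of rows, are assumptions made for the illustration; the perceptual weighting is taken to be already folded into $x$ and $H$.

```python
def matvec(H, c):
    """Multiply matrix H (list of rows) by vector c."""
    return [sum(h * v for h, v in zip(row, c)) for row in H]


def distortion(x, H, c0, g0, c1, g1):
    """Squared error D = ||x - g0*H*c0 - g1*H*c1||^2 of equation (14)."""
    y0 = matvec(H, c0)   # first-stage code vector passed through H
    y1 = matvec(H, c1)   # second-stage code vector passed through H
    return sum((xi - g0 * a - g1 * b) ** 2 for xi, a, b in zip(x, y0, y1))
```

A full search would evaluate this quantity over the candidate indices and keep the combination giving the smallest value.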

[0030] First, the first-stage code vector $c_0^i$ and its ideal gain $g_0^i$ are determined as described for FIG. 3A. Next, the code vector $c_1^j$ that minimizes equation (14) is selected from the NPW codebook 71, its ideal gain $g_1^j$ is calculated, and $g_0^i$, the ideal gain of $c_0^i$, is recalculated. In effect, coding is performed with the code vectors $c_0^i$ and $c_1^j$ orthogonalized. Vector quantization based on this vector orthogonalization is described in detail in "Excitation Signal Orthogonalized Speech Coding Method" (Japanese Patent Application No. Hei 6-43519).

[0031] For this selection, the code vector $c_1^i$ that maximizes the value $D_1'$ in equation (15) is selected in a closed loop.

[0032]

[Equation 3]

The ideal gain $g_1^j$ of the selected code vector is calculated using equation (16).

[0033]

[Equation 4]

The ideal gain $g_0^i$ is then recalculated using equation (17).

[0034]

[Equation 5]

Since the code vectors have been selected by the above procedure, the gain pair $(g_0^k, g_1^k)$ that minimizes equation (14) is then selected; that is, the gains are vector-quantized. The decoder in this case is the same as in FIG. 4, except that equation (18) is used to decode the NPW waveform.
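The two-stage procedure of paragraphs [0030] to [0034] can be sketched as below. Equations (15) to (17) are not reproduced in this text, so the sketch substitutes standard least-squares forms: the second stage is scored against the component of $Hc_1$ orthogonal to $Hc_0$, and both ideal gains are then obtained jointly from the 2x2 normal equations. These forms, the function names, and the assumption that no code vector has zero energy after filtering are illustrative choices, not the patent's own formulas.

```python
def matvec(H, c):
    return [sum(h * v for h, v in zip(row, c)) for row in H]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def two_stage_search(x, H, book0, book1):
    """Pick (i, j) and ideal gains (g0, g1) reducing ||x - g0*H*c0 - g1*H*c1||^2."""
    # Stage 1: best single code vector with its own ideal gain.
    synth0 = [matvec(H, c) for c in book0]
    i = max(range(len(book0)),
            key=lambda k: dot(x, synth0[k]) ** 2 / dot(synth0[k], synth0[k]))
    y0 = synth0[i]
    e00 = dot(y0, y0)

    # Stage 2: score each candidate by its component orthogonal to y0.
    best_j, best_score, best_y1 = None, -1.0, None
    for j, c in enumerate(book1):
        y1 = matvec(H, c)
        proj = dot(y1, y0) / e00
        y1_orth = [a - proj * b for a, b in zip(y1, y0)]
        energy = dot(y1_orth, y1_orth)
        if energy <= 1e-12:          # parallel to y0: adds nothing new
            continue
        score = dot(x, y1_orth) ** 2 / energy
        if score > best_score:
            best_j, best_score, best_y1 = j, score, y1

    # Joint ideal gains from the normal equations (recomputes g0 as well).
    y1 = best_y1
    a, b, d = e00, dot(y0, y1), dot(y1, y1)
    p, q = dot(x, y0), dot(x, y1)
    det = a * d - b * b
    g0 = (p * d - q * b) / det
    g1 = (q * a - p * b) / det
    return i, best_j, g0, g1
```

The returned ideal gains would subsequently be vector-quantized against the gain codebook, as paragraph [0034] describes.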

[0035]

$$r^i = g_0^i c_0^i + g_1^i c_1^i \tag{18}$$

In the above, any of the adaptive codebook (a), the fixed codebook (b), and the algebraic pulse codebook (c) shown in FIG. 5B may be used as a codebook. The adaptive codebook (a) holds past residual waveforms, and the algebraic pulse codebook (c) can be generated by rule each time it is needed.

Embodiment 3

FIG. 3B shows an embodiment in which the NPW quantization unit 35 in FIG. 1 quantizes with two conjugate-structure codebooks; parts corresponding to those in FIG. 2B carry the same reference numerals. An NPW codebook 81 is added. The code vectors of the NPW codebook 81 and those of the NPW codebook 41 have a conjugate structure, that is, they are mutually orthogonal. The code vector selected from the NPW codebook 81 is given, in the gain unit 82, a gain selected from the gain codebook 43, and the adder 83 adds this scaled code vector to the code vector from the gain unit 42; the sum is supplied to the convolution filter 44 as the excitation signal. The code vectors of the NPW codebooks 41 and 81 and their gains are determined so that the perceptually weighted mean squared error between the waveform synthesized with these code vectors as excitation and the waveform synthesized with the NPW waveform as excitation is minimized. As in Embodiment 2, this distortion measure is computed with equation (14). Coding with the conjugate-structure codebooks 41 and 81 is described in detail in "Multiple Vector Quantization Method and Apparatus" (Japanese Patent Application No. Sho 63-249450).

[0036] In this case too, the adaptive codebook, fixed codebook, and algebraic pulse codebook shown in FIG. 5B may be used as codebooks. Where several codebooks are used, different types from FIG. 5B may be combined, for example an adaptive codebook with a fixed codebook. For multi-stage vector quantization or conjugate-structure vector quantization, the NPW decoding unit 56 in FIG. 4 is provided with as many codebooks as there are input code vectors. It takes from each codebook the code vector indicated by the input NPW code $I_4$ and applies to each the corresponding gain obtained from the gain codebook by the gain code contained in $I_4$. Each NPW vector decoded in this way is linearly interpolated and then sampling-converted into a continuous signal; the resulting signals are added together and supplied to the synthesis filter 65 as the residual signal.
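The decoder path described above for multi-stage or conjugate-structure quantization can be sketched as follows. The function names, the interpolation weight `w`, and the use of linear-interpolation resampling as the sampling conversion are assumptions made for the illustration; each stage is interpolated and resampled separately before summing, as the paragraph describes.

```python
def resample_linear(v, n):
    """Resample v (len >= 2) to n >= 2 samples by linear interpolation."""
    m = len(v)
    out = []
    for t in range(n):
        pos = t * (m - 1) / (n - 1)
        k = min(int(pos), m - 2)
        frac = pos - k
        out.append((1.0 - frac) * v[k] + frac * v[k + 1])
    return out


def decode_stage(codebook, index, gain):
    """Look up one code vector and apply its decoded gain."""
    return [gain * v for v in codebook[index]]


def excitation_cycle(prev_stages, cur_stages, w, pitch_len):
    """Build one pitch cycle of residual from per-stage decoded NPW vectors.

    prev_stages / cur_stages: lists of gain-scaled vectors, one per codebook,
    for the previous and current frames.
    """
    total = [0.0] * pitch_len
    for prev_vec, cur_vec in zip(prev_stages, cur_stages):
        interp = [(1.0 - w) * a + w * b for a, b in zip(prev_vec, cur_vec)]
        cycle = resample_linear(interp, pitch_len)   # stretch to pitch length
        total = [t + c for t, c in zip(total, cycle)]
    return total
```

Concatenating the cycles produced for successive interpolation weights yields the continuous residual that drives the synthesis filter.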

[0037] The unvoiced part quantization unit 19 in FIG. 1 may instead take a noise vector from a noise codebook, apply a gain to it, and choose the noise vector and gain so that the squared error with respect to the input residual signal is minimized. In FIG. 1, the linear prediction inverse filter 14 may also be omitted and the output of the linear prediction inverse filter 18 fed to the correlation calculation unit 15, although pitch period detection is more accurate when the inverse filter 14 is used. In addition, the linear prediction coefficient code $I_1$ in FIG. 1 may be output only for the corresponding one pitch period.

[0038]

[Effects of the Invention] As described above, according to the encoding method of the present invention, only one pitch period per frame is encoded in voiced sections, so fewer coding bits are needed than when the whole frame is encoded. Moreover, because the vector length is normalized and the peak positions are aligned at encoding time, the phase information of the waveform is also removed, so the number of coding bits can be reduced further.

[0039] According to the decoding method of the invention, only the information for one pitch period per frame is input in voiced sections. Code vectors are generated by interpolating between the two preceding and following normalized code vectors, and pitch periods are likewise generated by interpolating between the two preceding and following decoded pitch periods. Each code vector is then stretched or compressed to its corresponding pitch period and the results are concatenated into a continuous drive signal, which drives the synthesis filter to reproduce the speech.

[0040] To examine the effect of the speech encoding and decoding methods of the present invention, an analysis-synthesis experiment was carried out under the following conditions. The input was speech in the 0 to 4 kHz band, sampled at 8.0 kHz and then passed through an IRS characteristic filter. The encoder and decoder had the configuration of Embodiment 2 (FIGS. 1, 6, and 4). First, every 25 ms (200 samples), the input speech signal is multiplied by a Hamming window with an analysis window length of 30 ms, and linear prediction analysis by the autocorrelation method is performed with an analysis order of 12 to obtain 12 prediction coefficients. The prediction coefficients are vector-quantized using the Euclidean distance between LSP parameters.
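The analysis step described above (30 ms Hamming window, autocorrelation method, order 12) can be sketched as follows. This is a minimal textbook formulation using the Levinson-Durbin recursion with the sign convention $A(z) = 1 + \sum_k a_k z^{-k}$; the function names are assumptions, and the patent does not state which recursion or sign convention was used.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the (windowed) frame x."""
    return [sum(x[t] * x[t - lag] for t in range(lag, len(x)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns ([1, a1..a_order], residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        new_a = a[:]
        for k in range(1, m):
            new_a[k] = a[k] + k_m * a[m - k]
        new_a[m] = k_m
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a, err


def lpc(frame, order=12):
    """LPC of one analysis frame, e.g. 240 samples (30 ms at 8 kHz)."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

At 8 kHz the 30 ms window spans 240 samples while the frame advance is 200 samples, so consecutive analysis windows overlap by 40 samples.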

[0041] When the input speech signal is judged to be voiced, the residual waveform spanning one estimated (unquantized) pitch period is sampling-converted by equation (3), with q = 20 and n = 120, into an NPW vector 120 samples long, and this NPW waveform is vector-quantized using two noise code vectors $c_0^i$ and $c_1^j$. The pitch obtained by the partial autocorrelation method is scalar-quantized to an integer value.

[0042] When the input speech signal is judged to be unvoiced, the 25 ms frame is divided into five, the average power of the residual waveform in each 5 ms subframe is calculated, and the five values are vector-quantized. The bit rate is 2.08 kbit/s when the signal is periodic and 1.24 kbit/s when it is not, broken down as follows.
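The unvoiced-frame processing can be sketched as below: the encoder side computes the five subframe powers, and a decoder-side counterpart scales white noise so that each subframe matches the decoded power, in line with the decoding method of claim 11. Function names are assumptions, and the vector quantization of the five power values is omitted.

```python
import math


def subframe_powers(residual, n_sub=5):
    """Average power of each of n_sub equal subframes (e.g. 5 x 40 samples)."""
    size = len(residual) // n_sub
    return [sum(s * s for s in residual[k * size:(k + 1) * size]) / size
            for k in range(n_sub)]


def shape_noise(noise, powers):
    """Scale each subframe of noise so its average power equals the target."""
    size = len(noise) // len(powers)
    out = []
    for k, target in enumerate(powers):
        seg = noise[k * size:(k + 1) * size]
        p = sum(s * s for s in seg) / size
        scale = math.sqrt(target / p) if p > 0.0 else 0.0
        out.extend(scale * s for s in seg)
    return out
```

For a 25 ms frame at 8 kHz, `residual` holds 200 samples and each subframe holds 40.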

[0043]

Parameter                              Bits/frame
Prediction coefficients (LSP)          21
Voiced/unvoiced parameter              1
Excitation (voiced)
  First-stage noise sequence           7
  Second-stage noise sequence          7
  Noise sequence gain                  8
  Pitch period                         7
Excitation (unvoiced)
  Noise sequence                       8

Speech encoded under the above conditions was far more natural than that of a conventional vocoder at the same bit rate, and was clearer and less noisy than speech from conventional CELP coding at the same bit rate.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example functional configuration of an encoder to which an embodiment of the encoding method of the present invention is applied.

FIG. 2A is a block diagram showing a specific functional configuration of the unvoiced part quantization unit 19 in FIG. 1, and FIG. 2B is a block diagram showing another specific functional configuration of the unvoiced part quantization unit 19 in FIG. 1.

FIG. 3A is a block diagram showing an example of a specific functional configuration of the NPW quantization unit 35 in FIG. 1, and FIG. 3B is a block diagram showing an example of a specific functional configuration of the NPW quantization unit 35 for conjugate-structure vector quantization.

FIG. 4 is a block diagram showing an example functional configuration of a decoder to which an embodiment of the decoding method of the present invention is applied.

FIG. 5A is a block diagram showing a specific functional configuration of the unvoiced part decoding unit 53 in FIG. 4, and FIG. 5B is a diagram showing examples of the various codebooks used in the present invention.

FIG. 6 is a block diagram showing an example functional configuration in which the NPW quantization unit 35 in FIG. 1 uses multi-stage vector quantization.

Continued from front page: (58) Fields searched (Int. Cl.7, DB name): G10L 19/12

Claims

(57) [Claims]

1. A speech encoding method in which a speech signal is analyzed for each frame longer than the pitch period of the speech, and the characteristics of the speech are represented by linear prediction coefficients obtained by the analysis and by a drive signal for driving a linear prediction synthesis filter having filter coefficients based on the linear prediction coefficients, the method characterized by: performing voiced/unvoiced discrimination for each frame; if the frame is determined to be a voiced section, extracting a residual signal vector one pitch period long from the residual signal obtained by applying linear prediction inverse filtering to the speech signal; normalizing the length of the extracted residual signal vector to a predetermined length; cyclically shifting the elements of the normalized residual signal vector so as to increase its correlation with a predetermined reference signal vector, thereby obtaining a target residual vector; passing the target residual vector through the synthesis filter to synthesize speech and obtain a target waveform vector; synthesizing speech with the synthesis filter, using as the drive signal a code vector selected from a plurality of predetermined code vectors, to obtain a synthesized waveform vector; selecting the code vector that minimizes the waveform distortion of the synthesized waveform vector with respect to the target waveform vector and determining its quantization code; and, if the frame is determined to be an unvoiced section, quantizing the residual signal and determining its code.

2. The speech encoding method according to claim 1, wherein the target waveform vector is generated by convolving the target residual vector with a non-square matrix, obtained by extending a lower triangular matrix based on the impulse response of the synthesis filter so that the free response of the filter is also obtained, and the synthesized waveform vector is generated by convolving the selected code vector with the non-square matrix.

3. The speech encoding method according to claim 2, wherein an impulse is passed through a synthesis filter having filter coefficients based on the linear prediction coefficients to obtain an impulse response, the impulse response is normalized to a vector of the predetermined length, and the non-square matrix is created from the length-converted impulse response.

4. The speech encoding method wherein the drive signal for obtaining the synthesized waveform vector is a weighted linear sum of code vectors selected respectively from a plurality of codebooks.

5. The speech encoding method according to any one of the above claims, wherein the drive signal for obtaining the synthesized waveform vector is a weighted linear sum of code vectors selected respectively from a plurality of codebooks having a conjugate structure.

6. The speech encoding method according to any one of claims 1 to 5, wherein the quantization of the unvoiced section is performed by exciting the synthesis filter with a noise vector selected from a noise codebook and selecting the noise vector that minimizes the distortion between the output signal and the input speech signal.

7. The speech encoding method according to any one of claims 1 to 5, wherein the quantization of the unvoiced section is performed by dividing one frame into a plurality of subframes and vector-quantizing, for each frame, the average power of each subframe.

8. A speech decoding method in which a linear prediction coefficient code, a periodicity code, and a quantization code of a drive signal, each encoded for every frame, are input, and a linear prediction synthesis filter having filter coefficients obtained by decoding the linear prediction coefficient code is driven by a decoded signal of the quantization code to synthesize output speech, the method characterized in that: if the periodicity code indicates a voiced section, weighted linear interpolation is performed between the two preceding and following code vectors obtained by decoding the quantization code of the drive signal, and weighted linear interpolation is performed between the two preceding and following pitch periods obtained by decoding the input pitch period code; the drive signal is generated by expanding or contracting the vector length of each interpolated code vector according to the interpolated pitch period; and, if the two preceding and following decoded pitch periods differ greatly from each other, the interpolation of the code vectors is performed by repeating twice the code vector corresponding to the shorter pitch period, normalizing the result to the code vector length, and interpolating between that vector and the other code vector, and the interpolation of the pitch period is likewise performed between twice the shorter pitch period and the other pitch period.

9. The speech decoding method according to claim 8, wherein the two preceding and following decoded sets of linear prediction coefficients are linearly interpolated to obtain the filter coefficients of the synthesis filter.

10. The speech decoding method according to claim 8 or 9, wherein, if the periodicity code indicates an unvoiced section, the input unvoiced part code is decoded to obtain an unvoiced residual waveform, and the synthesis filter is driven by the unvoiced residual waveform.

11. The speech decoding method according to claim 10, wherein the decoding of the unvoiced part code obtains the unvoiced residual waveform by matching the power of each subframe of generated white noise to the powers, obtained from a power codebook, of the plurality of subframes into which one frame is divided.

12. The speech decoding method according to claim 10, wherein the decoding of the unvoiced part code obtains the unvoiced residual waveform by taking a noise vector from a noise codebook.