JPH0211920B2

Info

Publication number: JPH0211920B2
Application number: JP54155981A
Authority: JP
Inventors: Yasuhiko Arai; Masahisa Furuya
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 1979-11-30
Filing date: 1979-11-30
Publication date: 1990-03-16
Also published as: JPS5678898A

Description

[Detailed description of the invention]

The present invention relates to a method for compressing parameter information in a speech analysis-synthesis system.

In an analysis-synthesis system, a discrete speech signal is multiplied by a window function of fixed length, for example a 30 ms Hamming window, to cut out a finite number of samples. From these samples, parameters representing the spectral information of the speech (spectral parameters) and parameters representing the excitation source information (source parameters) are extracted separately, and the extracted parameters are then used to reconstruct the original speech signal.
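
The windowing step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `hamming` and `frame_signal` are hypothetical, the sampling rate is left to the caller because the source does not give one, and the window shift is exposed as a parameter with a 10 ms default.

```python
import math


def hamming(n_samples):
    """Return Hamming window coefficients of length n_samples."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def frame_signal(samples, sample_rate_hz, window_ms=30.0, shift_ms=10.0):
    """Cut a discrete signal into Hamming-windowed analysis frames.

    window_ms is the window length (30 ms in the text); shift_ms is the
    distance the window moves between frames (assumed default: 10 ms).
    """
    win_len = int(round(sample_rate_hz * window_ms / 1000.0))
    hop = int(round(sample_rate_hz * shift_ms / 1000.0))
    window = hamming(win_len)
    frames = []
    # Only full-length windows are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - win_len + 1, hop):
        segment = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

With an assumed 8 kHz sampling rate, `frame_signal(signal, 8000)` yields frames of 240 samples spaced 80 samples apart; each frame would then be passed to the spectral and source parameter analysis.
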
The parameters are extracted while the analysis window is shifted by a fixed time step; fixed shift lengths of 5 ms, 10 ms, or 20 ms are typically used. The extracted parameters are encoded and then transmitted or stored. After being received, or read out from the storage device, they are decoded, and the original speech is synthesized from the decoded parameters.

Parameters that represent the spectral information of speech include 8 to 10 linear prediction coefficients or partial autocorrelation coefficients (also called PARCOR coefficients).
Linear prediction coefficients require about 10 bits per parameter for encoding, whereas PARCOR coefficients need only 10 to 3 bits depending on the order, so the original speech can be restored with less information than when linear prediction coefficients are used.

The present invention provides a method for further compressing the parameter information in such an analysis-synthesis system, and thereby aims to realize an efficient analysis-synthesis system. It is described below with reference to an embodiment that uses PARCOR coefficients.

The PARCOR coefficients $k_i$ ($i = 1, 2, 3, \ldots, 10$) take values in the range $|k_i| < 1$ and must be quantized before they can be encoded. The spectral distortion caused by quantization depends on the order of $k_i$; the quantization error of $k_1$ affects the spectral distortion most.
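
As an illustration of the quantization step, the sketch below maps a coefficient in $(-1, 1)$ to a signed integer code of a chosen bit width. The source only states that the coefficients are quantized; the uniform step size, the floor rounding, the mid-cell reconstruction, and the function names `quantize_parcor` and `dequantize_parcor` are all assumptions made for this example.

```python
import math


def quantize_parcor(k, n_bits):
    """Uniformly quantize k in (-1, 1) to a signed n_bits integer code.

    Assumed scheme: 2**n_bits equal cells over (-1, 1), with codes in the
    two's complement range [-2**(n_bits-1), 2**(n_bits-1) - 1].
    """
    if not -1.0 < k < 1.0:
        raise ValueError("PARCOR coefficient must satisfy |k| < 1")
    levels = 1 << (n_bits - 1)
    code = math.floor(k * levels)
    # Clamp defensively; for |k| < 1 the floor is already in range.
    return max(-levels, min(levels - 1, code))


def dequantize_parcor(code, n_bits):
    """Reconstruct the coefficient at the centre of its quantization cell."""
    levels = 1 << (n_bits - 1)
    return (code + 0.5) / levels
```

Under this assumed scheme the reconstruction error is at most $2^{-n}$ for an $n$-bit code, so each extra bit halves the worst-case error in the coefficient.
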
The influence of the quantization error decreases as the order increases. The effect of a variation in $k_i$ on the spectrum is expressed by the spectral sensitivity $s_{ii}^{(k)}$, defined as

$$s_{ii}^{(k)} = \lim_{\Delta k_i \to 0} \left| \frac{\Delta s}{\Delta k_i} \right| \tag{1}$$

where $\Delta s$ is the change in the spectrum and $\Delta k_i$ is the change in the PARCOR coefficient $k_i$. A PARCOR coefficient with high spectral sensitivity needs many bits for quantization, while one with low sensitivity needs few.

According to "Spectral Sensitivity Analysis of LPC Parameters," Proceedings of the Acoustical Society of Japan, 3-2-21, May 1978 (Reference 1), when $k_i$ is linearly quantized to $N_i$ bits and the quantization distortion per parameter is $\sigma_{dB}$ (expressed on a dB scale),

$$N_i = \log_2 s_{ii}^{(k)} - \log_2 (0.282\,\sigma_{dB}) \tag{2}$$

It follows that if $N_i$ ($i = 1, 2, \ldots, 10$) is chosen so that the spectral distortion due to the quantization error of each parameter ($k_1, k_2, \ldots, k_{10}$) equals the same $\sigma_{dB}$, a change of 1 LSB can be regarded as having almost the same effect on the spectral distortion for every parameter.

The present invention exploits this property of the spectral parameters to reduce the parameter information. Let $Q_i$ be the value (in two's complement representation) obtained by quantizing each parameter to $N_i$ bits, and define the parameter vector $P$ for each analysis frame as

$$P = (Q_1, Q_2, Q_3, \ldots, Q_p) \tag{3}$$

The distance $D$ between the vectors of two adjacent frames is defined as

$$D = |P_L - P_R| = \sum_{i=1}^{p} |Q_{iL} - Q_{iR}| \tag{4}$$

where $p$ is the parameter order, $L$ denotes the left (preceding) frame, and $R$ denotes the right (following) frame. Using $D$ as the measure, the invention compresses the parameter information by replacing the parameter vector $P_R$ with $P_L$ only when $D$ is at or below a fixed value.

The figure illustrates the parameter vector replacement operation. The numbers 1, 2, 3, ... are frame numbers, and $P_i$ ($i = 1, 2, 3, \ldots$) in the upper row are the spectral parameter vectors extracted by analyzing the speech signal. The middle row shows example values of $D$, converted from binary to decimal. The lower row shows the result when a parameter vector is replaced with that of the preceding frame whenever $D$ is 20 or less. In frame 1, $P_1$ is used as is. In frame 2, $D$ is 4, so $P_2$ is replaced with $P_1$. In frame 3, the distance $D$ from $P_1$, the vector of frame 2 after replacement, is 12, so $P_3$ is replaced with $P_1$. In frame 4, $D$ computed in the same way exceeds 20, so $P_4$ is used as is. In frame 5, $D$ is 10, so $P_5$ is replaced with $P_4$. In frame 6, $D$ is 26, which exceeds 20, so $P_6$ is used as is.

Once the replacement into the vector sequence of the lower row is complete, the parameter information of six frames is expressed by the parameter information of three frames ($P_1$, $P_4$, $P_6$), and the amount of information is greatly reduced.

According to "Optimal Code Configuration in PARCOR-type Speech Analysis-Synthesis Systems," Transactions of the Institute of Electronics and Communication Engineers of Japan, Feb. 1978, Vol. 61-A, No. 2 (Reference 2), the empirical relation (5) holds between the quantization distortion $DS_{Q_i}^{(p)}(N_i)$ obtained when only one of the spectral parameters $k_i$ is quantized to $N_i$ bits and the total quantization distortion $DS_Q^{(p)}(N)$ obtained when every $k_i$ is quantized to its $N_i$ bits:

$$DS_Q^{(p)}(N) = \sum_{i=1}^{p} DS_{Q_i}^{(p)}(N_i) \quad (\mathrm{dB})^2 \tag{5}$$

where

$$N = \sum_{i=1}^{p} N_i \quad (\text{bits}) \tag{6}$$

and $p$ is the parameter order, with $p = 10$ in the embodiment. In the present invention, $DS_{Q_i}^{(p)}(N_i)$ in equation (5) is set to the constant value $(\sigma_{dB})^2$ regardless of $i$, so that

$$DS_Q^{(p)}(N) = \sum_{i=1}^{p} (\sigma_{dB})^2 = p\,(\sigma_{dB})^2 \tag{7}$$

Moreover, when the $N_i$ determined by equation (2) are used, changes in the $Q_i$ ($i = 1, 2, \ldots, 10$) of equation (3) can all be regarded as having an equivalent effect on the spectral distortion. Adding the distortion $D\,(\sigma_{dB})^2$ produced by replacing the parameter vector $P_R$ with $P_L$ to the quantization distortion gives the total spectral distortion

$$\sqrt{DS_Q^{(p)}(N) + D\,(\sigma_{dB})^2} = \sqrt{p + D}\;\sigma_{dB} \quad (\mathrm{dB}) \tag{8}$$

Equation (8) shows that the inter-vector distance $D$ defined in equation (4) is a measure that indirectly expresses the spectral distortion. In other words, by choosing the threshold on $D$ appropriately, the spectral distortion caused by replacing $P_R$ with $P_L$ can be kept at or below a fixed value.

In a simulation experiment with an analysis window length of 30 ms, a frame period of 10 ms, and $p = 10$, the bit allocation obtained for $\sigma_{dB} = 1$ dB was

$$\{N_i\}_{i=1}^{10} = \{7, 6, 5, 4, 4, 4, 3, 3, 3, 3\} \tag{9}$$

and the results of replacing the parameter vector $P$ with $D$ = 0, 10, 20, and 30 are shown in Table 1. The experiments showed that applying the replacement to unvoiced frames markedly degrades phonemic quality, whereas good results are obtained for consecutive voiced frames. The following table summarizes the latter case.

[Table]

When D = 30, the degradation in quality is audible, but up to D = 20 the quality is almost indistinguishable from that at D = 0. At D = 20 the information is reduced by 33% relative to D = 0, so highly efficient analysis-synthesis is achieved.

As is clear from the above description, in the present invention each spectral parameter is quantized, before the distance between parameter vectors is computed, so that the spectral distortion per spectral parameter is equal. Each term in the distance computation therefore carries the same weight and has the same effect on the spectral distortion. The inter-vector distance, which indirectly expresses this spectral distortion, is used as the evaluation measure, and spectral parameters are replaced with the values of the preceding frame as long as the spectral distortion does not exceed a fixed value. Reducing the information in this way yields highly efficient analysis-synthesis. Furthermore, when parameters compressed in this way are used in a speech synthesis device that holds them in a storage device, the memory cost is greatly reduced and a highly economical device can be realized.
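
The replacement rule summarized above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the function names `vector_distance` and `compress_frames` are hypothetical, and the two-element vectors in the usage note are invented solely to reproduce distances like those in the frame example (4, 12, a value above 20, 10, and 26). As in that example, each frame is compared with the vector actually used for the preceding frame, that is, after any replacement.

```python
def vector_distance(p_left, p_right):
    """Sum of absolute differences between quantized parameter vectors."""
    return sum(abs(a - b) for a, b in zip(p_left, p_right))


def compress_frames(vectors, threshold):
    """Replace a frame's vector with the preceding one when D <= threshold.

    Returns (output, kept): output[i] is the vector used for frame i, and
    kept[i] is True when frame i carries its own, newly coded vector.
    """
    output, kept = [], []
    for vec in vectors:
        if output and vector_distance(output[-1], vec) <= threshold:
            output.append(output[-1])  # reuse the preceding frame's vector
            kept.append(False)
        else:
            output.append(vec)         # first frame, or distance too large
            kept.append(True)
    return output, kept
```

For the made-up sequence `[(0, 0), (3, 1), (-7, 5), (15, 10), (20, 5), (2, -3)]` with a threshold of 20, frames 1, 4, and 6 keep their own vectors and the other three reuse the preceding one, so six frames are represented by three vectors. How the "same as previous frame" decision is signalled to the decoder is not specified in the text and is left out of this sketch.
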

[Brief explanation of drawings]

The drawing illustrates the parameter vector replacement operation using the parameter information compression method according to the present invention.

Claims

[Claims]

1. A parameter information compression method characterized by quantizing each spectral parameter so that the spectral distortion per speech spectral parameter is equal, then finding the distance between the parameter spectra of two adjacent frames, and, when the distance between the parameter vectors is less than a certain value, replacing the parameters of the subsequent frame with the parameters of the previous frame.