JP2535809B2

JP2535809B2 - Linear predictive speech analysis and synthesis device

Info

Publication number: JP2535809B2
Application number: JP60096223A
Authority: JP
Inventors: 哲田口
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1985-05-07
Filing date: 1985-05-07
Publication date: 1996-09-18
Anticipated expiration: 2011-09-18
Also published as: JPS61255000A

Description

[Detailed description of the invention] (Field of industrial application) The present invention relates to a speech analysis/synthesis apparatus, and in particular to a speech analysis/synthesis apparatus that has means for applying gain adjustment to the output waveform of a speech synthesis filter and that is designed to extend its operating amplitude range.

(Prior art) A speech analysis/synthesis apparatus analyzes speech on the analysis side to extract feature parameters and resynthesizes speech from those parameters on the synthesis side. To keep the hardware relatively small, the speech waveform is usually synthesized by a speech synthesis filter that uses fixed-point, finite-precision arithmetic. In such an apparatus, the dynamic range and S/N (signal-to-noise) ratio of the speech synthesis filter tend to be insufficient for keeping the synthesized speech waveform at high quality. To compensate, the gain adjustment corresponding to the excitation strength of the input speech is performed on the output side of the speech synthesis filter rather than on its input side.

[Problems to be solved by the invention]

However, conventional speech synthesis means of this kind have the following problem, which leaves room for further improvement.

That is, the spectral transfer characteristic of the speech synthesis filter corresponds to the vocal tract transfer characteristic of the speaker, and its filter coefficients change from moment to moment as the spectral envelope of the analyzed speech waveform changes. Consequently the filter gain of the speech synthesis filter, that is, the ratio between the input signal power and the output signal power, also changes from moment to moment. For this reason, a conventional speech analysis/synthesis apparatus having means for applying gain adjustment to the output waveform of the speech synthesis filter keeps the input signal level of the speech synthesis filter constant, at a value chosen with the empirically known maximum gain of the synthesis filter in mind. When the gain of the synthesis filter is small, the output waveform level of the speech synthesis filter therefore becomes relatively small, and the compensation of the dynamic range and S/N ratio of the speech synthesis filter, which is the aim of the present invention, is not fully achieved.
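The link between a low signal level and a poor S/N ratio in fixed-point arithmetic can be illustrated with a minimal sketch. It assumes a 12-bit signed fixed-point grid and a sine test signal; the function names `quantize` and `snr_db` and the test frequency are illustrative choices, not part of the patent.

```python
import math

def quantize(x, bits=12):
    """Round x (nominally in [-1, 1)) to a signed fixed-point grid with `bits` bits."""
    scale = 2 ** (bits - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))  # saturate at the ends of the range
    return q / scale

def snr_db(level, bits=12, n=4000):
    """S/N ratio in dB of a quantized sine wave with peak amplitude `level`."""
    sig = [level * math.sin(2 * math.pi * 0.01234 * i) for i in range(n)]
    err = [quantize(s, bits) - s for s in sig]
    p_sig = sum(s * s for s in sig)
    p_err = sum(e * e for e in err)
    return 10 * math.log10(p_sig / p_err)

if __name__ == "__main__":
    # The quantization noise floor is fixed, so every 20 dB drop in signal
    # level costs roughly 20 dB of S/N ratio.
    for level in (0.9, 0.09, 0.009):
        print(f"peak {level:5.3f}: S/N = {snr_db(level):5.1f} dB")
```

Because the rounding error is set by the word length and not by the signal, a filter that runs well below full scale gives up S/N ratio that cannot be recovered by amplifying its output afterwards.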

An object of the present invention is to eliminate the drawback described above and to provide a speech analysis/synthesis apparatus that can fully compensate the dynamic range and S/N ratio of the speech synthesis filter.

[Means for solving problems]

The apparatus of the present invention has a function of applying gain adjustment to the output waveform of the speech synthesis filter, and has, on the synthesis side, means for calculating the normalized prediction residual power.

[Embodiment]

The present invention will now be described in detail with reference to the drawing.

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech analysis/synthesis apparatus according to the present invention.

The embodiment shown in FIG. 1 comprises an analysis side 1 and a synthesis side 2. The analysis side 1 comprises a window processor 11, an LPC analyzer 12, a pitch extractor 13, a power calculator 14, a K quantizer 15, a pitch quantizer 16, a power quantizer 17, and a multiplexer 18. The synthesis side 2 comprises a demultiplexer 19, a K decoder 20, a pitch decoder 21, a power decoder 22, a normalized prediction residual power calculator 23, a square-root unit (1) 24, a multiplier (1) 25, a K/α converter 26, an LPC synthesis filter 27, a multiplier (2) 28, a pitch pulse generator 29, a noise generator 30, a switch 31, and a square-root unit (2) 32.

In FIG. 1, the input speech signal received via input line 1001 is windowed by the window processor 11. The windowing is carried out as follows.

The window processor 11 low-pass filters the input speech signal and then quantizes it with a predetermined number of bits using an A/D (analog-to-digital) converter.

In this embodiment, the signal is passed through an LPF (low-pass filter) with a cutoff frequency of 3.4 kHz to remove frequency components above that frequency, sampled at a sampling frequency of 8 kHz, quantized with 12 bits per sample, and temporarily stored in an internal memory. The internal memory holds the incoming quantized signal for a preset duration, for example 30 ms, that is, a window length of 240 samples. This stored segment is multiplied by a window function such as a Hamming or rectangular function and cut out. The windowing is repeated at a predetermined period, 10 ms in this embodiment, which is the analysis frame period.
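The framing described above can be sketched as follows. This is a minimal illustration, assuming the 8 kHz sampling rate implied by 240 samples per 30 ms and the common 0.54/0.46 form of the Hamming window; the function names are illustrative, and the low-pass filter and A/D converter are outside the sketch.

```python
import math

FS_HZ = 8000      # sampling rate implied by 240 samples per 30 ms
WINDOW_MS = 30    # window length
FRAME_MS = 10     # analysis frame period

def hamming(n):
    """Hamming window of length n (0.54/0.46 form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frames(samples, fs=FS_HZ, window_ms=WINDOW_MS, frame_ms=FRAME_MS):
    """Cut overlapping windowed segments out of a list of samples."""
    win_len = fs * window_ms // 1000   # 240 samples
    hop = fs * frame_ms // 1000        # 80 samples between frame starts
    w = hamming(win_len)
    out = []
    for start in range(0, len(samples) - win_len + 1, hop):
        seg = samples[start:start + win_len]
        out.append([s * wi for s, wi in zip(seg, w)])
    return out
```

With a 30 ms window advanced every 10 ms, consecutive analysis frames overlap by two thirds.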

The quantized speech signal for each analysis frame output from the window processor 11 in this way is supplied to the LPC analyzer 12, the pitch extractor 13, and the power calculator 14.

The LPC analyzer 12 performs LPC analysis on the quantized speech signal of each analysis frame, extracts LPC coefficients of a predetermined order, in this embodiment 10th-order K parameters (partial autocorrelation coefficients), and outputs them to the K quantizer 15. The K quantizer 15 quantizes the supplied K parameters to a predetermined number of bits and outputs them to the multiplexer 18. The pitch extractor 13 uses a known method to decide whether the quantized speech signal of each analysis frame is voiced or unvoiced, extracts the pitch period, and outputs both to the pitch quantizer 16. The pitch quantizer 16 quantizes the supplied voiced/unvoiced decision and the pitch period information together to a predetermined number of bits and outputs them to the multiplexer 18. The power calculator 14 calculates the power of the quantized speech signal of each analysis frame and outputs it to the power quantizer 17. The power quantizer 17 quantizes the supplied power information to a predetermined number of bits and outputs it to the multiplexer 18.
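The patent does not say which LPC analysis method the LPC analyzer 12 uses. As one common possibility, the sketch below obtains K parameters with the autocorrelation method and the Levinson-Durbin recursion, and computes the frame power as the mean square of the samples. The sign convention (predictor x̂(n) = Σ a_i x(n−i), with a positive K for positively correlated samples) and all function names are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_k_params(r, order):
    """Levinson-Durbin recursion; returns the reflection (K) parameters."""
    a = []        # predictor coefficients a_1..a_{m-1} of the previous order
    err = r[0]    # prediction error energy, starts at the frame energy
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            # Silent or perfectly predictable frame: pad with zeros.
            ks.extend([0.0] * (order - len(ks)))
            break
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
        ks.append(k)
    return ks

def frame_power(frame):
    """Mean-square power of one analysis frame."""
    return sum(x * x for x in frame) / len(frame)
```

For the 10th-order analysis of the embodiment, `levinson_k_params(autocorrelation(frame, 10), 10)` would return the ten K parameters of a windowed frame.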

The multiplexer 18 multiplexes the inputs supplied in this way in a predetermined format and transmits them to the synthesis side 2 via transmission line 101.

On the synthesis side 2, the demultiplexer 19 demultiplexes the received signal. The LPC code is supplied to the K decoder 20, the pitch information (including the voiced/unvoiced decision) to the pitch decoder 21, and the power information to the power decoder 22, and each is decoded.

The K decoder 20 outputs the decoded K parameters to the K/α converter 26 and to the normalized prediction residual power calculator 23. The normalized prediction residual power calculator 23 calculates the normalized prediction residual power U by equation (1):

$$U = \prod_{p=1}^{10} \left(1 - K_p^2\right) \tag{1}$$

In equation (1), K_p (p = 1, 2, ..., 10) are the K parameters. The normalized prediction residual power calculator 23 outputs the calculated U to the square-root unit (1) 24. The square-root unit (1) 24 calculates √U by table lookup and outputs it to the multiplier (1) 25.
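The two steps above can be sketched as follows, assuming equation (1) is the standard product form U = Π(1 − K_p²) by which normalized prediction residual power is usually obtained from K parameters. The 256-entry table is a hypothetical size chosen for illustration; the patent does not specify the table resolution.

```python
import math

def normalized_residual_power(k_params):
    """U = product over p of (1 - K_p^2); lies in (0, 1] when every |K_p| < 1."""
    u = 1.0
    for k in k_params:
        u *= 1.0 - k * k
    return u

# Hypothetical 256-entry square-root table covering the range [0, 1].
TABLE_SIZE = 256
SQRT_TABLE = [math.sqrt(i / (TABLE_SIZE - 1)) for i in range(TABLE_SIZE)]

def sqrt_lookup(u):
    """Approximate sqrt(u) for 0 <= u <= 1 by nearest-entry table lookup."""
    index = round(u * (TABLE_SIZE - 1))
    return SQRT_TABLE[index]
```

A lookup table replaces the square-root computation with a single memory access, which suits the small fixed-point hardware the description has in mind.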

The K/α converter 26 calculates the α parameters (α_1, α_2, ..., α_10) from the K parameters by a known method and outputs them to the LPC synthesis filter 27.
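The patent leaves the conversion method unspecified. One well-known candidate is the step-up recursion, sketched below under the assumption that the α parameters are predictor coefficients in the convention x̂(n) = Σ α_i x(n−i); the function name `k_to_alpha` is illustrative.

```python
def k_to_alpha(k_params):
    """Step-up recursion: reflection (K) parameters to predictor coefficients.

    At order m the new coefficients are
        a_i(m) = a_i(m-1) - K_m * a_{m-i}(m-1)   for i = 1..m-1
        a_m(m) = K_m
    """
    a = []
    for m, k in enumerate(k_params, start=1):
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return a
```

For example, `k_to_alpha([0.5, 0.2])` gives approximately `[0.4, 0.2]`: the first-order coefficient 0.5 is reduced by 0.2 × 0.5 when the second stage is added.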

The square-root unit (2) 32 obtains √P by table lookup from the power information P supplied by the power decoder 22 and outputs it to the multiplier (2) 28.

The pitch decoder 21 outputs the decoded pitch period information to the pitch pulse generator 29 and the voiced/unvoiced decision to the switch 31. The pitch pulse generator 29 generates a pitch pulse train whose period matches the pitch period information and outputs it to the switch 31. The noise generator 30 generates white noise, for example from a 15th-order M-sequence, and outputs it to the switch 31. Based on the voiced/unvoiced decision, the switch 31 selects the pitch pulse train for voiced frames and the white noise for unvoiced frames and supplies it to the multiplier (1) 25. The multiplier (1) 25 multiplies the amplitude of the pitch pulse train or white noise by the √U described above and outputs the result to the LPC synthesis filter 27. The LPC synthesis filter 27 uses the α parameters supplied from the K/α converter 26 as its filter coefficients, synthesizes speech from the √U-scaled pitch pulse train or white noise supplied from the multiplier (1) 25, and outputs it to the multiplier (2) 28.
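The excitation path and the synthesis filter can be sketched as below. Several details are assumptions made for illustration: the M-sequence uses the feedback taps of the common PRBS15 generator (the patent names only the order), noise samples are mapped to ±1, the pulse train has unit-amplitude pulses, and the filter is a direct-form all-pole recursion y(n) = x(n) + Σ α_i y(n−i) that does not carry state across frames.

```python
def m_sequence(n, state=0x7FFF):
    """n samples of a 15th-order M-sequence (period 32767), mapped to +/-1."""
    out = []
    for _ in range(n):
        bit = ((state >> 14) ^ (state >> 13)) & 1   # feedback from stages 15 and 14
        state = ((state << 1) | bit) & 0x7FFF
        out.append(1.0 if bit else -1.0)
    return out

def pulse_train(n, period):
    """n samples with a unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def excitation(voiced, n, pitch_period, sqrt_u):
    """Switch 31 followed by multiplier (1) 25: select the source, scale by sqrt(U)."""
    src = pulse_train(n, pitch_period) if voiced else m_sequence(n)
    return [sqrt_u * s for s in src]

def lpc_synthesize(x, alpha):
    """All-pole synthesis filter: y[n] = x[n] + sum_i alpha[i-1] * y[n-i]."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a * y[n - i]
        y.append(acc)
    return y
```

A voiced frame would then be produced by `lpc_synthesize(excitation(True, 80, pitch_period, sqrt_u), alpha)`, where 80 samples corresponds to the 10 ms frame period at 8 kHz.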

It is empirically known that the power gain of the LPC synthesis filter is roughly 1/U. This is also easy to infer from the fact that the ratio of the power of the impulse response of the LPC synthesis filter to the power of the input impulse is exactly 1/U. Therefore, if the power of the pitch pulse train or white noise sequence entering the multiplier (1) 25 is held constant, the output power of the LPC synthesis filter 27 is nearly constant, and it is very easy to set the levels so that the maximum amplitude of the synthesized waveform stays close to the upper limit of the operating range of the LPC synthesis filter 27.
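The impulse-response statement can be checked with a short derivation. The following is a sketch, assuming the synthesis filter is the all-pole filter built from the α parameters, that U is the product of (1 − K_p²) over the ten K parameters, and that h(n) and R(j) (symbols introduced here, not in the patent) denote the impulse response and its autocorrelation.

```latex
\begin{align*}
H(z) &= \frac{1}{1 - \sum_{i=1}^{10} \alpha_i z^{-i}}, \qquad
h(n) = \sum_{i=1}^{10} \alpha_i\, h(n-i) + \delta(n), \quad h(n) = 0 \ (n < 0) \\
R(j) &= \sum_{n=0}^{\infty} h(n)\, h(n+j) \\
R(j) &= \sum_{i=1}^{10} \alpha_i\, R(|j-i|) \quad (j = 1, \dots, 10), \qquad
R(0) = \sum_{i=1}^{10} \alpha_i\, R(i) + 1 \\
1 &= R(0) - \sum_{i=1}^{10} \alpha_i\, R(i)
   = R(0) \prod_{p=1}^{10} \left(1 - K_p^2\right) = R(0)\, U
\quad\Longrightarrow\quad R(0) = \frac{1}{U}
\end{align*}
```

The third line is the set of normal equations of the autocorrelation method, whose minimum residual energy the Levinson-Durbin recursion expresses as R(0)·Π(1 − K_p²). Since the unit impulse has energy 1, the energy gain is exactly 1/U, and scaling the excitation amplitude by √U brings the filter output energy back to 1.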

The speech waveform synthesized by the LPC synthesis filter 27 is multiplied by √P in the multiplier (2) 28 so that it matches the power measured on the analysis side, and is then sent to output line 2001.
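The effect of the two gain stages can be illustrated with a deliberately reduced sketch: a first-order all-pole filter y(n) = x(n) + K·y(n−1) driven by a single impulse, for which U = 1 − K². The function name `gain_chain` and the first-order simplification are assumptions made for illustration; the embodiment uses a 10th-order filter.

```python
import math

def gain_chain(k, power, n=4000):
    """Return (filter output energy, final output energy) for one impulse.

    Input gain sqrt(U) is applied before the filter (multiplier (1)),
    output gain sqrt(P) after it (multiplier (2)).
    """
    u = 1.0 - k * k
    input_gain = math.sqrt(u)
    output_gain = math.sqrt(power)
    y_prev = 0.0
    filter_energy = 0.0
    final_energy = 0.0
    for i in range(n):
        x = input_gain if i == 0 else 0.0
        y = x + k * y_prev              # first-order synthesis filter
        filter_energy += y * y
        final_energy += (output_gain * y) ** 2
        y_prev = y
    return filter_energy, final_energy

if __name__ == "__main__":
    for k in (0.1, 0.9, 0.99):
        filt, out = gain_chain(k, power=0.25)
        print(f"K={k}: filter output energy {filt:.6f}, final energy {out:.6f}")
```

The filter output energy stays at 1 for every K, so the filter always works at the same level, while the final energy follows the transmitted power P. With a fixed input gain of 1 the filter output energy would instead be 1/U, ranging from about 1.01 at K = 0.1 to about 50 at K = 0.99.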

[Effect of the invention]

As described above, according to the present invention, providing a function that applies gain adjustment to the output waveform of the speech synthesis filter together with a function that calculates the normalized prediction residual power on the synthesis side makes it possible to always operate the speech synthesis filter near the upper limit of its dynamic range, which yields a speech analysis/synthesis apparatus with greatly improved synthesized speech quality.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the speech analysis/synthesis apparatus according to the present invention.

1: analysis side, 2: synthesis side, 11: window processor, 12: LPC analyzer, 13: pitch extractor, 14: power calculator, 15: K quantizer, 16: pitch quantizer, 17: power quantizer, 18: multiplexer, 19: demultiplexer, 20: K decoder, 21: pitch decoder, 22: power decoder, 23: normalized prediction residual power calculator, 24: square-root unit (1), 25: multiplier (1), 26: K/α converter, 27: LPC synthesis filter, 28: multiplier (2), 29: pitch pulse generator, 30: noise generator, 31: switch, 32: square-root unit (2).

Claims

(57) [Claims]

1. A linear predictive speech analysis/synthesis apparatus comprising: calculation means, provided on the synthesis side, for calculating a normalized prediction residual power; first adjusting means for adjusting the gain of the input waveform of a speech synthesis filter on the synthesis side according to the normalized prediction residual power value calculated by the calculation means; and second adjusting means for adjusting the gain of the output waveform of the speech synthesis filter according to the power value analyzed on the analysis side and sent to the synthesis side.