JPS5941597B2

JPS5941597B2 - Speech analysis and synthesis device

Info

Publication number: JPS5941597B2
Application number: JP53001283A
Authority: JP
Inventors: 哲田口
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1978-01-09
Filing date: 1978-01-09
Publication date: 1984-10-08
Also published as: JPS5494210A

Description

[Detailed Description of the Invention]

The present invention relates to a speech analysis and synthesis device using linear prediction coefficients. More specifically, it relates to a device that reduces the degradation of synthesized speech caused by quantizing the transmission parameters, and that produces good synthesized speech even on transmission channels with a high error rate.

Linear predictive speech analysis and synthesis devices use one of three coefficient sets: prediction coefficients obtained as the direct solution of a set of linear equations, partial autocorrelation (PARCOR) coefficients (a transformation of those prediction coefficients), or linear prediction coefficients derived from either. Examples appear in John R. Haskew, J. M. Kelly, Robert M. Kelly, Jr., and Thomas H. McKinney, "Results of a Study of the Linear Prediction Vocoder," IEEE Transactions on Communications, Vol. COM-21, No. 9, September 1973, and in Fumitada Itakura and Shuzo Saito, "Speech Information Compression Using Maximum Likelihood Spectrum Estimation," Journal of the Acoustical Society of Japan, Vol. 27, No. 9, 1971. These devices convey information about the analyzed waveform from the analysis side to the synthesis side using, for example, five transmission parameters:

- the linear prediction coefficients;
- the pitch frequency;
- the short-time average power;
- the normalized prediction residual power, that is, the prediction residual power when the short-time average power is normalized to 1;
- a voiced/unvoiced decision signal.

Some devices instead transmit composite parameters formed from these five, such as a composite of the pitch frequency and the voiced/unvoiced signal, or a composite of the short-time average power and the normalized prediction residual power. Others transmit part of the speech waveform or the prediction residual waveform.

To use the transmission channel economically, such devices must send the transmission parameters with as few bits as the required quality of the synthesized speech allows. They must also make the analysis period of the input waveform as long as possible. Among the transmission parameters, the normalized prediction residual power, or the composite of the short-time average power and the normalized prediction residual power, is known experimentally to change much faster over time than the others.

In conventional devices of this type, the amplitude reproduction of the synthesized speech depends on two quantization accuracies. One is the accuracy of the linear prediction coefficients (the spectral parameters). The other is the accuracy of the normalized prediction residual power, or of the composite of short-time average power and normalized prediction residual power. The first drawback is therefore that amplitude reproduction often deteriorates even when the linear prediction coefficients are quantized accurately enough to represent the spectrum.

The second drawback concerns the normalized prediction residual power, or the composite of short-time average power and normalized prediction residual power. This parameter changes much faster over time than the other transmission parameters. The analysis period must therefore be shorter than the period needed for the other parameters, such as the linear prediction coefficients, which increases the required transmission capacity.

The third drawback concerns transmission errors. Suppose a transmission error corrupts the linear prediction coefficients, the normalized prediction residual power, or both. The same applies to the linear prediction coefficients and the composite of short-time average power and normalized prediction residual power. In either case, the amplitude reproduction of the synthesized speech deteriorates. An increase in amplitude in particular sounds unnatural to the listener.

The first drawback is inherent in conventional devices of this type. An optimal trade-off between speech quality and transmission capacity can be chosen, but the drawback cannot be fundamentally reduced. The second drawback could be eased as follows:

- lengthen the analysis frame period for all transmission parameters other than the normalized prediction residual power (or its composite with short-time average power);
- shorten the analysis frame period for that power parameter;
- interpolate the long-period parameters to the short analysis intervals.

This, however, breaks the relationship between the linear prediction coefficients and the normalized prediction residual power (or the composite parameter). The result is degraded quality in the synthesized speech. The third drawback, like the first, is inherent in conventional devices of this type and cannot be fundamentally improved.

The object of the present invention is to provide a speech analysis and synthesis device using linear prediction coefficients that:

- reduces the degradation of synthesized speech caused by quantizing the transmission parameters, which depends on their transmission interval and quantization accuracy; and
- reduces abnormal amplitudes in the synthesized speech caused by transmission channel errors.
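The conflict behind the second drawback can be illustrated numerically. The normalized prediction residual power $U_P = \prod_{i=1}^{P}(1-k_i^2)$ is a nonlinear function of the PARCOR coefficients $k_i$. A power value handled separately from the coefficients (here, interpolated on its own) therefore does not generally match coefficients interpolated between the same two frames. This Python sketch assumes PARCOR-form coefficients, plain linear interpolation, and two hypothetical coefficient vectors. None of these choices comes from the source.

```python
import math

def normalized_residual_power(parcor):
    """U_P = prod_{i=1}^{P} (1 - k_i^2) for PARCOR coefficients k_1..k_P."""
    return math.prod(1.0 - k * k for k in parcor)

def lerp(a, b, t):
    """Linear interpolation between a (t = 0) and b (t = 1)."""
    return a + (b - a) * t

def compare(frame_a, frame_b, t):
    """Return (U implied by interpolated coefficients, independently interpolated U)."""
    k_interp = [lerp(x, y, t) for x, y in zip(frame_a, frame_b)]
    u_from_coeffs = normalized_residual_power(k_interp)
    u_interp = lerp(normalized_residual_power(frame_a),
                    normalized_residual_power(frame_b), t)
    return u_from_coeffs, u_interp

# Hypothetical PARCOR vectors at two consecutive long analysis frames.
FRAME_A = [0.9, -0.5]
FRAME_B = [0.1, 0.3]

for t in (0.0, 0.5, 1.0):
    u_c, u_i = compare(FRAME_A, FRAME_B, t)
    print(f"t={t:.1f}  from coefficients={u_c:.4f}  interpolated={u_i:.4f}")
```

The two estimates agree at the frame boundaries. Halfway between frames, however, the interpolated coefficients imply $U_P \approx 0.7425$, while the independently interpolated power is about 0.5217. That is a mismatch of roughly 1.5 dB in excitation power.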

The present invention relates to a linear predictive speech analysis and synthesis device whose synthesis side includes means for computing the normalized prediction residual power from the transmitted linear prediction coefficients.

A feature of the invention is that speech is synthesized using a normalized prediction residual power computed from the linear prediction coefficients as received, that is, after quantization errors and transmission channel errors have affected them.

This compensates for amplitude fluctuations in the synthesized speech caused by distortion of the linear prediction coefficients through quantization and transmission channel errors. The invention is now described in detail with reference to the drawing.
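The compensation effect can be checked with a small calculation. Assume white excitation. The power gain of the all-pole synthesis filter $1/A(z)$ is then $1/U_P$, where $U_P$ is computed from the coefficients the filter actually uses. Driving the filter with excitation of power $P \cdot U_P$ therefore reproduces the short-time power $P$ only when $U_P$ comes from the same, possibly corrupted, coefficients. This Python sketch measures the filter gain from its impulse response. The coefficient values, the error on $k_1$, and the step-up sign convention are illustrative assumptions.

```python
import math

def normalized_residual_power(parcor):
    """U_P = prod (1 - k_i^2) for PARCOR coefficients."""
    return math.prod(1.0 - k * k for k in parcor)

def step_up(parcor):
    """PARCOR k_1..k_P -> predictor a_1..a_P, with x_hat[n] = sum_j a_j x[n-j]."""
    a = []
    for k in parcor:
        a = [a_j - k * a_rev for a_j, a_rev in zip(a, reversed(a))] + [k]
    return a

def impulse_energy(a, n_samples=4000):
    """Sum of h[n]^2 for the all-pole filter 1/A(z), A(z) = 1 - sum_j a_j z^-j."""
    history = [0.0] * len(a)          # y[n-1], y[n-2], ..., y[n-P]
    energy = 0.0
    for n in range(n_samples):
        y = (1.0 if n == 0 else 0.0) + sum(aj * yj for aj, yj in zip(a, history))
        energy += y * y
        history = ([y] + history)[:len(a)]
    return energy

def output_power(target_power, u_for_gain, received_parcor):
    """Synthesis-filter output power for white excitation of power target_power * u."""
    excitation_power = target_power * u_for_gain
    return excitation_power * impulse_energy(step_up(received_parcor))

SENT = [0.9, -0.5]
RECEIVED = [0.95, -0.5]   # hypothetical quantization/channel error on k_1
P = 1.0                   # short-time average power to reproduce

conventional = output_power(P, normalized_residual_power(SENT), RECEIVED)
compensated = output_power(P, normalized_residual_power(RECEIVED), RECEIVED)
print(f"conventional: {conventional:.3f}  compensated: {compensated:.3f}")
```

Using the transmitted $U_P$, the output power is about 1.95 times the target (roughly +2.9 dB). That is the kind of amplitude increase described as the third drawback. Recomputing $U_P$ from the received coefficients restores the target power of 1.0.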

The figure is a block diagram of one embodiment of the invention. In the figure, 101 denotes the analysis side, 102 the linear prediction coefficient transmission channel, 103 the sound source information transmission channel, and 104 the synthesis side.

Speech waveform data enter through the waveform input terminal 105 and go to the linear prediction analyzer 106 and the sound source parameter analyzer 107.

- The linear prediction analyzer 106 measures linear prediction coefficients from the speech waveform data. It supplies them to the linear prediction coefficient encoder 108, which quantizes them and outputs them to the linear prediction coefficient transmission channel 102.
- The sound source parameter analyzer 107 measures the sound source information from the same data. This consists of the pitch frequency, a voiced/unvoiced decision signal, and the short-time average power. The analyzer supplies it to the sound source information encoder 109, which quantizes it and outputs it to the sound source information transmission channel 103.

The linear prediction coefficients quantized by the encoder 108 reach the linear prediction coefficient decoder 110 through the channel 102. The decoder 110 decodes the quantized coefficients, including any transmission errors added on channel 102. It outputs the decoded linear prediction coefficients to both decoded linear prediction coefficient lines, 111 and 112.

The normalized prediction residual power calculator 113 computes the normalized prediction residual power from the decoded coefficients received over line 111. It outputs the result to the normalized prediction residual power line 114. For example, if the decoded coefficients are PARCOR coefficients $k_1, \dots, k_P$ of order $P$, the normalized prediction residual power is

$$U_P = \prod_{i=1}^{P} \left(1 - k_i^{2}\right)$$
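The computation performed by the normalized prediction residual power calculator 113 fits in a few lines of Python. This sketch assumes the decoded coefficients are PARCOR coefficients $k_1, \dots, k_P$. It evaluates the product through the order recursion $U_n = U_{n-1}(1-k_n^2)$ with $U_0 = 1$. Rejecting coefficients outside $(-1, 1)$ is an added assumption; the source does not specify it.

```python
def residual_power_sequence(parcor):
    """Return [U_0, U_1, ..., U_P] with U_0 = 1.0 and U_n = U_{n-1} * (1 - k_n**2)."""
    u = [1.0]  # U_0: no prediction, so the residual equals the input
    for n, k in enumerate(parcor, start=1):
        if not -1.0 < k < 1.0:
            # Assumption: reject coefficients that would make 1/A(z) unstable.
            raise ValueError(f"k_{n} = {k} lies outside (-1, 1)")
        u.append(u[-1] * (1.0 - k * k))
    return u

def normalized_residual_power(parcor):
    """Sketch of calculator 113: U_P = prod_{i=1}^{P} (1 - k_i^2)."""
    return residual_power_sequence(parcor)[-1]

print(residual_power_sequence([0.9, -0.5, 0.2]))
```

For k = (0.9, -0.5, 0.2) the sequence is approximately 1.0, 0.19, 0.1425, 0.1368, so $U_3 \approx 0.1368$.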

The prediction residual power is the power of the prediction residual signal. This is the signal left after the component predicted by the prediction coefficients has been removed. The normalized prediction residual power is the power of this prediction residual signal divided by the power of the input signal.

It is well known that the normalized prediction residual power can be computed from the linear prediction coefficients (PARCOR coefficients), as described above, without computing the residual power directly. See, for example, the recursion in Fumitada Itakura, "Speech Feature Extraction by Statistical Methods," Proceedings of the 8th Symposium of the Research Institute of Electrical Communication, Tohoku University, February 1971, pp. 5-1 to 5-12.
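The claim that the normalized prediction residual power follows from the PARCOR coefficients, without forming the residual signal, can be checked numerically. This Python sketch runs the Levinson-Durbin recursion (the autocorrelation method) on a synthetic frame. Levinson-Durbin is a standard way to obtain PARCOR coefficients; it is assumed here rather than taken from the source. The sketch then computes the residual energy directly by inverse filtering and divides it by the input energy. The synthetic signal and the prediction order of 4 are arbitrary choices.

```python
import math
import random

def autocorrelation(x, order):
    """r[lag] = sum_n x[n] x[n + lag] for lag = 0..order (x is zero outside the frame)."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Return predictor a_1..a_P, PARCOR k_1..k_P and the final error energy E_P."""
    a, ks, err = [], [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        ks.append(k)
        err *= 1.0 - k * k
    return a, ks, err

def residual_energy(x, a):
    """Energy of e[n] = x[n] - sum_{j=1}^{P} a_j x[n-j] over the whole zero-padded span."""
    padded = list(x) + [0.0] * len(a)
    total = 0.0
    for n in range(len(padded)):
        pred = sum(a[i] * padded[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        total += (padded[n] - pred) ** 2
    return total

# Hypothetical test frame: white noise through a fixed, stable 2nd-order resonator.
rng = random.Random(1)
x, y1, y2 = [], 0.0, 0.0
for _ in range(400):
    y = rng.gauss(0.0, 1.0) + 1.3 * y1 - 0.6 * y2
    x.append(y)
    y1, y2 = y, y1

ORDER = 4
r = autocorrelation(x, ORDER)
a, ks, err = levinson_durbin(r, ORDER)
direct = residual_energy(x, a) / r[0]               # residual power / input power
from_parcor = math.prod(1.0 - k * k for k in ks)    # no residual signal needed
print(f"direct: {direct:.6f}  from PARCOR: {from_parcor:.6f}")
```

The two numbers agree to rounding error. This holds because, with the autocorrelation method and the residual summed over the full zero-padded span, the residual energy equals $r_0 \prod_{i=1}^{P}(1-k_i^2)$ exactly.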

The normalized prediction residual power is obtained by computing $U_n$ in this recursion up to the PARCOR order $P$:

$$U_1 = U_0 \left(1 - k_1^{2}\right),\quad U_2 = U_1 \left(1 - k_2^{2}\right),\quad \dots,\quad U_P = U_{P-1} \left(1 - k_P^{2}\right)$$

The resulting $U_P$ is the normalized prediction residual power. Here $U_0$ corresponds to PARCOR order 0, that is, to the normalized prediction residual power when no prediction is performed, and it equals 1.0. Without prediction, the prediction residual signal is identical to the input speech signal. The prediction residual power therefore equals the short-time average power (the power of the input speech signal), and their ratio is 1.0. The argument below shows that $U_n$ and $U_{n-1}$ are the prediction residual powers for PARCOR orders $n$ and $n-1$.

By equation (20) of that paper, $U_{n-1} = E\left[\left(\varepsilon^{(n-1)}\right)^{2}\right]$, the expected value of the squared signal $\varepsilon^{(n-1)}$, that is, its power. The signal $\varepsilon^{(n-1)}$ is conventionally called the prediction error (also the prediction residual, prediction error signal, or prediction residual signal). Hence $U_{n-1}$ and $U_n$ are the prediction residual powers for PARCOR orders $n-1$ and $n$.

It follows that $U_n$, with the initial value $U_0 = 1.0$, gives the normalized prediction residual power. This power is fed to the excitation signal generator 115 over the normalized prediction residual power line 114.
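One way the excitation signal generator 115 could combine $U_P$ with the decoded sound source information is sketched below. The source does not describe the generator's internals. This Python sketch assumes a unit-pulse train for voiced frames and Gaussian noise for unvoiced frames. It scales the excitation so that its power equals $P \cdot U_P$, where $P$ is the decoded short-time average power. For roughly white excitation, the synthesis filter then multiplies the power by about $1/U_P$, giving an output power near $P$. The 8 kHz sampling rate and 160-sample frame are hypothetical parameters.

```python
import math
import random

def excitation_frame(voiced, pitch_hz, short_time_power, u_p,
                     frame_len=160, fs=8000, rng=None):
    """Hypothetical sketch of excitation signal generator 115 for one frame.

    voiced, pitch_hz, short_time_power: decoded sound source information.
    u_p: normalized prediction residual power computed on the synthesis side.
    frame_len and fs (samples per frame, sampling rate in Hz) are assumed values.
    """
    rng = rng or random.Random(0)
    target = short_time_power * u_p              # desired excitation power
    if voiced:
        period = max(1, round(fs / pitch_hz))    # pitch period in samples
        raw = [1.0 if n % period == 0 else 0.0 for n in range(frame_len)]
    else:
        raw = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    raw_power = sum(v * v for v in raw) / frame_len
    gain = math.sqrt(target / raw_power) if raw_power > 0 else 0.0
    return [gain * v for v in raw]

voiced = excitation_frame(True, 100.0, 2.0, 0.25)
unvoiced = excitation_frame(False, 0.0, 2.0, 0.25)
```

For brevity, the sketch restarts the pulse train at the start of every frame. A practical generator would carry the pulse position from one frame to the next.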

The sound source information quantized by the sound source information encoder 109 reaches the sound source information decoder 116 over the sound source information transmission channel 103. The decoder 116 decodes it and supplies the decoded sound source information to the excitation signal generator 115 over the decoded sound source information line 117.

The excitation signal generator 115 generates a filter excitation signal from the normalized prediction residual power and the decoded sound source information. It supplies this signal to the speech synthesis filter 119 over the filter excitation signal line 118. The speech synthesis filter 119 synthesizes speech from two inputs: the filter excitation signal, and the decoded linear prediction coefficients supplied over line 112. It outputs the result to the waveform output terminal 120.

The decoded linear prediction coefficients may be in a form other than PARCOR coefficients. In that case, the calculator 113 can still obtain the normalized prediction residual power, for example by first converting the coefficients to PARCOR coefficients by a linear operation, or by an equivalent means.

The channel also affects transmission parameters sent over the sound source information transmission channel 103, such as the short-time average power. These parameters, however, change more slowly over time than the normalized prediction residual power. Smoothing them on the receiving side therefore has relatively little effect on the quality of the synthesized speech. Channel errors in these parameters can thus be reduced easily, and they do not interfere with the object of the invention.
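The decoded coefficients may arrive as direct-form predictor coefficients rather than PARCOR coefficients. One standard conversion is then the step-down (backward Levinson) recursion, after which $U_P$ follows from the usual product. This Python sketch assumes the predictor convention $\hat{x}[n] = \sum_{j=1}^{P} a_j x[n-j]$. Other sign conventions flip the signs of the $k_i$ but leave $U_P$ unchanged.

```python
import math

def step_down(a):
    """Predictor a_1..a_P (x_hat[n] = sum_j a_j x[n-j]) -> PARCOR k_1..k_P."""
    a = list(a)
    ks = []
    while a:
        m = len(a)
        k = a[-1]  # k_m equals the last order-m predictor coefficient
        if abs(k) >= 1.0:
            raise ValueError(f"|k_{m}| >= 1: the synthesis filter is not stable")
        ks.append(k)
        # Order-(m-1) coefficients: a_j <- (a_j + k_m * a_{m-j}) / (1 - k_m^2)
        a = [(a[i] + k * a[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    return ks[::-1]

def normalized_residual_power_from_predictor(a):
    """U_P from direct-form predictor coefficients via conversion to PARCOR form."""
    return math.prod(1.0 - k * k for k in step_down(a))

ks = step_down([1.425, -0.5])
```

For example, the predictor (1.425, -0.5) converts to k = (0.95, -0.5). This gives $U_2 = 0.0975 \times 0.75 = 0.073125$.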
The invention is not directly tied to the method used to transmit the sound source information. It is therefore also applicable to voice-excited linear predictive speech analysis and synthesis devices, for example B. S. Atal, M. R. Schroeder, and V. Stover (Bell Telephone Laboratories, Murray Hill, N.J. 07974), "Voice-Excited Predictive Coding System for Low Bit-Rate Transmission of Speech," ICC 75, June 16-18, 1975, IEEE Catalog Number 75CH0971-2CSCB.

The invention likewise applies to residual-excited speech analysis and synthesis devices, for example Chong Kwan Un and D. Thomas Magill, "The Residual-Excited Linear Prediction Vocoder with Transmission Rate Below 9.6 kbits/s," IEEE Transactions on Communications, Vol. COM-23, No. 12, December 1975. The scheme works as follows:

- The analysis side divides the prediction residual waveform by the normalized prediction residual power obtained from its own linear prediction coefficients. This compresses the amplitude range of the residual waveform, which is then transmitted to the synthesis side.
- The synthesis side multiplies the received residual waveform by the normalized prediction residual power that it computes from the received linear prediction coefficients.

This prevents transmission channel errors and similar distortions of the linear prediction coefficients from degrading the amplitude reproduction of the synthesized speech.
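The residual-excited variant reduces to a pair of scaling steps. The text speaks of dividing and multiplying the residual waveform by the normalized prediction residual power. This Python sketch assumes the scaling is applied to amplitudes, that is, by $\sqrt{U_P}$. That is the choice under which the synthesized power is preserved exactly when the residual is roughly white. The residual samples and the coefficient error are hypothetical.

```python
import math

def normalized_residual_power(parcor):
    """U_P = prod (1 - k_i^2) for PARCOR coefficients."""
    return math.prod(1.0 - k * k for k in parcor)

def normalize_residual(residual, parcor_tx):
    """Analysis side: compress the residual amplitude using U_P from local coefficients."""
    scale = math.sqrt(normalized_residual_power(parcor_tx))  # assumption: amplitude scaling
    return [e / scale for e in residual]

def denormalize_residual(received, parcor_rx):
    """Synthesis side: restore the amplitude using U_P from the received coefficients."""
    scale = math.sqrt(normalized_residual_power(parcor_rx))
    return [e * scale for e in received]

residual = [0.30, -0.12, 0.05, -0.02]   # hypothetical residual samples
sent = [0.9, -0.5]
sent_wave = normalize_residual(residual, sent)
clean = denormalize_residual(sent_wave, sent)              # no coefficient error
corrupted = denormalize_residual(sent_wave, [0.95, -0.5])  # hypothetical error on k_1
```

Without a coefficient error, the residual is restored exactly. With the error, the restored residual power is scaled by $U_P^{\mathrm{rx}}/U_P^{\mathrm{tx}}$. This cancels the changed power gain $1/U_P^{\mathrm{rx}}$ of the corrupted synthesis filter, so the output power stays where it would have been without the error.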

[Brief explanation of drawings]

The figure is a block diagram illustrating an embodiment of the present invention. Reference numerals:
101: analysis side
102: linear prediction coefficient transmission channel
103: sound source information transmission channel
104: synthesis side
105: waveform input terminal
106: linear prediction analyzer
107: sound source parameter analyzer
108: linear prediction coefficient encoder
109: sound source information encoder
110: linear prediction coefficient decoder
111: decoded linear prediction coefficient line
112: decoded linear prediction coefficient line
113: normalized prediction residual power calculator
114: normalized prediction residual power line
115: excitation signal generator
116: sound source information decoder
117: decoded sound source information line
118: filter excitation signal line
119: speech synthesis filter
120: waveform output terminal

Claims


1. A speech analysis and synthesis device in which the analysis side measures, at predetermined time intervals, parameters representing the frequency spectrum of an input speech signal, such as linear prediction coefficients, and the synthesis side synthesizes speech by setting the coefficients of a synthesis filter according to those parameters; characterized in that the synthesis side comprises means for computing, from said parameters, a normalized prediction residual power defined as the prediction residual power divided by the input signal power, and uses the normalized prediction residual power so computed as excitation source information for the synthesis filter.