JPH01304499A

JPH01304499A - System and device for speech synthesis

Info

Publication number: JPH01304499A
Application number: JP63136969A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-06-02
Filing date: 1988-06-02
Publication date: 1989-12-08
Anticipated expiration: 2012-06-04
Also published as: JP2615856B2

Abstract

PURPOSE: To obtain excellent speech quality by storing a sound source signal and spectral parameters for each unit speech, synthesizing speech with the spectral parameters while controlling the prosody of the sound source signal, and correcting the spectrum of the synthesized speech with a filter. CONSTITUTION: Speech is analyzed in unit-speech segments; the sound source signal is stored in a storage part 110 and the spectral parameters in a storage part 350. The storage part 110 selects the required unit speech according to control information input from a terminal 100 and outputs the corresponding prediction residual signal. A control part 150 uses the pitch-change information in the control information to stretch or shrink the residual signal in each pitch section of a vowel section, based on the stored pitch start positions, and adjusts the section duration in pitch units as specified by the control information. The storage part 350 outputs the corresponding LPC parameters a_i, and a synthesis filter 200 generates synthesized speech x(n) from the prediction residual signal and the parameters a_i. A calculation part 300 calculates distortion-correction parameters b_i from the parameters a_i and the synthesized speech x(n), and a filter 250 corrects x(n) using b_i and outputs speech x'(n) of good quality from a terminal 360.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a speech synthesis method and apparatus that store sound source signals and spectral parameters, control the prosody (pitch, amplitude, duration, etc.) of the sound source signal, and drive a synthesis filter with that sound source signal to synthesize speech.

[Conventional technology]

A known method of arbitrary-word speech synthesis uses part of the prediction residual signal obtained by linear predictive analysis or the like as a sound source signal, and drives a synthesis filter built from the linear prediction coefficients with that signal to synthesize speech. The method is described in detail in, for example, the paper by Sato entitled "(SYMPLE) speech synthesis based on CVC and sound source elements" (Acoustical Society of Japan, Speech Research Group Materials 583-69, 1984) (Reference 1). In the method of Reference 1, the prediction residual signal obtained by linear predictive analysis of the original speech is used as the sound source in unvoiced sections, while in voiced sections a prediction residual signal cut out from one representative pitch period of the vowel section is used as the sound source; the synthesis filter is driven with these signals to synthesize speech. The method is reported to improve sound quality compared with methods that use an impulse train as the sound source in voiced sections and a noise signal in unvoiced sections.

[Problem to be solved by the invention]

Speech synthesis, and arbitrary-word synthesis in particular, builds speech by concatenating unit speech segments. To give the result the natural intonation of human speech, the pitch period of the speech signal or the sound source signal must be varied according to prosodic information or prosodic rules. In the method of Reference 1, however, when the pitch period of the residual signal that serves as the voiced-section sound source is changed, the pitch period of the original speech from which the synthesis filter coefficients were analyzed differs from the pitch period of the speech to be synthesized. The changed pitch of the residual signal and the spectral envelope of the synthesis filter are therefore mismatched, and the spectrum of the synthesized speech is severely distorted. This causes the serious problem that the synthesized speech is heavily distorted and its intelligibility drops considerably.

This problem was particularly pronounced when the pitch period was changed greatly for female speakers, whose pitch periods are short.

It is known that this problem can be alleviated to some extent by shifting the low-frequency formant peaks of the spectral envelope so that they coincide with the pitch frequency used at synthesis. See, for example, the paper by Sagisaka et al. entitled "A synthesis method for spectral envelopes that takes pitch structure into account" (Proceedings of the Acoustical Society of Japan, pp. 501-502, October 1979) (Reference 2). However, because the method of Reference 2 moves the formant peaks to the position of the changed pitch frequency, it is not a fundamental remedy, and it introduces a new problem: the shift in formant positions degrades intelligibility and sound quality.

Furthermore, in vowel sections the method of Reference 1 essentially repeats the prediction residual signal of one representative pitch section of that vowel. It therefore cannot adequately represent the temporal changes in the spectrum and phase of the residual signal within the vowel section, and sound quality is degraded there.

An object of the present invention is to provide a speech synthesis method and apparatus that not only remedy these conventional problems when speech is synthesized by changing the pitch period of the sound source signal and driving a synthesis filter, but also yield good sound quality in vowel sections.

[Means to solve the problem]

The speech synthesis method of the present invention is characterized by storing a sound source signal and spectral parameters for each unit speech, synthesizing speech using the spectral parameters while controlling the prosody of the sound source signal, and correcting the spectrum of the synthesized speech with a filter.

The speech synthesis apparatus of the present invention is characterized by comprising: a sound source signal storage circuit that stores a sound source signal for each unit speech; a spectral parameter storage circuit that stores, for each unit speech, spectral parameters representing its spectral characteristics; a prosody control circuit that controls the prosody of the sound source signal; a synthesis circuit that synthesizes speech using the prosody-controlled sound source signal and the spectral parameters; and a filter circuit that corrects the spectrum of the synthesized speech using the stored spectral parameters and spectral parameters obtained from the synthesized speech.

[Operation]

The present invention is characterized in that a sound source signal is held for the entire duration of each unit speech, voiced or unvoiced, and in that a correction filter is used to correct the spectral distortion that arises when speech is synthesized with the pitch of the sound source signal changed.

FIG. 2 is a block diagram showing the operation of the present invention. The sound source signal storage section 110 holds, for each unit speech (for example CV or VC), the sound source signal obtained by analyzing the speech signal. The spectral parameters (order M_1) obtained by the same analysis are stored in the spectral parameter storage section 350. The description below assumes the well-known linear predictive analysis as the analysis method and the resulting prediction residual signal as the sound source signal, although other well-known, good spectral parameters and sound source signals can also be used. For the vowel sections of the prediction residual signal, the start position of each pitch section is also stored. Various linear prediction parameters could serve as the spectral parameters; LPC parameters are used here. Other well-known parameters such as LSP, PARCOR, and formants can also be used. The analysis may use a predetermined fixed-length frame (5 or 10 ms) or, in vowel sections, pitch-synchronous analysis synchronized with the pitch period.

Based on the control information input from the terminal 100, the sound source signal storage section 110 selects the required unit speech and outputs the corresponding prediction residual signal.

The pitch control section 150 uses the part of the control information that specifies pitch changes and, in vowel sections, stretches or shrinks the pitch of the residual signal in each pitch section based on the stored pitch start positions. Concretely, as described in Reference 1, the pitch period is lengthened by padding zeros after the pitch section and shortened by truncating samples from the end of the pitch section. The duration of a vowel section is adjusted in units of pitch sections to the duration specified by the control information.
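The zero-padding and truncation rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the list-of-floats representation of the residual, and the use of a single target period for every pitch section are assumptions made for the example.

```python
def change_pitch_period(segment, new_length):
    """Resize one pitch section of the residual to new_length samples."""
    if new_length < 0:
        raise ValueError("new_length must be non-negative")
    if new_length >= len(segment):
        # Lengthen the pitch period: pad zeros after the pitch section.
        return list(segment) + [0.0] * (new_length - len(segment))
    # Shorten the pitch period: drop samples from the end of the section.
    return list(segment[:new_length])


def modify_vowel_residual(residual, pitch_starts, new_period):
    """Resize every pitch section of a vowel residual to new_period samples.

    pitch_starts holds the stored start index of each pitch section
    (hypothetical representation of the stored pitch start positions).
    """
    bounds = list(pitch_starts) + [len(residual)]
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        out.extend(change_pitch_period(residual[start:end], new_period))
    return out
```

In practice the target period would vary from section to section according to the prosody control information; a per-section list of lengths could replace `new_period` without changing the rule itself.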

The spectral parameter storage section 350 stores, for each unit speech, LPC parameters obtained in advance by linear predictive analysis. According to the control information, it selects a unit speech and outputs the corresponding LPC parameters a_i (order M_1).

The synthesis filter 200 has the transfer characteristic

$$S(z) = \frac{1}{1 - \sum_{i=1}^{M_1} a_i z^{-i}} \tag{1}$$

and outputs the synthesized speech x(n) produced from the pitch-modified prediction residual signal and the LPC parameters.
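As a rough illustration of how an all-pole synthesis filter of this kind turns a residual into speech samples, the sketch below runs the corresponding time-domain recursion x(n) = e(n) + sum of a_i x(n-i). The function name and the zero initial filter state are assumptions for the example.

```python
def synthesize(residual, a):
    """All-pole synthesis: x(n) = e(n) + sum_{i=1..M} a[i-1] * x(n-i).

    residual is the (pitch-modified) prediction residual e(n);
    a holds the LPC parameters a_1..a_M. Past outputs before n = 0
    are taken to be zero (an assumption of this sketch).
    """
    x = []
    for n, e in enumerate(residual):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * x[n - i]
        x.append(acc)
    return x
```

For example, `synthesize([1.0, 0.0, 0.0, 0.0], [0.5])` gives the first four samples of the impulse response of a first-order filter with a_1 = 0.5.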

The correction spectral parameter calculation section 300 uses the LPC parameters a_i and the synthesized speech x(n) to calculate correction spectral parameters b_i, which correct the spectral distortion that arises in the synthesized speech when the pitch is changed. Specifically, the calculation proceeds as follows.

First, the following power spectrum H^2(z) is calculated from the LPC parameters a_i.

$$H^2(z) = \frac{1}{\left|1 - \sum_{i=1}^{M_1} a_i z^{-i}\right|^2} \tag{2}$$

Next, LPC analysis is performed on the voiced sections of the synthesized speech x(n), either for each predetermined section length or pitch-synchronously, to obtain spectral parameters a_i' (order M_2). These are used to calculate the following power spectrum F^2(z).

$$F^2(z) = \frac{1}{\left|1 - \sum_{i=1}^{M_2} a_i' z^{-i}\right|^2} \tag{3}$$

Next, the ratio of equations (2) and (3) is obtained as follows.

$$G^2(z) = \frac{H^2(z)}{F^2(z)} \tag{4}$$

Equation (4) is then inverse Fourier transformed to obtain the autocorrelation function R(m), and the correction spectral parameters b_i (order M_3) are calculated from R(m) by LPC analysis. Equations (2) and (3) can be evaluated using the Fourier transform.
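The procedure above (two LPC power spectra, their ratio, an inverse transform to an autocorrelation, then LPC analysis) can be sketched in pure Python as follows. Several choices here are assumptions rather than details from the source: the function names are hypothetical, the spectra are evaluated by a direct DFT instead of an FFT to keep the example self-contained, the ratio is taken as H^2/F^2, and the Levinson-Durbin recursion stands in for the unspecified "LPC analysis" step.

```python
import cmath
import math


def lpc_power_spectrum(a, n_fft):
    """Sample 1 / |1 - sum_i a_i z^-i|^2 at n_fft points on the unit circle."""
    spectrum = []
    for k in range(n_fft):
        w = 2.0 * math.pi * k / n_fft
        denom = 1.0 - sum(c * cmath.exp(-1j * w * i)
                          for i, c in enumerate(a, start=1))
        spectrum.append(1.0 / abs(denom) ** 2)
    return spectrum


def autocorrelation_from_spectrum(power, max_lag):
    """Inverse DFT of a real, even power spectrum for lags 0..max_lag."""
    n_fft = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * k * m / n_fft)
            for k, p in enumerate(power)) / n_fft
        for m in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Fit 1 / (1 - sum_i b_i z^-i) to the autocorrelation r[0..order]."""
    b = [0.0] * order
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(b[j] * r[m - j] for j in range(m))
        k = acc / err                      # reflection coefficient
        prev = b[:]
        b[m] = k
        for j in range(m):
            b[j] = prev[j] - k * prev[m - 1 - j]
        err *= 1.0 - k * k                 # updated prediction error
    return b


def correction_parameters(a, a_prime, order, n_fft=256):
    """Correction parameters b_i from stored a_i and re-analyzed a_i'."""
    h2 = lpc_power_spectrum(a, n_fft)          # spectrum of stored parameters
    f2 = lpc_power_spectrum(a_prime, n_fft)    # spectrum of synthesized speech
    g2 = [h / f for h, f in zip(h2, f2)]       # ratio of the two spectra
    r = autocorrelation_from_spectrum(g2, order)
    return levinson_durbin(r, order)
```

When the re-analyzed parameters equal the stored ones, the ratio is flat and the resulting b_i are essentially zero, so the correction filter passes the signal unchanged; this is a convenient sanity check for the sketch.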

The correction filter 250 has the following transfer characteristic Q(z):

$$Q(z) = \frac{1}{1 - \sum_{i=1}^{M_3} b_i z^{-i}} \tag{5}$$

It receives the synthesized speech x(n) and, using the correction spectral parameters b_i, outputs to the terminal 360 the synthesized speech x'(n) whose spectral distortion has been corrected.

[Embodiment]

Next, the present invention is described in detail with reference to FIG. 1.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. The control circuit 510 receives prosody control information (pitch, duration, amplitude) and unit-speech concatenation information from the terminal 500 and passes them to the sound source storage circuit 550, the spectral parameter storage circuit 580, the pitch control circuit 560, and the amplitude control circuit 570. The sound source storage circuit 550 receives the unit-speech concatenation information and outputs the prediction residual signal corresponding to that unit speech.

The pitch control circuit 560 receives the pitch control information and changes the pitch of the prediction residual signal in vowel sections, using the pitch division positions specified in advance. The pitch may be changed by the method described in the Operation section above or by other well-known methods.

Next, the amplitude control circuit 570 receives the amplitude control information, controls the amplitude of the prediction residual signal accordingly, and outputs the prediction residual signal e(n). The spectral parameter storage circuit 580 receives the unit-speech concatenation information and outputs the spectral parameter sequence corresponding to that unit speech. As in the Operation section, LPC coefficients a_i are used here as the spectral parameters, but other well-known parameters can be used.

The synthesis filter circuit 600 has the characteristic of equation (1). It receives the pitch-modified prediction residual signal and, using the LPC coefficients a_i, calculates the synthesized speech x(n) according to the following equation.

$$x(n) = e(n) + \sum_{i=1}^{M_1} a_i x(n-i) \tag{6}$$

The amplitude control circuit 710 multiplies the synthesized speech x(n) by a gain G and outputs the result. The gain G is supplied by the gain calculation circuit 700, whose operation is described later.

The FFT calculation circuit 610 receives the LPC coefficients a_i, performs an FFT (fast Fourier transform) of a predetermined number of points (for example, 256), and calculates and outputs the power spectrum H^2(z) defined by equation (2). The FFT calculation method is described, for example, in Chapter 6 of the book "Digital Signal Processing" by Oppenheim et al. (Prentice-Hall, 1975) (Reference 3), so its explanation is omitted here.

The LPC analysis circuit 640 performs LPC analysis on the vowel sections of the synthesized speech x(n) obtained after the pitch period has been changed, and calculates LPC coefficients a_i'. As noted in the Operation section, this LPC analysis may be pitch-synchronous or may be performed for each fixed-length frame. The FFT calculation circuit 630 receives the coefficients a_i' and calculates and outputs the power spectrum F^2(z) defined by equation (3).

The correction spectral parameter calculation circuit 620 uses the power spectra H^2(z) and F^2(z) to calculate the ratio G^2(z) according to equation (4). It then applies an inverse FFT to obtain the autocorrelation function R(m) and performs LPC analysis on it to obtain the LPC coefficients b_i.

The correction filter 650 takes the output of the amplitude control circuit 710 as its input and, using the coefficients b_i, calculates the synthesized speech x'(n) with corrected spectral distortion according to the following equation.

$$x'(n) = G \cdot x(n) + \sum_{i=1}^{M_3} b_i x'(n-i) \tag{7}$$

In equation (7), G·x(n) denotes the input signal of the correction filter 650.

The gain calculation circuit 700 calculates, for the interval in which the pitch has been changed, the gain G that makes the average power per pitch section of the synthesized speech x(n) and x'(n) equal. This is needed because the gain of the correction filter 650 is not 1.

Specifically, in the interval where the pitch has been changed, the average powers of the synthesized speech x(n) and x'(n) are calculated for each pitch section according to the following equations.

$$P_1 = \frac{1}{N} \sum_{n=1}^{N} x^2(n) \tag{8a}$$

$$P_2 = \frac{1}{N} \sum_{n=1}^{N} x'^2(n) \tag{8b}$$

Here N is the number of samples in the pitch section. The gain G is then obtained from the following equation.

$$G = \sqrt{P_1 / P_2} \tag{9}$$

The final synthesized speech signal x'(n), to which this gain G has been applied, is output through the terminal 660.
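The per-pitch gain computation can be sketched as below. The function names are hypothetical, and the sketch assumes the second argument is the correction-filter output for the same pitch section computed with unit gain, so that scaling it by G equalizes the two average powers; the source does not spell out how the dependence of x'(n) on G is resolved. Returning 1.0 for a silent section is also an assumption.

```python
import math


def average_power(samples):
    """Mean of squared samples over one pitch section."""
    return sum(s * s for s in samples) / len(samples)


def pitch_gain(x_section, corrected_section):
    """Gain G = sqrt(P1 / P2) for one pitch section.

    x_section:          synthesized speech x(n) before correction
    corrected_section:  correction-filter output for the same section,
                        computed with unit gain (assumption of this sketch)
    """
    p1 = average_power(x_section)
    p2 = average_power(corrected_section)
    if p2 == 0.0:
        return 1.0  # assumption: leave silent sections unscaled
    return math.sqrt(p1 / p2)
```

Multiplying the corrected section by the returned gain gives it the same average power as the uncorrected section, which is the stated purpose of the gain calculation.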

The embodiment above is only one example configuration of the present invention, and various modifications are possible.

For instance, this embodiment uses the prediction residual signal obtained by linear predictive analysis as the sound source signal over the entire unit speech. To reduce computation and memory, however, in voiced sections, and vowel sections in particular, the prediction residual signal of one representative pitch section may be used repeatedly while its amplitude and pitch are controlled.

The sound source signal is not limited to the prediction residual signal obtained by linear predictive analysis; other good sound source signals can be used, such as a zero-phased signal, a phase-equalized signal, or a multipulse sound source.

Besides LPC, other good spectral parameters such as LSP, formants, or the cepstrum can be used as the spectral parameters.

Likewise, the spectral parameters of the correction filter are not limited to LPC; other good parameters such as LSP, formants, or the cepstrum can be used.

The correction filter here is an all-pole filter as shown in equation (5), but a pole-zero filter or an FIR filter may be used instead, although this increases the amount of computation considerably.

To reduce computation, the amplitude control circuit 710 and the gain calculation circuit 700 can be omitted. In that case, however, the level of the synthesized speech x'(n) may change somewhat.

Instead of controlling the power of the residual signal, the amplitude control circuit 570 may be given the same configuration as the gain calculation circuit 700 and the amplitude control circuit 710 so that it controls the power of the synthesized speech x(n). In that case, the control signal input from the control circuit 510 must be the per-pitch unit power of the synthesized speech rather than that of the residual signal.

In this embodiment the prosody control information is input through the terminal 500, but accent information and intonation information may be input instead, with the prosody control information generated from them by rule.

To reduce computation, the correction filter may be calculated only when the pitch change applied in the pitch control circuit 560 is large.
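One way to read this variation is as a simple gate in front of the correction-parameter calculation. The sketch below is only an illustration: the source does not define what counts as a "large" change, so the relative-change measure, the function name, and the default threshold of 0.2 are all assumptions.

```python
def pitch_change_is_large(original_period, target_period, threshold=0.2):
    """True when the relative change in pitch period exceeds threshold.

    Periods are in samples. The threshold value is a placeholder,
    not a figure taken from the source.
    """
    if original_period <= 0:
        raise ValueError("original_period must be positive")
    change = abs(target_period - original_period) / original_period
    return change > threshold
```

A synthesizer could skip the correction filter for sections where this returns False and pass the synthesized speech through unchanged.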

[Effect of the invention]

As described above, according to the present invention a sound source signal and spectral parameters are held for every section of each unit speech and are used to synthesize speech. This yields the significant effect that synthesized speech of good quality is obtained not only in consonant sections but also in vowel sections, where sound quality was conventionally degraded.

Furthermore, according to the present invention, even when speech is synthesized with the pitch period of the sound source signal changed greatly from the pitch period that was analyzed and stored in advance, the resulting spectral distortion can be corrected, so speech with almost no loss of sound quality can be synthesized. This effect is particularly pronounced for female speakers, whose pitch periods are short.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention, and FIG. 2 is a block diagram showing the operation of the present invention.

110: sound source signal storage section; 150: pitch control section; 200, 600: synthesis filter; 250, 650: correction filter; 300: correction spectral parameter calculation section; 350: spectral parameter storage section; 510: control circuit; 550: sound source storage circuit; 560: pitch control circuit; 570, 710: amplitude control circuit; 580: spectral parameter storage circuit; 610, 630: FFT calculation circuit; 620: correction spectral parameter calculation circuit; 640: LPC analysis circuit; 700: gain calculation circuit.

Claims

[Claims]

(1) A speech synthesis method characterized in that a sound source signal and spectral parameters are stored for each unit speech, speech is synthesized using the spectral parameters while the prosody of the sound source signal is controlled, and the spectrum of the synthesized speech is corrected by a filter.

(2) A speech synthesis apparatus characterized by comprising: a sound source signal storage circuit that stores a sound source signal for each unit speech; a spectral parameter storage circuit that stores, for each unit speech, spectral parameters representing its spectral characteristics; a prosody control circuit that controls the prosody of the sound source signal; a synthesis circuit that synthesizes speech using the prosody-controlled sound source signal and the spectral parameters; and a filter circuit that corrects the spectrum of the synthesized speech using the spectral parameters and spectral parameters obtained from the synthesized speech.