JPH0594199A

JPH0594199A - Residual driving type speech synthesizing device

Info

Publication number: JPH0594199A
Application number: JP3253863A
Authority: JP
Inventors: Toru Kitamura; 徹北村; Mitsuo Fujimoto; 光男藤本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1991-10-01
Filing date: 1991-10-01
Publication date: 1993-04-16
Anticipated expiration: 2015-08-28
Also published as: JP3081300B2

Abstract

PURPOSE: To obtain more natural, higher-quality synthesized speech. Whichever residual waveform (high-, mid-, or low-pitched) is used for a speech unit selected for synthesis, the waveform data of every selectable residual signal has been extracted by an inverse filter whose coefficients are the very speech parameters used in the actual synthesis, so that the sound source and the parameters are in effect extracted from the same speech regardless of pitch changes. CONSTITUTION: The speech parameters stored per speech unit in a speech unit memory are used as the coefficients of a speech analysis filter that is the inverse filter of the speech synthesis filter. High-, mid-, and low-pitched speech is input to this analysis filter to extract residual-signal waveform data for three different pitch periods, and this waveform data is stored in a residual waveform memory.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a rule-based speech synthesizer capable of uttering arbitrary words, and in particular to a residual-driven rule-based speech synthesizer that is driven by a residual signal.

[0002]

[Prior Art] In recent years, rule-based synthesis methods for synthesizing speech from arbitrary text have been actively studied, and some have now been prototyped or put into practical use in newspaper proofreading devices, reading machines for the blind, and the like.

[0003] A rule-based synthesizer for synthesizing speech from arbitrary text works, for example, as follows. It performs sentence analysis on the text input to determine the reading and accent; determines from phonological rules the speech units (for example, CVC units) needed as synthesis units and concatenates them; determines the voice pitch and the like from prosodic rules; generates a time series of speech parameters and a pitch pattern; and generates synthesized speech by constructing a sound source and a digital filter from these parameters.

[0004] The speech parameters used in such synthesis methods are generally linear-prediction parameters such as LPC and LSP, or formants. As the sound source, impulses and white noise have been used in order to reduce memory and simplify processing.

[0005] It is known that, in linear-prediction speech synthesis methods such as LPC and LSP, synthesized speech close to the original speech can be obtained by using the prediction residual as the driving sound source.

[0006] Accordingly, for rule-based synthesis, which can utter arbitrary speech from input such as characters, high-quality synthesized speech can likewise be expected from the above principle by supplying a residual as the driving sound source. Such a residual-driven rule-based speech synthesizer has been proposed (Japanese Patent Application No. Hei 2-249493) and is outlined below.

[0007] FIG. 1 shows the configuration of the previously proposed residual-driven rule-based speech synthesizer. In this device, when a character string to be uttered is input to the character string buffer (1), the phoneme symbol string generation unit (2) converts the input character string into a phoneme symbol string. For example, when the character string "ta*beniki*ta" (た＊べにき＊た, where * is a symbol marking an accent position) is input, it is converted into the phoneme symbol string "tabenikita".
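As an illustration of this conversion step, the following minimal Python sketch turns the example input into a phoneme symbol string and records which syllables carry an accent mark. The lookup table `KANA_TO_PHONEMES`, the function name `to_phoneme_string`, and the treatment of the accent mark as applying to the syllable just before it are assumptions made for this example; the patent does not specify how the phoneme symbol string generation unit (2) is implemented.

```python
# Hypothetical kana-to-phoneme table covering only the example in the text.
KANA_TO_PHONEMES = {"た": "ta", "べ": "be", "に": "ni", "き": "ki"}
ACCENT_MARKS = {"*", "＊"}  # ASCII and full-width asterisk


def to_phoneme_string(text: str) -> tuple[str, list[int]]:
    """Return the phoneme symbol string and the indices of accented syllables.

    Assumption: an accent mark applies to the syllable immediately before it.
    """
    phonemes: list[str] = []
    accented: list[int] = []
    for ch in text:
        if ch in ACCENT_MARKS:
            if phonemes:
                accented.append(len(phonemes) - 1)
            continue
        phonemes.append(KANA_TO_PHONEMES[ch])
    return "".join(phonemes), accented


phoneme_string, accented_syllables = to_phoneme_string("た＊べにき＊た")
print(phoneme_string, accented_syllables)  # tabenikita [0, 3]
```

For the example input, the accented syllables are index 0 ("ta") and index 3 ("ki").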

[0008] Next, selection circuit 1 (3) determines and selects the required speech units one after another from the phoneme symbol string, and the required speech units among those stored in the speech unit memory (4) are connected in the speech unit connection unit (5). Speech units may be CVC (consonant + vowel + consonant) units, or a combination of CV (consonant + vowel) and VC (vowel + consonant) units, among others. For simplicity, consider CV (consonant + vowel) units, that is, syllables: "ta, be, ni, ki, ta" are then selected in turn as the required speech units and connected. The connected speech units are stored as speech parameters in the speech parameter buffer (6) and supplied to the synthesis filter (11) as coefficients.

[0009] Meanwhile, symbols representing intonation, such as accent positions, are input together with the character string to be uttered. When they are passed from the character string buffer (1) to the pitch pattern generation unit (7), that unit determines the pitch of the whole utterance. For example, for the input "ta*beniki*ta", accents fall on "ta" and "ki", giving a pitch pattern like the one shown in FIG. 2. In the pitch pattern generation unit (7), a phrase component, in which the pitch falls over the whole sentence, and an accent component, in which the pitch rises at accent positions, are added together to generate the pitch pattern.
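The sum of a phrase component and an accent component can be illustrated with a deliberately simple model. Everything numeric here is hypothetical: the linear fall, the fixed accent boost, and the default values (chosen only so that the first syllable comes out at the 400 Hz used as an example later in the description) are assumptions for this sketch, not values from the patent.

```python
def pitch_pattern(num_units: int, accented: list[int],
                  start_hz: float = 300.0,
                  fall_hz_per_unit: float = 10.0,
                  accent_boost_hz: float = 100.0) -> list[float]:
    """Per-unit pitch = falling phrase component + accent component."""
    pattern: list[float] = []
    for i in range(num_units):
        phrase = start_hz - fall_hz_per_unit * i             # falls over the sentence
        accent = accent_boost_hz if i in accented else 0.0   # rises at accents
        pattern.append(phrase + accent)
    return pattern


# Five units "ta, be, ni, ki, ta" with accents on units 0 and 3.
print(pitch_pattern(5, [0, 3]))  # [400.0, 290.0, 280.0, 370.0, 260.0]
```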

[0010] As shown in FIG. 3, the residual waveform memory (8) stores, for each speech unit, residual waveforms to be used as the driving sound source, and the required residual waveform is selected according to the speech unit determined and selected by selection circuit 1 (3). In the example, the corresponding residual waveforms are selected in the order "ta, be, ni, ki, ta". Moreover, as also shown in FIG. 3, the residual waveform memory (8) stores a plurality of residual waveforms with different pitches for each speech unit. Selection circuit 2 (9) selects the residual waveform of the appropriate pitch according to the pitch generated by the pitch pattern generation unit (7) and stores it in the driving sound source generation unit (10). Finally, the driving sound source generation unit (10) changes the pitch of the selected residual waveform to match the pitch generated by the pitch pattern generation unit (7), producing a residual waveform of the desired pitch.

[0011] The residual waveform generated in this way is input to the synthesis filter (11) as the driving sound source, and the synthesis filter (11) generates synthesized speech. The synthesized speech passes through the D/A converter (12) and is output from the speaker (13).

[0012] The operation of the driving sound source generation unit (10) of this previously proposed device is described further below.

[0013] First, FIG. 3 shows an example of the residual-signal waveform data stored in the residual waveform memory (8). The waveform data in this memory (8) corresponds to the case where a CV (consonant + vowel) syllable is the speech unit. For each speech unit, three forms with different pitch periods are stored: a high-pitch residual waveform (short pitch period), a mid-pitch residual waveform (medium pitch period), and a low-pitch residual waveform (long pitch period). The residual-signal waveform data stored in the residual waveform memory (8) is processed by the driving sound source generation unit (10) of FIG. 4 as follows.
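One way to picture the layout of the residual waveform memory (8) is as a table keyed by speech unit, with one residual per pitch register. The sketch below is only a hypothetical data structure: the names `ResidualSet`, `residual_memory`, and `read_residual` are invented for illustration, and the sample values are placeholders rather than real residuals.

```python
from dataclasses import dataclass

REGISTERS = ("high", "mid", "low")


@dataclass(frozen=True)
class ResidualSet:
    """Residual waveforms for one speech unit, one per pitch register."""
    high: tuple[float, ...]  # short pitch period
    mid: tuple[float, ...]   # medium pitch period
    low: tuple[float, ...]   # long pitch period


# Placeholder samples only; real entries would hold extracted residuals.
residual_memory: dict[str, ResidualSet] = {
    "ta": ResidualSet(high=(1.0, 0.0), mid=(1.0, 0.0, 0.0), low=(1.0, 0.0, 0.0, 0.0)),
    "be": ResidualSet(high=(0.8, 0.1), mid=(0.8, 0.1, 0.0), low=(0.8, 0.1, 0.0, 0.0)),
}


def read_residual(unit: str, register: str) -> tuple[float, ...]:
    """Look up the residual for a speech unit and pitch register."""
    if register not in REGISTERS:
        raise ValueError(f"unknown register: {register!r}")
    return getattr(residual_memory[unit], register)


print(read_residual("ta", "high"))  # (1.0, 0.0)
```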

[0014] That is, switch 1 (101) is set by the selection signal from selection circuit 1 (3) in FIG. 4, and the residual waveforms corresponding to the required speech unit are read from the residual waveform memory (8) and stored in residual waveform buffer 1 (102). In the "ta*beniki*ta" example, the residual waveforms for "ta" are read first. Next, switch 2 (103) is set by the selection signal from selection circuit 2 (9), and the residual of the appropriate pitch is selected and stored in residual waveform buffer 2 (104). In the example, as shown in FIG. 2, the pitch of "ta" is high at 400 Hz, so the high-pitch residual waveform for "ta" is selected and stored in residual waveform buffer 2 (104). Finally, the pitch changing circuit (105) changes the pitch period of the residual waveform so that it has the pitch determined by the pitch pattern generation unit (7). For example, if the high-pitch residual waveform for "ta" was extracted from 380 Hz speech, the pitch is raised by 20 Hz (the pitch period is shortened) to reach the 400 Hz determined by the pitch pattern generation unit (7).
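The selection and the size of the required pitch change can be worked through numerically. In the sketch below, the 380 Hz source pitch and 400 Hz target come from the example in the text; the mid and low source pitches (250 Hz, 150 Hz), the 8 kHz sampling rate, and the function names are assumptions made for illustration. Choosing the register nearest to the target pitch is one plausible reading of "appropriate pitch".

```python
def choose_register(target_hz: float, source_hz: dict[str, float]) -> str:
    """Pick the register whose source pitch is closest to the target pitch."""
    return min(source_hz, key=lambda name: abs(source_hz[name] - target_hz))


def period_in_samples(pitch_hz: float, sample_rate_hz: float = 8000.0) -> int:
    """Pitch period expressed as a whole number of samples."""
    return round(sample_rate_hz / pitch_hz)


# Pitch of the speech each stored residual was extracted from.
source_hz = {"high": 380.0, "mid": 250.0, "low": 150.0}

register = choose_register(400.0, source_hz)
stored_period = period_in_samples(source_hz[register])
target_period = period_in_samples(400.0)
print(register, stored_period, target_period)  # high 21 20
```

At this assumed sampling rate, going from 380 Hz to 400 Hz means shortening each pitch period from 21 samples to 20.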

[0015] The pitch changing circuit (105) changes the pitch of the residual waveform by, for example, the "zero-padding and truncation method": to lower the pitch, zero data is inserted partway through to lengthen the pitch period; to raise the pitch, data is deleted partway through to shorten the pitch period.
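A minimal sketch of this kind of pitch-period change is shown below. The text does not say exactly where within the period the zeros are inserted or the samples removed; this example assumes the tail of each pitch period, and the function name `change_pitch_period` is hypothetical.

```python
def change_pitch_period(period: list[float], new_length: int) -> list[float]:
    """Resize one pitch period of a residual to new_length samples.

    Lower pitch (longer period): pad with zeros.
    Higher pitch (shorter period): drop samples.
    Assumption: padding and truncation both happen at the end of the period.
    """
    if new_length < 1:
        raise ValueError("new_length must be at least 1")
    if new_length >= len(period):
        return period + [0.0] * (new_length - len(period))
    return period[:new_length]


one_period = [1.0, 0.5, 0.25]
print(change_pitch_period(one_period, 5))  # [1.0, 0.5, 0.25, 0.0, 0.0]
print(change_pitch_period(one_period, 2))  # [1.0, 0.5]
```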

[0016] Because a large pitch change degrades sound quality, the residual-driven rule-based synthesizer described above stores residuals at three pitch levels (high, mid, and low) in the residual waveform memory in advance and uses the residual waveform whose pitch period is close to the desired pitch, thereby keeping the amount of pitch change small.

[0017] As described above, a residual waveform that corresponds to the selected speech unit and has been changed to the desired pitch is generated as the driving sound source, so synthesized speech with highly natural pronunciation is obtained.

[0018] In the residual-driven speech synthesizer outlined above, the residual waveforms of different pitches stored in the residual waveform memory (8) have conventionally been created by the method shown in FIG. 5.

[0019] That is, to create residual waveforms at, for example, three pitch levels, a high-pitch residual waveform with a short pitch period is extracted by analyzing high-pitched input speech, as shown in FIG. 5(a). Likewise, a low-pitch residual waveform with a long pitch period is extracted by analyzing low-pitched input speech, as shown in FIG. 5(c). For the mid pitch, a mid-pitch residual waveform with an average pitch period is extracted by analyzing mid-pitched input speech, as shown in FIG. 5(b).

[0020]

[Problems to Be Solved by the Invention] As described above, when a conventional residual-driven rule-based synthesizer generates the driving sound source from residual waveforms of different pitches, the parameters such as LPC or LSP stored as speech units and the residual waveforms used as the driving sound source have been analyzed and extracted from different speech. Consequently, even when a stored residual waveform is used as the driving sound source and passed through the synthesis filter with the stored speech-unit parameters as coefficients, the original speech cannot be reproduced, and the generated synthesized speech is degraded.

[0021] The degradation would presumably disappear if, for each residual waveform, speech units extracted from speech of the corresponding pitch were used. In that case, however, multiple speech units would have to be stored, one for each pitch, which increases the amount of memory.

[0022]

[Means for Solving the Problems] In the residual-driven speech synthesizer of the present invention, the speech parameters stored per speech unit in the speech unit memory are used as the coefficients of a speech analysis filter that is the inverse filter of the speech synthesis filter. Speech of different pitch periods is input to this analysis filter to extract residual-signal waveform data for each pitch period, and this waveform data is stored in the residual waveform memory.

[0023]

[Operation] According to the residual-driven speech synthesizer of the present invention, whatever the pitch period of the residual waveform used as the driving sound source for a speech unit selected for synthesis, the waveform data of every selectable residual signal has been extracted by an inverse filter whose coefficients are the very speech parameters used in the actual synthesis. The sound source and the parameters are therefore in effect extracted from the same speech regardless of pitch changes, so more natural, higher-quality synthesized speech is obtained.
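The effect can be checked numerically with a small sketch. It assumes a direct-form linear-prediction filter with hypothetical second-order coefficients, and uses plain sine waves as stand-ins for recorded speech; none of these values come from the patent. When the residual is extracted with the same coefficients that the synthesis filter later uses, the input is reconstructed essentially exactly at every pitch; when it is extracted with different coefficients, as in the conventional method, the reconstruction deviates.

```python
import math


def inverse_filter(speech: list[float], coeffs: list[float]) -> list[float]:
    """Analysis (inverse) filter: e[n] = s[n] - sum_k a_k * s[n-k]."""
    residual: list[float] = []
    for n, sample in enumerate(speech):
        predicted = sum(a * speech[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(sample - predicted)
    return residual


def synthesis_filter(residual: list[float], coeffs: list[float]) -> list[float]:
    """Synthesis filter: s[n] = e[n] + sum_k a_k * s[n-k]."""
    speech: list[float] = []
    for n, excitation in enumerate(residual):
        predicted = sum(a * speech[n - k]
                        for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        speech.append(excitation + predicted)
    return speech


def tone(pitch_hz: float, sample_rate_hz: float = 8000.0, length: int = 80) -> list[float]:
    """Stand-in for recorded speech at a given pitch (a plain sine wave)."""
    return [math.sin(2 * math.pi * pitch_hz * n / sample_rate_hz)
            for n in range(length)]


def max_error(a: list[float], b: list[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


unit_coeffs = [1.2, -0.6]   # hypothetical parameters stored for one speech unit
other_coeffs = [0.9, -0.3]  # hypothetical parameters from analysing other speech

errors: dict[float, tuple[float, float]] = {}
for pitch_hz in (380.0, 250.0, 150.0):  # high, mid, low (illustrative values)
    voice = tone(pitch_hz)
    # Residual extracted with the unit's own parameters.
    same = synthesis_filter(inverse_filter(voice, unit_coeffs), unit_coeffs)
    # Residual extracted with parameters that differ from the synthesis filter's.
    mixed = synthesis_filter(inverse_filter(voice, other_coeffs), unit_coeffs)
    errors[pitch_hz] = (max_error(voice, same), max_error(voice, mixed))
    print(pitch_hz, errors[pitch_hz])
```

In each printed pair, the first number (matched coefficients) is at the level of floating-point rounding, while the second (mismatched coefficients) is clearly nonzero.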

[0024]

[Embodiment] FIG. 6 shows the method of creating the residual-signal waveform data used in the residual-driven speech synthesizer of the present invention. Each H(z) in FIGS. 6(a), 6(b), and 6(c) represents, in z-transform notation, the transfer characteristic of the speech analysis filter that is the inverse filter of the speech synthesis filter. As shown in FIGS. 6(a), 6(b), and 6(c), speech at three pitch levels (high, mid, and low) is passed through the inverse filter whose coefficients are the speech parameters stored as speech units, such as LPC (linear prediction coefficients, partial autocorrelation coefficients, etc.) or LSP, producing the high-, mid-, and low-pitch residual waveforms respectively.
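For reference, a common direct-form way of writing such an analysis filter and its inverse is given below. The patent itself only names H(z); the prediction order p, the coefficients a_k, the signals s(n) and e(n), and the sign convention are assumptions of this sketch (parameters stored as PARCOR or LSP values would first be converted to an equivalent set of a_k).

```latex
% Analysis (inverse) filter, with the stored speech-unit parameters a_k as coefficients
H(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}
\qquad\Longleftrightarrow\qquad
e(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k)

% Synthesis filter 1/H(z), driven by the residual e(n) with the same a_k
\hat{s}(n) = e(n) + \sum_{k=1}^{p} a_k \, \hat{s}(n-k)
```

With identical coefficients in both filters, the second recursion undoes the first, so the synthesized signal equals the input s(n) whichever pitch the input speech had.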

[0025] The residual-driven speech synthesizer of the present invention stores, for each speech unit, residual waveforms created by the method shown in FIG. 6 in the residual waveform memory shown in FIG. 3, and performs residual-driven rule-based synthesis with the configuration shown in FIG. 1.

[0026] That is, when a character string to be uttered is input to the character string buffer (1), the phoneme symbol string generation unit (2) converts the input character string into a phoneme symbol string. For example, when the character string "ta*beniki*ta" (where * is a symbol marking an accent position) is input, it is converted into the phoneme symbol string "tabenikita".

[0027] Next, selection circuit 1 (3) determines and selects the required speech units one after another from the phoneme symbol string, and the required speech units among those stored in the speech unit memory (4) are connected in the speech unit connection unit (5). Speech units may be CVC (consonant + vowel + consonant) units, or a combination of CV (consonant + vowel) and VC (vowel + consonant) units, among others. For simplicity, consider CV (consonant + vowel) units, that is, syllables: "ta, be, ni, ki, ta" are then selected in turn as the required speech units and connected.

[0028] The connected speech units are stored as speech parameters in the speech parameter buffer (6) and supplied to the synthesis filter (11) as coefficients.

[0029] Meanwhile, symbols representing intonation, such as accent positions, are input together with the character string to be uttered. When they are passed from the character string buffer (1) to the pitch pattern generation unit (7), that unit determines the pitch of the whole utterance. For example, for the input "ta*beniki*ta", accents fall on "ta" and "ki", giving a pitch pattern like the one shown in FIG. 2.

[0030] The residual waveform memory (8) stores, for each speech unit, residual waveforms to be used as the driving sound source, and the required residual waveform is selected according to the speech unit determined and selected by selection circuit 1 (3). In the example, the residual waveforms are selected in the order "ta, be, ni, ki, ta".

[0031] Furthermore, the residual waveform memory (8) stores, for each speech unit, a plurality of residual waveforms with different pitches created by the method proposed in the present invention. Selection circuit 2 (9) selects the residual waveform of the appropriate pitch according to the pitch generated by the pitch pattern generation unit (7) and stores it in the driving sound source generation unit (10). Finally, the driving sound source generation unit (10) changes the pitch of the selected residual waveform to match the pitch generated by the pitch pattern generation unit (7), producing a residual waveform of the desired pitch.

[0032] The residual waveform generated in this way is input to the synthesis filter (11) as the driving sound source, and the synthesis filter (11) generates synthesized speech. The synthesized speech passes through the D/A converter (12) and is output from the speaker (13).

[0033]

[Effects of the Invention] In the residual-driven speech synthesizer of the present invention, the sound source and the parameters are in effect extracted from the same speech regardless of pitch changes. Therefore, as long as the residual waveform is one selected for the speech unit to be connected, high-quality synthesized speech close to the original speech can be obtained whichever pitch of residual waveform is used as the driving sound source.

[Brief description of drawings]

FIG. 1 is a block diagram of a residual-driven rule-based synthesizer.

FIG. 2 is a schematic diagram of a pitch pattern.

FIG. 3 is a schematic diagram of the residual waveform memory.

FIG. 4 is a schematic diagram of the driving sound source generation unit.

FIG. 5 is an explanatory diagram of the conventional method of creating residual waveforms.

FIG. 6 is an explanatory diagram of the method of creating residual waveforms used in the residual-driven speech synthesizer of the present invention.

[Explanation of symbols]

(1) character string buffer, (2) phoneme symbol string generation unit, (3) selection circuit 1, (4) speech unit memory, (5) speech unit connection unit, (6) speech parameter buffer, (7) pitch pattern generation unit, (8) residual waveform memory, (9) selection circuit 2, (10) driving sound source generation unit, (11) synthesis filter, (12) D/A converter, (13) speaker, (101) switch 1, (102) residual waveform buffer 1, (103) switch 2, (104) residual waveform buffer 2, (105) pitch changing circuit

Claims

[Claims]

1. A residual-driven speech synthesizer comprising: speech feature parameter control means including a speech unit memory in which linear-prediction speech feature parameters such as LPC or LSP are stored per speech unit, a phoneme symbol string generation unit that generates a symbol string indicating the speech units of the speech to be uttered, and a speech unit connection unit that sequentially connects the speech units read from the speech unit memory on the basis of the symbol string generated by the phoneme symbol string generation unit; driving sound source control means including a residual waveform memory that stores, for each speech unit, a group of waveform data of a plurality of residual signals having different pitch periods, a pitch pattern generation unit that generates a pitch pattern indicating the change in pitch period of the speech to be uttered, and a residual selection circuit that selects, from the group of residual-signal waveform data in the residual waveform memory corresponding to the speech unit, the residual waveform data corresponding to the pitch period at each point in time determined by the pitch pattern generation unit; and a speech synthesis unit having a linear-prediction speech synthesis filter that synthesizes speech using the residual selected by the residual selection circuit as the driving sound source and the speech parameters of the speech units connected by the speech unit connection unit as coefficients; characterized in that the waveform data of the plurality of residual signals having different pitch periods stored in the residual waveform memory is obtained by inputting speech of the respective different pitch periods to a speech analysis filter that is the inverse filter of the speech synthesis filter and whose coefficients are the speech parameters, in the speech unit memory, of the corresponding speech unit.