JPS63110498A

JPS63110498A - Rule-based speech synthesizer

Info

Publication number: JPS63110498A
Application number: JP25748986A
Authority: JP
Inventors: 幸夫三留
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-10-29
Filing date: 1986-10-29
Publication date: 1988-05-14
Anticipated expiration: 2012-01-22
Also published as: JP2573586B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to an apparatus that synthesizes speech by rule from information representing speech, such as character strings.

(Prior Art) In voice response systems and the like, it is sometimes necessary to synthesize speech not only for specific messages fixed in advance at system development time but also, during operation, from character strings representing arbitrary sentences or the readings of words. Likewise, when a machine is to read aloud text written for human readers, for example Japanese text that mixes kanji and kana, the text is analyzed to generate information representing its reading and related attributes (hereinafter called speech information), and speech is synthesized from that information.

For such cases, so-called speech synthesis by rule is known: various speech synthesis rules that control pitch, phoneme duration, amplitude, spectral parameters, and so on are prepared in advance, and speech is synthesized by applying those rules to the input speech information.

An example of such speech synthesis by rule is given by Mitome and Fushikida in Acoustical Society of Japan, Speech Research Group Material 585-31 (July 1985), "Formant, CV-VC type rule synthesis".

In this system the speech synthesis rules include duration rules, pitch rules, pause rules, and parameter editing rules. Formant parameters in CV-VC units (C denotes a consonant and V a vowel), obtained in advance by analyzing natural speech, are edited and supplied to a formant-type speech synthesizer to synthesize arbitrary speech.

Of these rules, the duration rule determines the duration of each phoneme based on the length of the word, the position of the word within the sentence, the position of the accent, and so on.

The pitch rule determines a pitch frequency value for each CV-VC unit based on the length of the breath group, or on the length and accent of the phrase, and then determines the pitch of each speech synthesis frame by linearly interpolating between those values.
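
The frame-by-frame interpolation performed by the pitch rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited system's implementation: the function name `interpolate_pitch`, the 10 ms frame period, and the anchor values are all hypothetical, and anchor times are assumed to fall on frame boundaries.

```python
def interpolate_pitch(anchors, frame_ms=10):
    """Linearly interpolate pitch between anchor points, one value per frame.

    anchors: list of (time_ms, pitch_hz) pairs in increasing time order,
             e.g. one pair per CV-VC unit (hypothetical representation).
    """
    frames = []
    for (t0, f0), (t1, f1) in zip(anchors, anchors[1:]):
        t = t0
        while t < t1:
            # Straight line from (t0, f0) to (t1, f1), sampled every frame.
            frames.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
            t += frame_ms
    frames.append(anchors[-1][1])  # close with the final anchor value
    return frames


# Pitch falls from 120 Hz to 100 Hz over 40 ms, then rises to 110 Hz.
print(interpolate_pitch([(0, 120.0), (40, 100.0), (60, 110.0)]))
```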

The pause rule divides a long sentence into several breath groups based on the dependency relations between phrases, and determines the duration of the pauses between them.

The parameter editing rule first determines the numbers of the CV-VC unit speech segments to use, based on the phoneme sequence of the speech to be synthesized, and then synthesizes speech by editing the formant data prepared in advance in CV-VC units.

Meanwhile, a second conventional example is the "Speech synthesis system based on the articulatory segment editing method" by Mitome, Fushikida, and Takashima, presented in the Proceedings of the National Conference of the Information Systems Division of the Institute of Electronics and Communication Engineers of Japan, No. 1-131. Like the first conventional example, it synthesizes arbitrary speech by editing unit speech segments, here called articulatory segments. It differs from the first example in that it edits speech waveforms stored at several pitch levels rather than parameters such as formants. Consequently, the duration rules and the like are the same as in the first example, but the pitch rule instead selects from among the pitches prepared in advance, and the data is edited with the pitch taken into account.

With either example, arbitrary Japanese speech can be synthesized from speech information. Many control rules for individual parameters such as duration and pitch are also known.

Furthermore, speech in other languages such as English can likewise be synthesized by rule, and many such examples are known. One is the paper by Klatt, "The Klattalk Text-to-Speech Conversion System", on pages 1589 to 1592 of Proceedings ICASSP 82 (Proceedings of the 1982 International Conference on Acoustics, Speech, and Signal Processing).

This example has rules that specify target values for various speech synthesis parameters, such as pitch and the formants and amplitudes of phonemes, and that generate a time-series pattern for each parameter by interpolating smoothly between the targets.

What these examples have in common is that they apply to the input speech information whichever rules have matching conditions, determine the values of various parameters (duration, pitch, target values of formant frequency and amplitude, unit speech numbers, and so on), and synthesize speech based on those values.

(Problems to Be Solved by the Invention) In such conventional rule-based speech synthesizers, however, the same rules are always applied to the same sentence or word, so it is always pronounced in the same way. As a result, synthesized speech produced by a conventional rule-based speech synthesizer gives a mechanical, unnatural impression and is tiring to listen to for long periods.

When a word or simple sentence appears as part of a longer sentence, it may be synthesized slightly differently, depending on the rules, if its context differs. However, when a word or the like stands alone, set off by punctuation, and the rules do not take the surrounding context into account, it is always synthesized in the same way.

The longer the text, the more likely this situation is to occur, and the more conspicuous the unnaturalness becomes.

An object of the present invention is to provide a rule-based speech synthesizer that can synthesize more natural speech with a simple configuration, amounting to little more than the addition of a few circuits to a conventional rule-based speech synthesizer.

(Means for Solving the Problems) To solve the above problems, the present invention provides a rule-based speech synthesizer having means for determining, based on input speech information, the values of speech synthesis parameters for rule-based synthesis, such as pitch, phoneme duration, amplitude, and spectrum, and means for generating synthesized speech from those parameter values, characterized in that it further has means for generating random data and means for changing the values of the speech synthesis parameters according to the value of the random data.

(Operation) The present invention determines the values of the speech synthesis parameters by control rules, as in the prior art, and then synthesizes speech after varying those values randomly, thereby avoiding mechanical-sounding synthesized speech. When a person actually reads the same sentence aloud repeatedly, prosody, timbre, and so on fluctuate stochastically from one utterance to the next; the invention aims to improve naturalness by reproducing this property. To that end, the means for changing the speech synthesis parameter values alters the parameter values, determined as in the prior art, according to the random data produced by the means for generating random data.

The amount by which the speech synthesis parameter values are changed is determined as follows: speech actually uttered by a person is analyzed in advance to find the statistical distribution of the speech synthesis parameter values, and the amount of change is set based on that distribution.
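
The text does not say which statistic of the measured distribution is used. As one possible reading, the sketch below takes repeated measurements of a single parameter across several human utterances of the same material and uses the largest absolute deviation from their mean as the amount of change. The function name `max_deviation` and the sample values are hypothetical.

```python
from statistics import mean


def max_deviation(samples):
    """Largest absolute deviation of the samples from their mean.

    Assumption: this is only one candidate for the "amount of change";
    a standard deviation or a percentile range would be equally plausible.
    """
    center = mean(samples)
    return max(abs(s - center) for s in samples)


# Hypothetical pitch measurements (Hz) of the same phoneme in five readings.
pitch_hz = [118.0, 122.0, 120.0, 125.0, 115.0]
print(max_deviation(pitch_hz))
```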

As means for generating random data, methods such as the congruential method and the M-sequence method are well known, and one based on either method can be used.
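
Both generator families named above are simple to sketch. The code below is illustrative only: the multiplier, increment, and modulus of the congruential generator, the 4-bit register width, and the feedback taps are assumptions chosen for a compact example, not values taken from the patent.

```python
from itertools import islice


def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Congruential method: x <- (a * x + c) mod m (illustrative constants)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def m_sequence(state=0b0001):
    """M-sequence from a 4-bit linear feedback shift register.

    Yields one bit per step; with these taps and any nonzero start state
    the bit pattern repeats with period 2**4 - 1 = 15.
    """
    while True:
        out = state & 1
        feedback = (state ^ (state >> 1)) & 1   # XOR of the two lowest bits
        state = (state >> 1) | (feedback << 3)  # shift right, feed back at the top
        yield out


print(list(islice(lcg(1), 3)))
print(list(islice(m_sequence(), 15)))
```

Either stream can then be scaled to whatever range the synthesizer needs, for example by taking the congruential output modulo N to select one of N stored variants.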

(Embodiments) Next, the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram of a first embodiment of the present invention.

Like the second conventional example described above, this embodiment is a device that synthesizes speech by editing unit speech waveforms.

In the figure, 101 is a control circuit, 102 is a data number generation circuit, 103 is a unit speech waveform memory, 104 is a waveform editing circuit, 105 is a random number generation circuit, and 106 is an adder.

The unit speech waveform memory 103 stores a plurality of waveform data items for each unit speech segment. When the data number assigned to an item is supplied on signal line 118, that item is output on signal line 119.

The numbers of the multiple waveform data items for the same unit speech segment are assigned consecutively. That is, within the unit speech waveform memory 103, the different waveform data items for one unit speech segment are stored as a group. Thus, although the stored contents differ, the unit speech waveform memory 103 has the same structure as the memory that stores the prepared unit speech waveform data in the second conventional example.

From the sequence of unit speech names input on signal line 113, the data number generation circuit 102 generates, for each unit speech segment, the number of the first data item in that segment's group of different waveform data in the unit speech waveform memory 103, and outputs it to signal line 116.

The waveform editing circuit 104 uses, from the unit speech waveform data sent from the unit speech waveform memory 103 via signal line 119, only the portion indicated by the duration data input on signal line 115, and generates the synthesized speech waveform by interpolating between successive unit speech waveforms. The data number generation circuit 102 and the waveform editing circuit 104 can also be implemented with the same configurations as in the second conventional example.

The random number generation circuit 105 generates a random number each time it is instructed to by the control circuit 101 and sends it to the adder 106 via signal line 117. The random number takes a value from 0 to (N-1), where N is the number of waveform data items stored for each unit speech segment.

The adder 106 adds the random number sent from the random number generation circuit 105 to the number, sent from the data number generation circuit 102, of the first data item in the group of different waveform data for one unit speech segment, and sends the sum to the unit speech waveform memory 103 via signal line 118.

When a sequence of unit speech names and duration data are input on signal line 111, the control circuit 101 sends the sequence of unit speech names to the data number generation circuit 102 via signal line 113 to have it generate data numbers, and sends the duration data to the waveform editing circuit 104 via signal line 115.

It also sends an instruction to the random number generation circuit 105 via signal line 114 to have it generate a random number. In this way, the random number generated by the random number generation circuit 105 is added to the data number generated by the data number generation circuit 102, and the data item with the resulting number is read from the unit speech waveform memory 103 and edited in the waveform editing circuit 104 to produce the synthesized speech waveform, which is output on signal line 112.
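
The addressing scheme of this first embodiment (group start number plus a random offset from 0 to N-1) can be sketched in software as follows. The memory contents, the unit names "ka" and "sa", the value N = 3, and the function name `read_variant` are hypothetical stand-ins for the circuits in FIG. 1, and the random offset is assumed to be an integer.

```python
import random

# Hypothetical layout of memory 103: two unit speech segments with N = 3
# variants each, stored as consecutive groups.
N_VARIANTS = 3
WAVEFORM_MEMORY = ["ka#0", "ka#1", "ka#2", "sa#0", "sa#1", "sa#2"]
GROUP_START = {"ka": 0, "sa": 3}  # role of data number generation circuit 102


def read_variant(unit_name, rng):
    """Pick one stored variant of a unit speech segment at random."""
    base = GROUP_START[unit_name]          # circuit 102: first number of the group
    offset = rng.randrange(N_VARIANTS)     # circuit 105: random integer 0..N-1
    return WAVEFORM_MEMORY[base + offset]  # adder 106, then lookup in memory 103


rng = random.Random()
print([read_variant(name, rng) for name in ["ka", "sa", "ka"]])
```

Because the offset never exceeds N-1, the sum always stays inside the group belonging to the requested unit, so repeated requests for the same unit return different recordings of that unit and never a neighboring one.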

Although this embodiment has been described as editing unit speech waveform data, a device that edits data such as formant patterns of unit speech, as in the first conventional example described above, can be realized in the same way. That is, data such as formant patterns is stored in place of the waveform data and then edited.

FIG. 2 is a block diagram of a second embodiment of the present invention.

Like the third conventional example described above, this embodiment is a device of the type that specifies target values for various speech synthesis parameters, such as pitch and the formants and amplitudes of phonemes, and generates a time-series pattern for each parameter by interpolating smoothly between the targets.

In the figure, 201 is a control circuit, 202 is a parameter target value generation circuit, 203 is a data interpolation circuit, 204 is a speech synthesis circuit, 205 is a data distribution value memory, 206 is a random number generation circuit, 207 is a multiplier, and 208 is an adder.

As in the third conventional example, the parameter target value generation circuit 202 generates parameter target values based on the phoneme sequence sent from the control circuit 201 via signal line 213, and outputs them to signal line 220.

The data interpolation circuit 203, also as in the third conventional example, interpolates between the parameter target values input on signal line 221, under the control signal sent on signal line 216, to generate a time-series pattern for each parameter, and sends that data to the speech synthesis circuit 204 via signal line 222.

The speech synthesis circuit 204, also as in the third conventional example, generates synthesized speech from the data sent from the data interpolation circuit 203 and outputs it to signal line 212.

The data distribution value memory 205 stores, for each parameter, the maximum value of the spread (distribution) of its target value. Following instructions sent from the control circuit 201 via signal line 214, it outputs that maximum to signal line 217.

Following instructions sent from the control circuit 201 via signal line 215, the random number generation circuit 206 generates a positive or negative random number whose absolute value is 1 or less and outputs it to signal line 218.

In the multiplier 207, the maximum of the data distribution sent from the data distribution value memory 205 is multiplied by the random number sent from the random number generation circuit 206, and the product is sent to the adder 208 via signal line 219.

In the adder 208, the parameter target value sent from the parameter target value generation circuit 202 and the product of the data distribution value and the random number sent from the multiplier 207 are added together, and the sum is sent to the data interpolation circuit 203 via signal line 221 as the new parameter target value.

When a phoneme sequence and duration data are input on signal line 211, the control circuit 201 sends the phoneme sequence to the parameter target value generation circuit 202 via signal line 213 to have it generate parameter target values, and sends a control signal to the data interpolation circuit 203 via signal line 216. It also sends an instruction to the data distribution value memory 205 via signal line 214 to have it output the maximum of the data distribution, and an instruction to the random number generation circuit 206 via signal line 215 to have it generate a random number.

In this way, the product of the maximum of the data distribution output by the data distribution value memory 205 and the random number generated by the random number generation circuit 206 is added to the parameter target value generated by the parameter target value generation circuit 202. A time-series pattern for each speech synthesis parameter is generated from the new parameter target values, and speech is synthesized from that data.
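
The arithmetic of this second embodiment is: new target = target + (maximum of the distribution) × r, where r is a random number with absolute value of 1 or less. A minimal software sketch follows; the dictionary representation, the parameter names, the numeric values, and the choice to draw a fresh r for every parameter are assumptions made for illustration.

```python
import random


def perturb_targets(targets, max_deviation, rng):
    """Return target + max_deviation * r for each parameter, with r in [-1, 1].

    targets:       parameter name -> rule-derived target value (role of circuit 202)
    max_deviation: parameter name -> stored distribution maximum (role of memory 205)
    """
    return {
        name: value + max_deviation[name] * rng.uniform(-1.0, 1.0)  # 206, 207, 208
        for name, value in targets.items()
    }


# Hypothetical targets for one phoneme and their allowed spreads.
targets = {"pitch_hz": 120.0, "f1_hz": 700.0}
spread = {"pitch_hz": 5.0, "f1_hz": 30.0}
print(perturb_targets(targets, spread, random.Random()))
```

Scaling a unit-range random number by a per-parameter maximum keeps every perturbed target within the range observed in human speech for that parameter, which is consistent with the stated aim of preserving intelligibility.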

(Effects of the Invention) As described above, according to the present invention, even when the same expression appears repeatedly in the speech information to be synthesized, the various speech synthesis parameters take subtly different values each time, so natural synthesized speech is obtained without loss of intelligibility.

[Brief explanation of the drawing]

FIGS. 1 and 2 are block diagrams showing the first and second embodiments of the present invention, respectively. In the figures, 101 is a control circuit, 102 is a data number generation circuit, 103 is a unit speech waveform memory, 104 is a waveform editing circuit, 105 is a random number generation circuit, 106 is an adder, 201 is a control circuit, 202 is a parameter target value generation circuit, 203 is a data interpolation circuit, 204 is a speech synthesis circuit, 205 is a data distribution value memory, 206 is a random number generation circuit, 207 is a multiplier, and 208 is an adder.

Claims

[Claims]

1. A rule-based speech synthesizer having means for determining, based on input speech information, the values of speech synthesis parameters for rule-based synthesis of speech, such as pitch, phoneme duration, amplitude, and spectrum, and means for generating synthesized speech from the values of those parameters, characterized by further having means for generating random data and means for changing the values of the speech synthesis parameters according to the value of the random data.