JPH11282484A - Voice synthesizer

Info

Publication number: JPH11282484A
Application number: JP10081319A
Authority: JP
Inventors: Yuji Wada (和田祐司)
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1998-03-27
Filing date: 1998-03-27
Publication date: 1999-10-15

Abstract

PROBLEM TO BE SOLVED: To perform natural and smooth speech synthesis without greatly enlarging the storage capacity of a phoneme unit waveform storage unit.

SOLUTION: Text data input from a text data input unit 1 is sent to a phoneme symbol generation unit 2, which generates phoneme symbols. Based on the generated phoneme symbols, a phoneme unit waveform calling unit 3 calls the data corresponding to the first phoneme unit waveform data A0 from the phoneme unit waveform storage unit 4. The text data input from the text data input unit 1 is also analyzed for prosody rules by a prosody rule unit 6, and a prosody parameter generation unit 7 generates prosody parameters based on that analysis. A pitch pattern generation unit 8 then generates a pitch pattern based on the prosody parameters. The phoneme waveform data string generation means 12 of a waveform data connection unit 5A receives parameter values from a voice quality parameter storage unit 11 based on information from the pitch pattern generation unit 8 and, from these values and the data A0 from the calling unit 3, generates the phoneme unit waveform data A1 that should follow next.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech synthesizer that performs speech synthesis based on a rule-based synthesis method.

[0002]

[Prior art] FIG. 3 is a block diagram showing the configuration of a conventional speech synthesizer. In this figure, the text data input unit 1 outputs input text data to the phoneme symbol generation unit 2, and the phoneme symbol generation unit 2 generates phoneme symbols corresponding to the text data. The phoneme unit waveform calling unit 3 calls, from the phoneme unit waveform storage unit 4, the phoneme unit waveform data corresponding to the phoneme symbols generated by the phoneme symbol generation unit 2. The waveform data connection unit 5 then connects the phoneme unit waveform data sequentially called by the phoneme unit waveform calling unit 3 to generate a phoneme waveform data string.
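The conventional flow of FIG. 3 can be pictured as a table lookup followed by concatenation. The following is a minimal sketch under stated assumptions, not the implementation described in this publication: the `UNIT_STORE` table, its toy sample values, and the function name `connect_units` are all hypothetical.

```python
# Hypothetical stand-in for the phoneme unit waveform storage unit 4:
# each phoneme symbol maps to a short list of amplitude samples (toy values).
UNIT_STORE = {
    "k": [0.0, 0.3, -0.2],
    "a": [0.5, 0.8, 0.4],
    "i": [0.2, 0.6, 0.1],
}

def connect_units(phoneme_symbols, store=UNIT_STORE):
    """Look up each symbol (calling unit 3) and join the units (connection unit 5)."""
    sequence = []
    for symbol in phoneme_symbols:
        sequence.extend(store[symbol])  # raises KeyError if the unit was never stored
    return sequence

waveform = connect_units(["k", "a", "i"])
```

In this sketch every unit that may be needed must already be a key in the table, which is why the size of the table grows with the number of units to be covered.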

[0003] FIG. 4 is a waveform diagram showing an example of a phoneme waveform data string generated in this way. N pieces of phoneme unit waveform data A0, A1, ..., An, An+1, ..., AN-1, called sequentially from the phoneme unit waveform storage unit 4, are connected to form one phoneme waveform data string.

[0004] Meanwhile, the text data input unit 1 also outputs the input text data to the prosody rule unit 6, which analyzes the prosody rules in the text data. The prosody parameter generation unit 7 generates prosody parameters based on the prosody rules analyzed by the prosody rule unit 6. The pitch pattern generation unit 8 then generates a pitch pattern based on the prosody parameters generated by the prosody parameter generation unit 7.

[0005] The waveform synthesis unit 9 synthesizes a speech waveform from the phoneme waveform data string generated by the waveform data connection unit 5 and the pitch pattern generated by the pitch pattern generation unit 8. The voice output unit 10 then outputs speech based on the speech waveform synthesized by the waveform synthesis unit 9.

[0006]

[Problems to be solved by the invention] To synthesize speech as naturally and smoothly as possible, phoneme unit data corresponding to all phoneme data must be stored in the phoneme unit waveform storage unit 4, and the optimal phoneme unit data must be called from among them. However, storing phoneme unit data for all phoneme data requires a very large storage capacity in the phoneme unit waveform storage unit 4, which makes the device expensive.

[0007] On the other hand, if the storage capacity of the phoneme unit waveform storage unit 4 is limited to a certain level or below in order to avoid this high cost, a sufficient amount of phoneme unit data cannot be stored, and the synthesized speech ends up sounding unnatural.

[0008] The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesizer capable of natural and smooth speech synthesis without greatly increasing the storage capacity of the phoneme unit waveform storage unit.

[0009]

[Means for solving the problems] As means for solving the above problems, the invention according to claim 1 is a speech synthesizer comprising: a phoneme symbol generation unit that generates phoneme symbols from input text data; a phoneme unit waveform storage unit that stores phoneme unit waveform data corresponding to the phoneme symbols; a waveform data connection unit that connects the phoneme unit waveform data sequentially called from the phoneme unit waveform storage unit to generate a phoneme waveform data string; and a pitch pattern generation unit that generates a pitch pattern from the text data, the speech synthesizer synthesizing the phoneme waveform data string output from the waveform data connection unit based on the pitch pattern generated by the pitch pattern generation unit. The invention is characterized in that the waveform data connection unit has: a voice quality parameter storage unit that stores parameter values for preset items relating to voice quality; phoneme waveform data string generation means that receives a parameter value selected from those stored in the voice quality parameter storage unit in accordance with the pitch pattern generated by the pitch pattern generation unit, together with the previously called phoneme unit waveform data, and generates the phoneme unit waveform data that should follow next; and connection means that sequentially connects and outputs the phoneme unit waveform data generated by the phoneme waveform data string generation means.

[0010] The invention according to claim 2 is characterized in that, in the invention according to claim 1, the phoneme unit data string generation means is constituted by a hierarchical neural network that receives a parameter value selected from those stored in the voice quality parameter storage unit in accordance with the pitch pattern generated by the pitch pattern generation unit, together with the previously called phoneme unit waveform data, and outputs the phoneme unit waveform data that should follow next.

[0011]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings. Components that are the same as those shown in FIG. 3 are given the same reference numerals, and duplicate description is omitted. FIG. 1 is a block diagram showing the configuration of the embodiment. The main difference between FIG. 1 and FIG. 3 is that the waveform data connection unit 5A has a voice quality parameter storage unit 11, phoneme waveform data string generation means 12, and connection means 13. In addition, whereas the phoneme unit waveform storage unit 4 in FIG. 3 had a very large storage capacity because it stored phoneme unit data corresponding to all phoneme data, the phoneme unit waveform storage unit 4 in FIG. 1 no longer needs to store phoneme unit data for all phoneme data, so its storage capacity is small.

[0012] The voice quality parameter storage unit 11 stores preset numerical values for predetermined parameters that determine voice quality. The predetermined parameters are, for example, "accent data", "voice pitch", "speaking rate", and "gender (several male and female types are prepared)".

[0013] Based on the pitch pattern information generated by the pitch pattern generation unit 8, the phoneme waveform data string generation means 12 determines the parameter values to be input from the voice quality parameter storage unit 11, and also receives the values of the phoneme unit waveform data previously called by the phoneme unit waveform calling unit 3. The phoneme waveform data string generation means 12 then generates the phoneme unit waveform data that should follow next based on these inputs. The connection means 13 sequentially connects and outputs the phoneme unit waveform data generated by the phoneme waveform data string generation means 12.

[0014] FIG. 2 is a configuration diagram of the phoneme waveform data string generation means 12. As shown in this figure, the phoneme waveform data string generation means 12 is constituted by a neural network. This neural network has an input layer, an intermediate layer, and an output layer. The input layer consists of q neurons R1, R2, ..., Rq for inputting phoneme unit waveform data and m neurons S1, S2, ..., Sm for inputting voice quality parameters. The intermediate layer consists of q neurons M1, M2, ..., Mq, and the output layer consists of q neurons V1, V2, ..., Vq for outputting phoneme unit waveform data. The neural network has a learning function (backpropagation) based on the inputs and outputs described above.
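The layer sizes given for FIG. 2 (q + m inputs, q intermediate neurons, q outputs) can be illustrated with a small forward pass. This is only a sketch: the publication does not state the activation function, the presence of bias terms, or how weights are initialized, so the sigmoid hidden layer, linear output layer, absence of biases, and random weights below are assumptions, as are the names `make_network` and `forward`.

```python
import math
import random

def make_network(q, m, seed=0):
    """Random weights for a (q + m) -> q -> q layered network (sizes from FIG. 2)."""
    rng = random.Random(seed)

    def layer(n_out, n_in):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

    return {"hidden": layer(q, q + m), "output": layer(q, q)}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, waveform_in, voice_params):
    """Inputs R1..Rq and S1..Sm feed M1..Mq, which feed V1..Vq."""
    x = list(waveform_in) + list(voice_params)
    # Assumed sigmoid activation in the intermediate layer.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in net["hidden"]]
    # Assumed linear output so that amplitudes are not squashed into (0, 1).
    return [sum(w * h for w, h in zip(row, hidden)) for row in net["output"]]

net = make_network(q=4, m=2)
next_unit = forward(net, [0.1, 0.4, -0.3, 0.0], [1.0, 0.0])
```

With untrained random weights the output is meaningless as a waveform; the sketch only shows how q waveform values and m parameter values map to q output values.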

[0015] The phoneme unit waveform data input to the input-layer neurons R1, R2, ..., Rq is, for example, the amplitude value of the phoneme unit waveform data A0 in FIG. 4 at each predetermined period (the period obtained by dividing the total duration of A0 into q equal parts). For the input of voice quality parameters, one or more neurons are used for each item of input data. In this case, the input data may be converted to binary, the necessary number of neurons prepared, and "0" or "1" input to each of them.
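The two input encodings described here, q equally spaced amplitude values and a binary representation of a parameter value, might look as follows. This is a hedged sketch: taking the sample at the start of each of the q intervals, encoding most significant bit first, and the function names `sample_unit` and `encode_binary` are assumptions made for illustration.

```python
def sample_unit(samples, q):
    """Pick q amplitude values at equal spacing across the whole unit."""
    step = len(samples) / q
    # Assumption: take the sample at the start of each of the q intervals.
    return [samples[int(i * step)] for i in range(q)]

def encode_binary(value, n_bits):
    """Encode a non-negative integer parameter as n_bits inputs of 0 or 1 (MSB first)."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return [(value >> bit) & 1 for bit in range(n_bits - 1, -1, -1)]

r_inputs = sample_unit([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], q=4)
s_inputs = encode_binary(5, n_bits=3)
# r_inputs == [0.0, 0.2, 0.4, 0.6]; s_inputs == [1, 0, 1]
```

Here a parameter with up to eight preset values occupies three S neurons instead of one.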

[0016] Next, the operation of the embodiment configured as above is described. Text data input from the text data input unit 1 is sent to the phoneme symbol generation unit 2, which generates the phoneme symbols corresponding to the text data. Based on the generated phoneme symbols, the phoneme unit waveform calling unit 3 calls the data corresponding to the first phoneme unit waveform data A0 from the phoneme unit waveform storage unit 4.

[0017] Meanwhile, the text data input from the text data input unit 1 is also sent to the prosody rule unit 6, where the prosody rules in the text data are analyzed, and the prosody parameter generation unit 7 generates prosody parameters based on this analysis. The pitch pattern generation unit 8 then generates a pitch pattern based on the generated prosody parameters. The numerical values of the phoneme unit waveform data A0 that the phoneme unit waveform calling unit 3 called from the phoneme unit waveform storage unit 4 are input to the neurons R1, R2, ..., Rq of the phoneme waveform data string generation means 12. The pitch pattern generated by the pitch pattern generation unit 8 is also sent to the phoneme waveform data string generation means 12, which inputs the voice quality parameter values corresponding to this pitch pattern to the neurons S1, S2, ..., Sm.

[0018] When these inputs are given to the neurons R1, R2, ..., Rq and the neurons S1, S2, ..., Sm of the phoneme waveform data string generation means 12, a predetermined computation is performed, and the output-layer neurons V1, V2, ..., Vq output the numerical values for the next phoneme unit waveform data A1. Accordingly, in FIG. 1, the phoneme waveform data string generation means 12 retrieves the phoneme unit waveform data A0 corresponding to these numerical values from the phoneme unit waveform storage unit 4 and generates the phoneme unit waveform data A1.
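The text does not say how stored data "corresponding to" the output values is found. One possible reading, offered purely as an assumption, is a nearest-match search over the stored units. The sketch below uses squared Euclidean distance; the store contents and the name `nearest_stored_unit` are hypothetical.

```python
def nearest_stored_unit(predicted, store):
    """Return the key and samples of the stored unit closest to the predicted values."""
    def sq_dist(unit):
        return sum((a - b) ** 2 for a, b in zip(unit, predicted))

    # Assumed matching rule: smallest squared Euclidean distance.
    key = min(store, key=lambda k: sq_dist(store[k]))
    return key, store[key]

# Toy store with two-sample units (hypothetical values).
store = {"u1": [0.0, 0.0], "u2": [1.0, 1.0], "u3": [0.0, 1.0]}
key, unit = nearest_stored_unit([0.9, 0.8], store)
# key == "u2"
```

Other matching rules, or using the network output directly as the waveform, would fit the description equally well.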

[0019] Next, the numerical values for A1, instead of the phoneme unit waveform data A0, are input to the neurons R1, R2, ..., Rq of the phoneme waveform data string generation means 12, and the neurons V1, V2, ..., Vq output the numerical values for the next phoneme unit waveform data A2. The phoneme waveform data string generation means 12 then generates the phoneme waveform data A2 in the same way. Thereafter, the phoneme waveform data string generation means 12 sequentially generates the phoneme waveform data string A3, A4, ..., An, An+1, ..., AN-1 in the same manner.
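The feedback loop described here, in which each generated unit becomes the input for the next, can be sketched as below. The predictor is passed in as a function so that the loop stays independent of any particular network. `toy_predictor` is a hypothetical stand-in for a trained network, and the lookup in the storage unit mentioned in the operation description is omitted for brevity.

```python
def generate_sequence(first_unit, voice_params, predict_next, n_units):
    """Start from A0 and repeatedly feed the latest unit back in to obtain A1, A2, ..."""
    units = [list(first_unit)]
    while len(units) < n_units:
        units.append(predict_next(units[-1], voice_params))
    return units

# Toy predictor standing in for the trained network: halve the previous unit
# and add the first voice quality parameter as an offset.
def toy_predictor(prev_unit, voice_params):
    return [0.5 * v + voice_params[0] for v in prev_unit]

units = generate_sequence([1.0, -1.0], [0.25], toy_predictor, n_units=3)
# units == [[1.0, -1.0], [0.75, -0.25], [0.625, 0.125]]
```

Only A0 is taken from storage in this sketch; every later unit depends on the one before it and on the voice quality parameters.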

[0020] The connection means 13 connects the phoneme waveform data string A0, A1, ..., An, An+1, ..., AN-1 generated in this way by the phoneme waveform data string generation means 12 and outputs it to the waveform synthesis unit 9. Based on the pitch pattern from the pitch pattern generation unit 8, the waveform synthesis unit 9 synthesizes a speech waveform using the phoneme waveform data string A0, A1, ..., An, An+1, ..., AN-1 sent successively from the waveform data connection unit 5A. The voice output unit 10 then outputs speech based on the speech waveform synthesized by the waveform synthesis unit 9.

[0021] As described above, in the configuration of FIG. 1, the phoneme waveform data string generation means 12 generates the phoneme unit waveform data that should follow next based on the data input from the voice quality parameter storage unit 11 and the input of the previously called phoneme unit waveform data. Natural and smooth speech synthesis can therefore be performed. In addition, unlike the conventional device, the phoneme unit waveform storage unit 4 need not store phoneme unit waveform data for all phoneme data, so its storage capacity can be made small.

[0022] In the above embodiment, the numerical values of the previously called phoneme unit waveform data are input to the neurons R1, R2, ..., Rq in the input layer of the phoneme waveform data string generation means 12 shown in FIG. 2. However, the number of neurons may be further increased so that the numerical values of all waveform data called up to and including the previous call are input, and the phoneme unit waveform data that should follow next is generated from them. This makes it possible to call even more appropriate phoneme unit waveform data.
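A network has a fixed number of input neurons, so feeding in earlier units requires some bound on how many are kept. The sketch below assumes a fixed window of the most recent units with zero padding at the front. The window size `max_history`, the padding scheme, and the name `build_history_input` are assumptions for illustration and are not stated in the text.

```python
def build_history_input(units, q, max_history):
    """Concatenate the most recent units, zero-padding at the front when fewer exist."""
    recent = units[-max_history:]
    flat = [v for unit in recent for v in unit]
    padding = [0.0] * (q * max_history - len(flat))
    return padding + flat

x = build_history_input([[0.1, 0.2], [0.3, 0.4]], q=2, max_history=3)
# x == [0.0, 0.0, 0.1, 0.2, 0.3, 0.4], i.e. q * max_history input values
```

The waveform input side would then need q * max_history neurons instead of q.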

[0023] Furthermore, although the above embodiment shows an example in which the phoneme waveform data string generation means 12 is constituted by a neural network, it need not be limited to a neural network; it can also be constituted using a circuit having another function, such as a fuzzy arithmetic circuit.

[0024]

[Effects of the invention] As described above, according to the present invention, the phoneme unit waveform data that should follow next is generated based on voice quality parameters and the previously called phoneme unit waveform data, so natural and smooth speech synthesis can be performed without greatly increasing the storage capacity of the phoneme unit waveform storage unit.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing the neural network configuration of the phoneme waveform data string generation means 12 in FIG. 1.

FIG. 3 is a configuration diagram of a conventional example.

FIG. 4 is a waveform diagram showing an example of a phoneme waveform data string generated by the waveform data connection unit 5 or 5A in FIG. 1 or FIG. 3.

[Explanation of symbols]

1 Text data input unit
2 Phoneme symbol generation unit
3 Phoneme unit waveform calling unit
4 Phoneme unit waveform storage unit
5, 5A Waveform data connection unit
6 Prosody rule unit
7 Prosody parameter generation unit
8 Pitch pattern generation unit
9 Waveform synthesis unit
10 Voice output unit
11 Voice quality parameter storage unit
12 Phoneme waveform data string generation means
13 Connection means

Claims

1. A speech synthesizer comprising: a phoneme symbol generation unit that generates phoneme symbols from input text data; a phoneme unit waveform storage unit that stores phoneme unit waveform data corresponding to the phoneme symbols; a waveform data connection unit that connects the phoneme unit waveform data sequentially called from the phoneme unit waveform storage unit to generate a phoneme waveform data string; and a pitch pattern generation unit that generates a pitch pattern from the text data, the speech synthesizer synthesizing the phoneme waveform data string output from the waveform data connection unit based on the pitch pattern generated by the pitch pattern generation unit, wherein the waveform data connection unit has: a voice quality parameter storage unit that stores parameter values for preset items relating to voice quality; phoneme waveform data string generation means that receives a parameter value selected from those stored in the voice quality parameter storage unit in accordance with the pitch pattern generated by the pitch pattern generation unit, together with the previously called phoneme unit waveform data, and generates the phoneme unit waveform data that should follow next; and connection means that sequentially connects and outputs the phoneme unit waveform data generated by the phoneme waveform data string generation means.

2. The speech synthesizer according to claim 1, wherein the phoneme unit data string generation means is constituted by a hierarchical neural network that receives a parameter value selected from those stored in the voice quality parameter storage unit in accordance with the pitch pattern generated by the pitch pattern generation unit, together with the previously called phoneme unit waveform data, and outputs the phoneme unit waveform data that should follow next.