JP2679623B2 - Text-to-speech synthesizer

Info

Publication number: JP2679623B2
Application number: JP6103652A
Authority: JP
Inventors: 和彦岩田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-05-18
Filing date: 1994-05-18
Publication date: 1997-11-19
Anticipated expiration: 2012-11-19
Also published as: JPH07311588A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a text-to-speech synthesizer that reads aloud text written in characters.

[0002]

[Prior Art] When a conventional text-to-speech synthesizer reads aloud a text consisting of multiple sentences, it reads every sentence with much the same rhythm and intonation.

[0003] FIG. 3 is a block diagram showing an example of a conventional text-to-speech synthesizer. Text consisting of multiple sentences is input to the sentence segmentation unit 102 through the text input terminal 101. The sentence segmentation unit 102 searches the text for characters that mark sentence boundaries and divides the text into sentences. Such characters include "。" and "．", but other characters may be defined in advance. Each resulting sentence is sent to the prosody generation unit 300. The prosody generation unit 300 generates the prosody pattern of each sentence independently. That is, it generates the prosody pattern from features of the sentence itself, such as its length and structure, and does not use information about the sentences before or after it. The generated prosody patterns are sent to the speech synthesis unit 106. The speech synthesis unit 106 generates synthesized speech for each sentence on the basis of that sentence's prosody pattern and outputs it from the synthesized speech output terminal 107.
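The sentence segmentation step can be illustrated with a minimal sketch. The function name `split_sentences` and its `delimiters` parameter are hypothetical names chosen for this illustration; the source only states that the text is split at boundary characters such as "。" and "．" and that other characters may be defined in advance.

```python
def split_sentences(text, delimiters=("。", "．")):
    """Split text into sentences, keeping each delimiter with its sentence.

    `delimiters` is an assumed parameter: the patent only says that
    boundary characters can be defined in advance.
    """
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in delimiters:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep any trailing text that lacks a final delimiter.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Passing a different tuple as `delimiters` corresponds to defining other boundary characters in advance.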

[0004] Text-to-speech synthesizers using this conventional technique are described in detail in, for example, "Software Japanese Text-to-Speech Synthesis for Personal Computers," Proceedings of the 1993 Autumn Meeting of the Acoustical Society of Japan, 2-8-13 (Reference 1).

[0005]

[Problems to Be Solved by the Invention] In the conventional text-to-speech synthesizer described above, the prosodic features used when reading a sentence aloud are always constant. Prosodic features here means values such as reading speed, voice pitch, and loudness, together with the patterns in which they change. Even if a device can utter each individual sentence naturally, reading every sentence with the same rhythm and intonation gives the user a monotonous impression. To keep the reading from becoming monotonous, the prosodic features must vary with the flow of the text. In conventional text-to-speech synthesizers, however, the user had to give the device some instruction in order to vary the prosodic features.

[0006] In contrast, an object of the present invention is to provide a text-to-speech synthesizer that automatically varies the prosodic features of each sentence along the flow of the input text, enabling a reading closer to that of a human.

[0007]

[Means for Solving the Problems] The text-to-speech synthesizer of the first invention has: a text input terminal for inputting text consisting of multiple sentences; sentence segmentation means connected to the text input terminal for dividing the text into sentences; prosody generation means connected to the sentence segmentation means for generating a prosody pattern of each sentence on the basis of predetermined criteria and using it as an operational prosody pattern; and speech synthesis means connected to the prosody generation means for synthesizing speech according to the operational prosody pattern. The prosody generation means includes: standard prosody generation means connected to the sentence segmentation means for generating a prosody pattern of each sentence on the basis of predetermined criteria and using it as a standard prosody pattern; prosody pattern storage means connected to the standard prosody generation means for storing the standard prosody patterns of all the sentences constituting the text; and prosody pattern changing means connected to the prosody pattern storage means for transforming the standard prosody pattern of each sentence according to predetermined criteria into the operational prosody pattern.

[0008] The synthesizer is characterized in that the prosody pattern changing means includes: approximate curve calculation means connected to the prosody pattern storage means for calculating an approximate curve that approximates the change in the standard prosody patterns across all the sentences constituting the text, and for calculating an approximation error, which is the difference between the approximate curve and the standard prosody pattern of each sentence; and approximation error changing means connected to the approximate curve calculation means for transforming the standard prosody pattern of each sentence into the operational prosody pattern on the basis of the approximation error.

[0009] The text-to-speech synthesizer of the second invention is characterized in that, in the first invention, the approximate curve calculated by the approximate curve calculation means is a regression line.

[0010] The text-to-speech synthesizer of the third invention is characterized in that, in the first or second invention, the approximation error changing means, taking the approximation error as E, obtains an operational approximation error E' = aE + b using predetermined constants a and b, and generates the operational prosody pattern by adding the operational approximation error to the approximate curve.

[0011] The text-to-speech synthesizer of the fourth invention is characterized in that, in the first, second, or third invention, the prosody pattern changed by the prosody pattern changing means is the speech rate of each sentence.

[0012] The text-to-speech synthesizer of the fifth invention is characterized in that, in the first, second, or third invention, the prosody pattern changed by the prosody pattern changing means is the average pitch frequency of each sentence.

[0013] The text-to-speech synthesizer of the sixth invention is characterized in that, in the first, second, or third invention, the prosody pattern changed by the prosody pattern changing means is the average power of each sentence.

[0014]

[Operation] Physical parameters related to the prosodic features of speech include phoneme duration, the pitch frequency pattern, and the amplitude pattern. When reading text aloud, a human varies these prosodic features in many ways along the flow of the text. For text consisting of multiple sentences, the present invention changes the prosodic features of each sentence to achieve a reading close to that of a human. The method is described in detail below.

[0015] First, the input text consisting of multiple sentences is divided into individual sentences, and a prosody pattern is generated for each sentence. A method such as the one described in Reference 1 can be used to generate the prosody pattern of a sentence. Next, the prosody patterns of all the sentences constituting the text are stored temporarily. At this point the prosody patterns of the sentences vary to some extent according to the inherent features of each sentence, such as its length. Because this variation is small, however, the text as it stands would give the impression of being read in the same tone throughout. The stored prosody pattern of each sentence is therefore transformed according to predetermined criteria to give the prosody pattern of the synthesized speech that is output.

[0016] The first, second, and third inventions provide methods of transforming the stored prosody pattern of each sentence. In the first invention, an approximate curve is first obtained as the means of transforming the variation among the sentences' prosody patterns. This approximate curve represents the broad flow of the prosody pattern through the whole text. Next, the approximation error of each sentence's prosody pattern with respect to this curve is obtained. This error can be regarded as the deviation from the overall flow caused by features specific to each sentence (for example, its length or structure). Processing that emphasizes this deviation, that is, the approximation error, gives variation to the prosody pattern of the whole text. In the second invention, a regression line in particular is used as the approximate curve. In the third invention, taking the approximation error as E, the transformation

E' = aE + b   (1)

is applied using predetermined constants a and b. The constant a enlarges the approximation error of each sentence of the input text when its absolute value is greater than 1 and reduces it when its absolute value is less than 1. The constant b translates the prosody pattern of the whole input text. For example, when the prosody pattern in question is the speech rate, giving b a positive value makes the reading of the whole text faster, and giving b a negative value makes it slower.
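The roles of the constants a and b in equation (1) can be seen in a small sketch. The function name `modify_pattern` and its arguments are hypothetical, and the sign convention E = (standard value) - (curve value) is an assumption made here so that adding E back to the curve recovers the standard pattern; the source only calls E the difference between the two.

```python
def modify_pattern(standard, curve, a, b):
    """Return curve + E' for each sentence, where E' = a * E + b.

    standard: standard prosody value of each sentence (e.g. speech rate)
    curve:    value of the approximate curve at each sentence
    Assumes E = standard - curve (sign convention chosen for this sketch).
    """
    operational = []
    for p, f in zip(standard, curve):
        error = p - f              # approximation error E
        new_error = a * error + b  # operational approximation error E'
        operational.append(f + new_error)
    return operational


standard = [7.0, 8.0, 7.5]
curve = [7.5, 7.5, 7.5]
print(modify_pattern(standard, curve, a=1, b=0))    # [7.0, 8.0, 7.5], unchanged
print(modify_pattern(standard, curve, a=2, b=0))    # [6.5, 8.5, 7.5], deviations doubled
print(modify_pattern(standard, curve, a=1, b=0.5))  # [7.5, 8.5, 8.0], whole pattern shifted up
```

With a = 1 and b = 0 the standard pattern is returned unchanged, which is a convenient sanity check for any implementation of equation (1).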

[0017] In the fourth, fifth, and sixth inventions, the prosody pattern that is transformed is, respectively, the speech rate, the average pitch frequency, and the average power of each sentence.

[0018] Next, the operating principle of the present invention is described in more detail, taking as an example the case in which the speech rate of each sentence is changed by transforming the approximation error E of a regression line using

E' = 3E   (2)

FIG. 4 is a diagram for explaining the operating principle of the present invention. As the input text, consider a text consisting of six sentences.

[0019] First, the duration of each phoneme is calculated for each sentence constituting the text. A conventionally known method, such as the one described in Reference 1, can be used for this. The durations of the phonemes in a sentence are summed to obtain the duration of the whole sentence. The number of morae in the sentence is divided by this duration to obtain the speech rate of the sentence (morae per second). Suppose that the pattern formed by the speech rates obtained in this way is like the standard speech rate pattern 401 in FIG. 4.
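The speech rate calculation described above amounts to one division per sentence. In this minimal sketch the function name `speech_rate` and the choice of seconds as the unit of the input durations are assumptions for illustration.

```python
def speech_rate(phoneme_durations, mora_count):
    """Speech rate of one sentence in morae per second.

    phoneme_durations: duration of each phoneme in seconds (assumed unit)
    mora_count:        number of morae in the sentence
    """
    total = sum(phoneme_durations)  # duration of the whole sentence
    if total <= 0:
        raise ValueError("sentence duration must be positive")
    return mora_count / total


# A 4-mora sentence whose phonemes last 2.0 s in total is spoken at 2.0 morae/s.
print(speech_rate([0.25, 0.25, 0.5, 0.5, 0.5], 4))  # 2.0
```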

[0020] Next, a regression line 402 that approximates the relationship between the standard speech rate pattern 401 and the sentence number is obtained. This regression line represents the rough trend of change in speech rate through the whole input text. The difference (approximation error) E between the regression line 402 and the standard speech rate of each sentence is transformed by equation (2) to obtain the transformed approximation error E'. The speech rate pattern obtained by adding the transformed approximation error E' to the regression line 402 is taken as the operational speech rate pattern 403. The operational speech rate pattern 403 obtained in this way preserves the overall trend in speech rate of the original standard speech rate pattern 401 while varying more widely than the standard speech rate pattern 401. The phoneme durations of each sentence are then recalculated so that the speech rate of each sentence becomes the value indicated by the operational speech rate pattern 403. This makes it possible to add variation to the reading of text, which was monotonous in the conventional method because similar speech rates were repeated.
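The procedure of this paragraph can be sketched end to end as follows. This is an illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the regression is an ordinary least-squares fit against sentence numbers 1, 2, ..., the error is taken as (standard rate) - (fitted rate), and the recalculation of phoneme durations is done by uniform scaling, which the source does not specify.

```python
def regression_line(values):
    """Least-squares line through (1, values[0]), (2, values[1]), ...

    Returns (slope, intercept).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two sentences to fit a line")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def operational_rates(standard_rates, a=3.0, b=0.0):
    """Regression line plus transformed error; a=3, b=0 gives E' = 3E."""
    slope, intercept = regression_line(standard_rates)
    result = []
    for number, rate in enumerate(standard_rates, start=1):
        fitted = slope * number + intercept  # regression line at this sentence
        error = rate - fitted                # approximation error E (assumed sign)
        result.append(fitted + a * error + b)
    return result


def rescale_durations(phoneme_durations, standard_rate, target_rate):
    """Scale all phoneme durations uniformly so the rate becomes target_rate.

    Uniform scaling is an assumption of this sketch.
    """
    if target_rate <= 0:
        raise ValueError("target speech rate must be positive")
    factor = standard_rate / target_rate
    return [d * factor for d in phoneme_durations]
```

For three sentences with standard rates 7.0, 8.0, and 7.0 morae per second, the fitted line is flat at about 7.33, so `operational_rates` returns values near 6.33, 9.33, and 6.33: the trend is kept while each deviation is tripled. Because a large value of a can push a rate to zero or below, `rescale_durations` rejects non-positive targets; a practical system would presumably clamp the rates to a sensible range.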

[0021] The speech rate has been used here as the example of a prosody pattern, but the average pitch frequency and average power of each sentence can be handled in the same way to obtain the same effect.

[0022] Approximation by a regression line has been shown as an example, but other approximation methods can also be used. Likewise, the method described here for transforming the approximation error uses only the approximation error of the sentence in question, but methods that also use the approximation errors of the preceding and following sentences are conceivable.

[0023] Using the method described above, the prosody pattern of each sentence of the input text can be changed automatically. This makes it possible to output natural read-aloud speech, which was not possible with conventional text-to-speech synthesizers.

[0024]

[Embodiments] FIG. 1 is a block diagram showing a reference example of a text-to-speech synthesizer according to the present invention. Text consisting of multiple sentences is input to the sentence segmentation unit 102 through the text input terminal 101. The sentence segmentation unit 102 searches the input text for characters that mark sentence boundaries. Using these characters, it divides the input text into its constituent sentences and sends them in order to the standard prosody generation unit 103. The standard prosody generation unit 103 generates a standard prosody pattern for each sentence sent from the sentence segmentation unit 102, using a conventionally known method or the like, and sends the patterns in order to the storage unit 104. The storage unit 104 stores the prosody pattern of each sentence sent from the standard prosody generation unit 103. When the prosody pattern of the last sentence of the input text has arrived from the standard prosody generation unit 103, the storage unit 104 sends all the stored prosody patterns to the changing unit 105. The changing unit 105 uses the information in all the prosody patterns of the input text to transform the prosody pattern of each sentence, generates the final operational prosody patterns used for synthesis, and sends them to the speech synthesis unit 106. The speech synthesis unit 106 generates synthesized speech according to the operational prosody patterns sent from the changing unit 105 and outputs it to the synthesized speech output terminal 107.
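The data flow of the reference example can be summarized in a short sketch. The function `synthesize_text` and its four callable parameters are hypothetical stand-ins for units 102, 103, 105, and 106; the point illustrated is only that every standard pattern is buffered before any is changed, so the changing step can use information from the whole text.

```python
def synthesize_text(text, split, generate_standard, modify, synthesize):
    """Run split -> standard prosody -> store -> change -> synthesize.

    All four callables are placeholders supplied by the caller.
    """
    sentences = split(text)                                  # unit 102
    # Unit 104: store the standard pattern of every sentence first.
    stored = [generate_standard(s) for s in sentences]       # unit 103
    # Unit 105: receives all patterns at once, not one sentence at a time.
    operational = modify(stored)
    return [synthesize(s, p) for s, p in zip(sentences, operational)]  # unit 106


# Toy stand-ins: the "pattern" is the sentence length, and the changing
# step re-expresses each length relative to the mean over the whole text.
result = synthesize_text(
    "ab|abcd",
    split=lambda t: t.split("|"),
    generate_standard=len,
    modify=lambda ps: [p - sum(ps) / len(ps) for p in ps],
    synthesize=lambda s, p: (s, p),
)
print(result)  # [('ab', -1.0), ('abcd', 1.0)]
```

One consequence of this design is that synthesis of the first sentence cannot start until the standard pattern of the last sentence has been generated.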

[0025] FIG. 2 is a block diagram showing an embodiment of the text-to-speech synthesizer according to the first invention. Here the changing unit 205 includes an approximate curve calculation unit 201 and an approximation error changing unit 202. The standard prosody patterns of all the sentences constituting the input text, stored in the storage unit 104, are sent to the approximate curve calculation unit 201. The approximate curve calculation unit 201 obtains an approximate curve that approximates the relationship between the sentence number of each sentence of the input text and its prosody pattern, together with the approximation error of that curve. The case in which a regression line in particular is used as the approximate curve is the second invention. The approximate curve and approximation errors obtained are sent to the approximation error changing unit 202. The approximation error changing unit 202 changes the approximation errors obtained by the approximate curve calculation unit 201 by a predetermined method. It then recomputes the prosody pattern of each sentence from the changed approximation error and the approximate curve sent from the approximate curve calculation unit 201, and takes the result as the operational prosody pattern. The case in which the transformation of equation (1), described in the Operation section, is applied to the approximation errors obtained by the approximate curve calculation unit 201 is the third invention. The operation of the other parts is the same as in the reference example described above, and its description is omitted.

[0026] The case in which the prosody pattern changed by the changing unit 105 or 205 in the embodiment of FIG. 1 or FIG. 2 is the speech rate of each sentence constituting the input text is the fourth invention.

[0027] The case in which the prosody pattern changed by the changing unit 105 or 205 in the embodiment of FIG. 1 or FIG. 2 is the average pitch frequency of each sentence constituting the input text is the fifth invention.

[0028] Furthermore, the case in which the prosody pattern changed by the changing unit 105 or 205 in the embodiment of FIG. 1 or FIG. 2 is the average power of each sentence constituting the input text is the sixth invention.

[0029]

[Effects of the Invention] As described above, according to the present invention, when text consisting of multiple sentences is input, the text can be read aloud while prosodic features such as the rhythm and intonation of each constituent sentence are varied automatically. The invention is therefore very effective for realizing text-to-speech synthesizers that must read aloud large amounts of text, such as reading machines.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a reference example of a text-to-speech synthesizer according to the present invention.

FIG. 2 is a block diagram showing an embodiment of the text-to-speech synthesizer according to the first invention.

FIG. 3 is a block diagram showing an example of a conventional text-to-speech synthesizer.

FIG. 4 is a diagram for explaining the operating principle of the present invention.

[Explanation of symbols]

100, 200, 300: Prosody generation unit
101: Text input terminal
102: Sentence segmentation unit
103: Standard prosody generation unit
104: Storage unit
105, 205: Changing unit
106: Speech synthesis unit
107: Synthesized speech output terminal
201: Approximate curve calculation unit
202: Approximation error changing unit
401: Standard speech rate pattern
402: Regression line
403: Operational speech rate pattern

Claims

(57) [Claims]

1. A text-to-speech synthesizer having: a text input terminal for inputting text consisting of multiple sentences; sentence segmentation means connected to the text input terminal for dividing the text into sentences; prosody generation means connected to the sentence segmentation means for generating a prosody pattern of each sentence on the basis of predetermined criteria and using it as an operational prosody pattern; and speech synthesis means connected to the prosody generation means for synthesizing speech according to the operational prosody pattern, the prosody generation means including: standard prosody generation means connected to the sentence segmentation means for generating a prosody pattern of each sentence on the basis of predetermined criteria and using it as a standard prosody pattern; prosody pattern storage means connected to the standard prosody generation means for storing the standard prosody patterns of all the sentences constituting the text; and prosody pattern changing means connected to the prosody pattern storage means for transforming the standard prosody pattern of each sentence according to predetermined criteria into the operational prosody pattern, wherein the prosody pattern changing means comprises: approximate curve calculation means connected to the prosody pattern storage means for calculating an approximate curve that approximates the change in the standard prosody patterns of all the sentences constituting the text, and for calculating an approximation error that is the difference between the approximate curve and the standard prosody pattern of each sentence; and approximation error changing means connected to the approximate curve calculation means for transforming the standard prosody pattern of each sentence into the operational prosody pattern on the basis of the approximation error.

2. The text-to-speech synthesizer according to claim 1, wherein the approximate curve calculated by the approximate curve calculation means is a regression line.

3. The text-to-speech synthesizer according to claim 1 or 2, wherein the approximation error changing means, taking the approximation error as E, obtains an operational approximation error E' = aE + b using predetermined constants a and b, and generates the operational prosody pattern by adding the operational approximation error to the approximate curve.

4. The text-to-speech synthesizer according to claim 1, 2, or 3, wherein the prosody pattern changed by the prosody pattern changing means is the speech rate of each sentence.

5. The text-to-speech synthesizer according to claim 1, 2, or 3, wherein the prosody pattern changed by the prosody pattern changing means is the average pitch frequency of each sentence.

6. The text-to-speech synthesizer according to claim 1, 2, or 3, wherein the prosody pattern changed by the prosody pattern changing means is the average power of each sentence.