JP2785628B2

JP2785628B2 - Pitch pattern generator

Info

Publication number: JP2785628B2
Application number: JP4355726A
Authority: JP
Inventors: 田和彦岩
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1991-12-20
Filing date: 1992-12-19
Publication date: 1998-08-13
Anticipated expiration: 2013-08-13
Also published as: JPH05333892A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a pitch pattern generation device for use in a speech synthesizer or similar apparatus that outputs written text as speech.

[0002]

[Prior Art] In a speech synthesizer or similar apparatus that outputs an arbitrary sentence as speech, generating a natural pitch pattern is extremely important for improving the quality of the synthesized speech.

[0003] Conventionally, pitch patterns have most often been generated by superimposing a component called the accent component, which is based on the accent type of each word or phrase, on a component called the speech tone component, which declines gradually over the entire utterance. For example, a known method approximates the speech tone component with a monotonically decreasing straight line or with a pattern shaped like the character "へ", and approximates the accent component with a polygonal line. This prior art is described in detail in, for example, Acoustical Society of Japan, Speech Research Group material S78-07 (1978-4), "A Study of Tone Rules for Sentence Speech" (Reference 1).

[0004] FIG. 5 is a diagram for explaining a pitch pattern generation method according to the prior art. It shows how a pitch pattern is generated for the sentence "彼は白い花を買いました。" ("He bought a white flower."). This sentence consists of four accent phrases (units each having an accent nucleus): "彼は" (he), "白い" (white), "花を" (flower), and "買いました" (bought). The accent component 21 is therefore approximated by a polygonal line with four peaks. The shape of each peak is determined from the accent type, the number of morae, and other properties of the corresponding accent phrase. Superimposing this on the speech tone component 22, which is represented by a straight line descending to the right, yields the pitch pattern 23 of the sentence. L1 to L4 in FIG. 5 are called stress levels. The relative magnitudes of the stress levels of adjacent accent phrases reflect the structure of the sentence and are important for the naturalness of the pitch. That is, when the connection between the preceding and following accent phrases is weak, the stress level of the following accent phrase becomes larger relative to that of the preceding accent phrase. Conversely, when the two are strongly connected in meaning, the stress level of the following accent phrase becomes smaller.

[0005] In conventional pitch pattern generation methods such as that of Reference 1, the strength of the connection between adjacent accent phrases is measured by the number of accent phrases from the preceding accent phrase up to the accent phrase it depends on. This measure is called the degree of separation. The degree of separation is obtained from the dependency structure of the sentence. FIG. 6 is a diagram for explaining the degree of separation. A large degree of separation at an accent phrase boundary means that the accent phrase preceding the boundary is semantically connected to a more distant accent phrase and only weakly connected to the accent phrase immediately after it. In FIG. 6, "彼は" depends on "買いました", and its connection to the immediately following "白い" is weak. When the preceding accent phrase depends directly on the accent phrase that follows it, the degree of separation takes its minimum value of 1. In the example of FIG. 6, this is the case for "白い" and "花を", and for "花を" and "買いました". At an accent phrase boundary with a large degree of separation, the stress level of the following accent phrase is made larger than that of the preceding accent phrase. At a boundary with a small degree of separation, it is made smaller than that of the preceding accent phrase.
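The definition of the degree of separation can be illustrated with a minimal Python sketch. It assumes the dependency structure is given as a list of head indices (an illustrative representation, not one described in the patent), and the function name `separation_degrees` is hypothetical.

```python
def separation_degrees(heads):
    """Return the degree of separation at each accent phrase boundary.

    heads[i] is the index of the accent phrase that phrase i depends on;
    the final phrase has no head and is given as None (assumed encoding).
    """
    degrees = []
    for i, head in enumerate(heads[:-1]):
        if head is None or head <= i:
            raise ValueError("each non-final phrase must depend on a later phrase")
        # Number of accent phrases from phrase i up to the phrase it depends on.
        degrees.append(head - i)
    return degrees


phrases = ["彼は", "白い", "花を", "買いました"]
# Dependencies as described for FIG. 6: 彼は -> 買いました, 白い -> 花を, 花を -> 買いました
heads = [3, 2, 3, None]
print(separation_degrees(heads))
```

Under this encoding, the two boundaries whose phrases are directly linked get the minimum value 1, while the boundary after "彼は" gets a larger value because its head is the last phrase.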

[0006] As described above, the conventional pitch pattern generation method uses the dependency structure of the sentence to examine the strength of the connection between adjacent accent phrases, and determines the stress level of each accent phrase according to that strength. The accent component generated in this way is superimposed on the speech tone component to generate the pitch pattern of the entire sentence.

[0007]

[Problems to be Solved by the Invention] The conventional pitch pattern generation method uses the dependency structure of the accent phrases that make up a sentence, and presupposes that this structure has been obtained correctly. In general, however, it is difficult to always analyze the dependency structure of a sentence accurately. As a result, errors in dependency analysis cause the generated pitch pattern to become unnatural.

[0008] Accordingly, an object of the present invention is to provide a pitch pattern generation device capable of generating a natural pitch pattern without using the dependency structure of the sentence.

[0009]

[Means for Solving the Problems] To solve the above problem, the pitch pattern generation device according to the present invention is a pitch pattern generation device in a speech synthesizer that reads an arbitrary sentence aloud, and comprises: a stress level ratio storage unit that stores, for each combination of parts of speech, the ratio of the magnitudes of adjacent accent phrases; a morphological analysis unit that divides an input sentence into its constituent words and determines the part of speech of each word and the accent phrase boundaries; an accent component generation unit that reads the ratio of accent phrase magnitudes from the stress level ratio storage unit according to the combination of the parts of speech of the words before and after each accent phrase boundary; and a pitch pattern generation unit that generates a pitch pattern based on the ratio of accent phrase magnitudes thus read.

[0010]

[Operation] The present invention generates the pitch pattern from the parts of speech of words, which can be determined more accurately than the dependency structure of a sentence, and thereby makes it possible to generate a more natural pitch pattern than the conventional method.

[0011]

[Embodiments] Next, embodiments of the present invention will be described with reference to the drawings. As described above, the conventional method can produce an unnatural pitch pattern because accurate analysis of the dependency structure is difficult. In contrast, the present invention generates the pitch pattern without using the dependency structure. Instead, it generates the pitch pattern from the parts of speech of words, which can be determined more accurately than the dependency structure of a sentence. The combination of the parts of speech of the words before and after each accent phrase boundary is considered to reflect the strength of the semantic connection between the preceding and following accent phrases. FIG. 3 is a diagram for explaining the pitch pattern generation method according to the present invention. The sentence consists of four accent phrases: "彼は" (he), "白い" (white), "花を" (flower), and "買いました" (bought). For example, the combination of parts of speech at the boundary between "白い" and "花を" is "adjective (adnominal form) + noun". From this combination it is easy to infer that the preceding adjective directly modifies the noun immediately after it.

[0012] Therefore, for each combination of the parts of speech of the words before and after an accent phrase boundary, the ratio of the stress levels of the accent phrases on either side of the boundary (the ratio of the stress level of the following accent phrase to that of the preceding accent phrase, or its reciprocal) is determined in advance. FIG. 4 is a diagram showing an example of the stress level ratio for each combination of parts of speech. These values may be determined, for example, from sentences spoken by a human.

[0013] To generate a pitch pattern, the sentence to be read aloud is first divided into words by morphological analysis, and the part of speech of each word is determined. Next, at each accent phrase boundary, the stress level ratio of the accent phrases on either side of the boundary is obtained from the parts of speech of the words before and after the boundary. In FIG. 3, for example, the stress level of "花を" is 0.7 times that of the preceding "白い". This value follows from the fact that these two accent phrases form an "adjective (adnominal form) + noun" sequence. After the stress level ratio has been obtained at each accent phrase boundary in this way, the ratio of the stress level of every accent phrase to that of the sentence-initial accent phrase is obtained. For example, the ratio of "花を" to "彼は" is obtained as 1.2 × 0.7 = 0.84. Thus, once a value is given for the stress level of the sentence-initial accent phrase (for example, 80 Hz), the stress level values of all accent phrases in the sentence can be calculated. The accent component obtained in this way is superimposed on the speech tone component to generate the pitch pattern of the sentence.
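The chaining of boundary ratios described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the table keys are hypothetical part-of-speech labels, the pair name attached to the ratio 1.2 is assumed (the text gives the value for the first boundary without naming the combination), and the function name `stress_levels` does not come from the patent.

```python
from itertools import accumulate
from operator import mul

# Stress level ratio (following phrase / preceding phrase) per part-of-speech pair.
RATIO_TABLE = {
    ("particle", "adjective"): 1.2,  # assumed key for the "彼は" | "白い" boundary
    ("adjective", "noun"): 0.7,      # "白い" | "花を" boundary, value from the text
}


def stress_levels(boundary_pos_pairs, first_level_hz, table=RATIO_TABLE):
    """Return the stress level of every accent phrase, starting from the first.

    boundary_pos_pairs holds one (previous word POS, next word POS) pair
    per accent phrase boundary, in sentence order.
    """
    ratios = [table[pair] for pair in boundary_pos_pairs]
    # Cumulative product gives each phrase's ratio to the sentence-initial phrase.
    cumulative = accumulate(ratios, mul, initial=1.0)
    return [first_level_hz * r for r in cumulative]


pairs = [("particle", "adjective"), ("adjective", "noun")]
print([round(level, 2) for level in stress_levels(pairs, 80.0)])
```

With an initial stress level of 80 Hz, the third phrase ends up at 80 × 0.84 = 67.2 Hz, matching the 1.2 × 0.7 = 0.84 ratio quoted in the text.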

[0014] The description so far has been based on a method that superimposes the accent component on the speech tone component, but the present invention can also be applied easily to methods that do not use a speech tone component. For example, instead of generating the accent component and the speech tone component with separate models and superimposing them, a device is conceivable that gives the pitch frequency at at least one point in each accent phrase (for example, the point at which the pitch frequency peaks), determines the shape of each accent phrase from that value, and directly generates the pitch pattern of the entire sentence. In such a device, if for example the peak pitch frequency of each accent phrase is to be given, it suffices to prepare, for each combination of parts of speech, the ratio of the peak pitch frequencies of adjacent accent phrases in place of the stress level ratio.

[0015] In general, identifying parts of speech by morphological analysis can be done more accurately than estimating the dependency structure by dependency analysis. Because the present invention generates the pitch pattern from this part-of-speech information, it produces fewer errors than the conventional pitch pattern generation method and can generate a natural pitch pattern.

[0016] FIG. 1 is a block diagram showing one embodiment of the pitch pattern generation device according to the present invention. First, a character string representing the sentence to be read aloud is input from the character string input terminal 11. The input character string is sent to the morphological analysis unit 12. The morphological analysis unit 12 divides the sentence represented by the input character string into words and determines the part of speech of each word and the accent phrase boundaries. The result is sent to the accent component generation unit 13 and the speech tone component generation unit 15.

[0017] The stress level ratio storage unit 14 stores, for each combination of the parts of speech of the words before and after an accent phrase boundary, the ratio of the stress levels of the accent phrases on either side of the boundary. The accent component generation unit 13 reads the stress level ratio from the stress level ratio storage unit 14 based on the combination of the parts of speech of the words before and after each accent phrase boundary. Using the ratios thus read, it determines the stress level values of all accent phrases in the sentence by the method described in the Operation section, and generates the accent component. The speech tone component generation unit 15, based on the analysis result of the morphological analysis unit 12, divides the input sentence into a plurality of speech tone components where necessary, and then generates a speech tone component approximated by, for example, a straight line whose pitch frequency falls as time passes. The pitch pattern generation unit 16 adds together the accent component and the speech tone component generated by the accent component generation unit 13 and the speech tone component generation unit 15, respectively, to generate the pitch pattern of the entire sentence, and outputs it from the pitch pattern output terminal 17.

[0018] FIG. 2 is a more detailed block diagram of the device shown in FIG. 1. In the figure, elements bearing the same reference numerals as in FIG. 1 have the same functions. First, a character string representing the sentence to be read aloud is input from the character string input terminal 11. The input character string is sent to the morphological analysis unit 121. The morphological analysis unit 121 uses the word dictionary 122 to divide the sentence represented by the input character string into words, and determines the reading, part of speech, and accent type of each word as well as the accent phrase boundary positions. Such a morphological analysis unit is described in detail in pending U.S. patent application Ser. No. 07/487,044, so its description is omitted here.

[0019] The word readings, parts of speech, accent types, and accent phrase boundary positions generated by the morphological analysis unit 121 are sent to the accent component model reading unit 131, the stress level ratio reading unit 133, and the phoneme duration storage unit 152, respectively. The accent component model storage unit 132 stores the outline of the pitch pattern for each accent type of a word. The accent component model reading unit 131 reads the outline of the accent phrase pitch pattern stored in the accent component model storage unit 132 according to the accent type of each word sent from the morphological analysis unit 121, and sends it to the accent component model editing unit 134.

[0020] As in the example shown in FIG. 4, the stress level ratio storage unit 14 stores, for each combination of the parts of speech of the words before and after an accent phrase boundary, the value of the ratio of the stress levels of the accent phrases on either side of the boundary. The stress level ratio reading unit 133 reads from the stress level ratio storage unit 14 the stress level ratio corresponding to the combination of the parts of speech of the words before and after each accent phrase boundary. The accent component model editing unit 134 uses the stress level ratios read by the stress level ratio reading unit 133 to determine the stress level values of all accent phrases in the sentence by the method described in the Operation section, modifies the stress level of the pitch pattern of each accent phrase read by the accent component model reading unit 131, and generates the accent component of the entire sentence.

[0021] Next, the phoneme duration calculation unit 151 uses the word readings obtained by the morphological analysis unit 121, that is, the phoneme sequence, to calculate the duration of each phoneme to be uttered. This can be realized, for example, by storing the average duration of each phoneme in the phoneme duration storage unit 152 in advance and reading it out. The breath group length calculation unit 153 calculates the duration of each breath group (expiratory paragraph) making up the sentence. Here, a breath group is a unit of utterance delimited by pauses, and one breath group forms one speech tone component. If no pause appears in the sentence, the sentence is a single breath group. If one pause appears in the sentence, the sentence consists of two breath groups. How to decide where in the sentence to insert a pause is not directly related to the present invention, so its description is omitted. For each breath group making up the sentence, the breath group length calculation unit 153 calculates the duration of the breath group by summing the durations of all phonemes it contains.
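The duration calculation can be sketched in Python as a table lookup followed by a per-group sum. The phoneme symbols, the duration values, and the names `MEAN_PHONEME_DURATION_SEC` and `breath_group_durations` are all hypothetical and chosen only for illustration; the patent does not specify them.

```python
# Assumed stand-in for the phoneme duration storage: mean duration per phoneme.
MEAN_PHONEME_DURATION_SEC = {
    "k": 0.05,
    "a": 0.10,
    "r": 0.04,
    "e": 0.09,
    "w": 0.05,
}


def breath_group_durations(breath_groups, table=MEAN_PHONEME_DURATION_SEC):
    """Sum the stored phoneme durations within each pause-delimited group.

    breath_groups is a list of phoneme sequences, one per breath group.
    """
    return [sum(table[phoneme] for phoneme in group) for group in breath_groups]


# A sentence with no pause forms a single breath group.
single_group = [["k", "a", "r", "e", "w", "a"]]
print(breath_group_durations(single_group))
```

A sentence containing one pause would be passed as two phoneme sequences, giving two durations, one per speech tone component.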

[0022] The speech tone component calculation unit 154 reads the pitch frequencies at the start and end of the speech tone, which are needed to determine the outline of the speech tone component, from the start frequency storage unit 155 and the end frequency storage unit 156, respectively. It then calculates the slope of the speech tone component using the breath group duration calculated by the breath group length calculation unit 153. That is, slope of the speech tone component [Hz/sec] = (speech tone end frequency [Hz] - speech tone start frequency [Hz]) / breath group duration [sec]. This determines the outline of the speech tone component. Finally, the adder 160 adds the accent component calculated by the accent component model editing unit 134 and the speech tone component calculated by the speech tone component calculation unit 154 to compute the pitch pattern of the input sentence, and outputs it from the pitch pattern output terminal 17.
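The slope formula and the final addition can be sketched in Python as follows. The sampling of both components at shared time points, the numeric values, and the function names are assumptions made for illustration rather than details from the patent.

```python
def speech_tone_component(start_hz, end_hz, duration_sec, times_sec):
    """Sample a straight-line speech tone component at the given times.

    slope [Hz/sec] = (end frequency - start frequency) / breath group duration
    """
    if duration_sec <= 0:
        raise ValueError("breath group duration must be positive")
    slope = (end_hz - start_hz) / duration_sec
    return [start_hz + slope * t for t in times_sec]


def pitch_pattern(accent_component, tone_component):
    """Add the two components sample by sample, as the adder does."""
    if len(accent_component) != len(tone_component):
        raise ValueError("components must be sampled at the same time points")
    return [a + t for a, t in zip(accent_component, tone_component)]


times = [0.0, 1.0, 2.0]               # seconds (hypothetical sampling points)
tone = speech_tone_component(120.0, 80.0, 2.0, times)
accent = [0.0, 30.0, 0.0]             # hypothetical accent component in Hz
print(pitch_pattern(accent, tone))
```

Because the end frequency is lower than the start frequency, the slope is negative and the speech tone component falls over the breath group, with the accent peaks riding on top of it.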

[0023]

[Effects of the Invention] As described above, the present invention determines the pitch pattern without using the result of dependency analysis, which is difficult to perform correctly, and can therefore generate a more natural pitch pattern than the conventional method. The present invention is thus extremely effective as a pitch pattern generation device in a speech synthesizer or similar apparatus that reads aloud an arbitrary sentence given as a character string.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the pitch pattern generation device according to the present invention.

FIG. 2 is a detailed block diagram of the embodiment shown in FIG. 1.

FIG. 3 is a diagram for explaining the pitch pattern generation method according to the present invention.

FIG. 4 is a diagram showing an example of the stress level ratio for each combination of parts of speech.

FIG. 5 is a diagram for explaining a pitch pattern generation method according to the prior art.

FIG. 6 is a diagram for explaining the degree of separation.

[Explanation of symbols]

11 Character string input terminal
12 Morphological analysis unit
13 Accent component generation unit
14 Stress level ratio storage unit
15 Speech tone component generation unit
16 Pitch pattern generation unit
17 Pitch pattern output terminal
21 Accent component
22 Speech tone component
23 Pitch pattern

Claims

(57) [Claims]

1. A pitch pattern generation device in a speech synthesizer that reads an arbitrary sentence aloud, comprising: a stress level ratio storage unit that stores, for each combination of parts of speech, a ratio of the magnitudes of adjacent accent phrases; a morphological analysis unit that divides an input sentence into its constituent words and determines the part of speech of each word and the accent phrase boundaries; an accent component generation unit that reads the ratio of accent phrase magnitudes from the stress level ratio storage unit according to the combination of the parts of speech of the words before and after each accent phrase boundary; and a pitch pattern generation unit that generates a pitch pattern based on the read ratio of accent phrase magnitudes.

2. The pitch pattern generation device according to claim 1, wherein the pitch pattern generation unit generates the pitch pattern by superimposing an accent component read from the accent component generation unit on a speech tone component of the sentence.

3. The pitch pattern generation device according to claim 1, wherein the pitch pattern generation unit gives at least one pitch frequency for each word, determines a shape of each word based on the value, and generates a pitch pattern of the entire sentence.