JP3357796B2 - Speech synthesis apparatus and method for generating prosodic information in the apparatus

Info

Publication number: JP3357796B2
Application number: JP23645296A
Authority: JP
Inventors: 孝章新居
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-09-06
Filing date: 1996-09-06
Publication date: 2002-12-16
Anticipated expiration: 2016-09-06
Also published as: JPH1083192A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech synthesis apparatus that, given an arbitrary sentence, derives from the structure of the sentence the boundary strength between prosodic phrases composed of the prosodic words in the sentence, and performs prosody control using this boundary strength to synthesize highly natural speech, and to a method for generating prosodic information in the apparatus.

[0002]

[Prior art] In general, synthesizing the speech of a sentence, which is a chain of words, using only rules based on the features inherent to the syllables in a word and on the accent type, which is a feature inherent to the word, yields unnatural speech. To produce more natural synthesized speech for an arbitrary sentence, the syntactic structure of the sentence must be analyzed. Because the result of sentence analysis provides structural information for determining prosodic features, it is said that natural control of prosody (intonation) requires not only an accurate syntactic structure but also semantic structure and contextual information.

[0003] Conventional text-to-speech conversion systems perform only limited syntactic analysis, and this analysis is carried out in various ways. One proposed method classifies parts of speech in fine detail and estimates prosodic features solely from the features of two adjacent morphemes in a chain. Another technique performs prosody control using the relations among three consecutive bunsetsu (phrases).

[0004] These limited approaches are used for two reasons. First, computing the structure of a whole sentence is costly, because the ambiguity of the sentence structure must be eliminated and a single overall structure determined. Second, it has been pointed out that syntactic structure and prosodic structure are only indirectly related, so a certain degree of prosody control is possible without obtaining the structure of the whole sentence.

[0005]

[Problems to be solved by the invention] However, performing prosody control with syntactic information from a limited sentence structure yields monotonous, unnatural utterances that are hard to listen to and do not convey the meaning correctly. For example, the pitch contour of a sentence-final "〜でしょうか" (~deshō ka) differs depending on whether it expresses a question or a doubt. Likewise, in a sentence with a contrastive structure spanning the whole sentence, such as "〜は〜で〜は〜だ", the pause that marks the contrasted parts (the △ in "〜は〜で（△）〜は〜だ") differs in both position and size from the pause in a sentence of the form "〜は〜で〜だ" (the △ in "〜は（△）〜で〜だ").

[0006] Moreover, even when the syntactic information of the whole sentence is used to generate natural prosody for a text, syntactic structure information does not provide for appropriate handling of cases in which a single prosodic phrase becomes long, as in the sentence "あらゆる現実をすべて自分のほうへねじ曲げたのだ。" (roughly, "He twisted every reality entirely toward himself.").

[0007] The present invention has been made in view of the above circumstances, and its object is to provide a speech synthesis apparatus capable of flexible prosody control that improves the naturalness of the intonation of a whole sentence, and a method for generating prosodic information in the apparatus.

[0008]

[Means for solving the problems] The speech synthesis apparatus of the present invention comprises: prosodic word forming means for performing morphological analysis on input text information, generating from the analysis result prosodic words, each being the minimum unit that determines one accent type, and forming a chain of the prosodic words; inter-prosodic-word dependency strength determining means for obtaining the dependency strength between the prosodic words in the chain; inter-prosodic-word dependency analysis means for obtaining the dependency (kakari-uke) relations between prosodic words based on those dependency strengths; prosodic word reception count determining means for obtaining the reception count of each prosodic word from the obtained dependency relations; adjacent-prosodic-word coupling strength determining means for determining the coupling strength between adjacent prosodic words using at least one of the obtained dependency strengths, dependency relations, and reception counts; prosodic phrase boundary strength determining means for determining the boundary strength of prosodic phrases from the obtained coupling strengths between adjacent prosodic words; and prosodic information generating means for generating prosodic information for the text information, taking the prosodic phrase boundary strength into account.

[0009] In a speech synthesis apparatus configured in this way, the coupling strength between adjacent prosodic words in the text is determined using at least one of the dependency strength between prosodic words, the dependency structure, and the reception count. The boundary strength between prosodic phrases is obtained from this coupling strength and is used to generate prosodic information for prosody control. Natural prosody control thus becomes possible through simple processing that uses the prosodic word as the unit of control.

[0010] Here, the dependency strength between prosodic words is preferably obtained using an interdependence table that represents the interdependence between the grammatical attributes of the prosodic words. As the dependency relation used in determining the coupling strength between adjacent prosodic words, it is preferable to use the boundary depth, which is the difference between the number of prosodic words through which the prosodic word in question depends up to the end of the sentence and the corresponding number for the following prosodic word.

[0011] In generating prosodic information, the prosodic information generating means preferably determines, for the chain of prosodic words, the prosody control command to be set at the boundary between each prosodic phrase and the following prosodic phrase from three quantities: a predetermined specified mora count allowed in one breath, the mora count of the prosodic phrase, and the prosodic phrase boundary strength. For example, a prosodic phrase whose mora count is at least the specified mora count may be split between prosodic words within that phrase based on the relevant coupling strengths between adjacent prosodic words. A prosodic phrase whose mora count is below the specified mora count may be merged with the following prosodic phrase, based on the coupling strength between adjacent prosodic words at the following phrase, to form one breath group.

[0012]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a speech synthesis apparatus according to one embodiment of the present invention.

[0013] The speech synthesis apparatus shown in FIG. 1 comprises an input unit 11, a language processing unit 12, a phoneme processing unit 13, an acoustic parameter generation unit 14, a speech waveform generation unit 15, and an output unit 16.

[0014] Before the components above are described, the prosodic word is explained. Among segment-element chains (文節素連鎖), which are the smallest meaningful units of speech sound, those within which a pitch change is observed are called autonomous prosodic elements, and those within which no pitch change is observed are called heteronomous prosodic elements. A "prosodic word" is formed from a chain of such elements and is either (i) at least one autonomous prosodic element, or a heteronomous prosodic element belonging to a word base (the basic part of a word, which cannot be divided further), by itself, or (ii) at least one autonomous prosodic element, or a heteronomous prosodic element belonging to a word base, combined with one or more following heteronomous prosodic elements belonging to particles or affixes.

[0015] In general, a "morpheme" is the smallest meaningful unit of linguistic form and is a constituent of a word (Kenji Morioka, "語彙の形成" (Formation of Vocabulary), Meiji Shoin, 1987). Morphemes are classified into word bases, affixes, and particles. A word base is a form that expresses a concept or idea, and occurs in free and bound forms. Decomposing a morpheme further yields a chain of syllables, which corresponds to a segment-element chain. Morphemes and prosodic elements therefore correspond to each other through the segment-element chain. Autonomous and heteronomous prosodic elements are used as the basic units of tonal processing in speech synthesis, and what is composed of a chain of them is used as the prosodic word.

[0016] Next, the components above are described. The language processing unit 12 generates prosodic words from an arbitrary sentence (text information) input through the input unit 11, for example a sentence in mixed kanji and kana, and determines the coupling strength between adjacent prosodic words. The language processing unit 12 comprises a prosodic word forming unit 121, an inter-prosodic-word dependency strength determining unit 122, an inter-prosodic-word dependency analysis unit 123, a prosodic word reception count determining unit 124, and an adjacent-prosodic-word coupling strength determining unit 125.

[0017] The prosodic word forming unit 121 receives the mixed kanji-kana sentence sent from the input unit 11 and performs morphological analysis on it with reference to morphological analysis information (registered in a morphological analysis dictionary, not shown) to obtain autonomous and heteronomous prosodic elements. Based on the resulting chain of prosodic elements, it judges whether each element is autonomous or heteronomous. It then constructs a prosodic word either from at least one autonomous prosodic element, or a heteronomous prosodic element belonging to a word base, by itself, or by combining at least one autonomous prosodic element, or a heteronomous prosodic element belonging to a word base, with one or more following heteronomous prosodic elements belonging to particles or affixes. In this way the prosodic word forming unit 121 forms a chain of prosodic words.

[0018] For each prosodic element, the morphological analysis information contains a grammatical attribute name, a reading, a morphological class (word base, particle, or affix), and a prosodic element class (autonomous or heteronomous). For an autonomous prosodic element it also contains the basic accent type. By referring to this morphological analysis information, the prosodic word forming unit 121 sets attributes on each prosodic word (reading, accent type, grammatical attribute, and mora count of the prosodic word) and outputs it.
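
The attribute sets listed in this paragraph can be pictured as two small record types. The following is a minimal sketch only: the class names, field names, and the use of an integer for the accent type are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProsodicElement:
    """One dictionary entry of morphological analysis information (hypothetical layout)."""
    grammatical_attribute: str                # grammatical attribute name
    reading: str                              # reading (pronunciation)
    morph_class: str                          # "base", "particle", or "affix"
    autonomous: bool                          # True: autonomous, False: heteronomous
    basic_accent_type: Optional[int] = None   # present only for autonomous elements


@dataclass(frozen=True)
class ProsodicWord:
    """Attributes that the prosodic word forming unit sets on each prosodic word."""
    reading: str
    accent_type: int
    grammatical_attribute: str
    mora_count: int


# Illustrative values only; the accent type and attribute name are not taken from the patent.
word = ProsodicWord(reading="ゲンジツヲ", accent_type=0,
                    grammatical_attribute="noun+wo", mora_count=5)
```

Keeping the mora count on the prosodic word itself is convenient because the later breath-group processing needs the per-phrase mora totals.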

[0019] The inter-prosodic-word dependency strength determining unit 122 takes as input the prosodic word chain formed by the prosodic word forming unit 121. Based on the grammatical attribute information set on the prosodic words, it refers to an interdependence table between grammatical attributes (not shown) and sets a dependency relation between two prosodic words. FIG. 2 shows an example of the contents of this interdependence table. For each combination of the grammatical attributes of two prosodic words, the degree of dependency between them (the dependency strength) is registered in advance as one of five levels: 20, 40, 60, 80, or 100.

[0020] From the first prosodic word to the last in the chain formed by the prosodic word forming unit 121, the inter-prosodic-word dependency strength determining unit 122 refers to the interdependence table between grammatical attributes to set a dependency relation between every pair of prosodic words and to assign a dependency strength to each pair.
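
The table lookup over every pair of prosodic words can be sketched as follows. This is an illustration under stated assumptions: the attribute names and table entries are hypothetical (the contents of FIG. 2 are not reproduced in the text), and treating a missing table entry as "no dependency" is also an assumption.

```python
from itertools import combinations

# Hypothetical interdependence table: (attribute of earlier word, attribute of
# later word) -> dependency strength. Real entries would come from FIG. 2.
INTERDEPENDENCE = {
    ("adnominal", "noun+wo"): 60,
    ("noun+wo", "verb"): 80,
    ("adverb", "verb"): 60,
}

# The five levels the text says are registered in the table.
ALLOWED_LEVELS = {20, 40, 60, 80, 100}


def dependency_strengths(attrs):
    """Return {(i, j): strength} for every pair i < j that has a table entry.

    attrs is the list of grammatical attributes of the prosodic word chain,
    in sentence order.
    """
    result = {}
    for i, j in combinations(range(len(attrs)), 2):
        strength = INTERDEPENDENCE.get((attrs[i], attrs[j]))
        if strength is not None:
            result[(i, j)] = strength
    return result
```

For a chain of n prosodic words this examines n(n-1)/2 pairs, which matches the text's statement that every pair from the first word to the last is considered.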

[0021] The inter-prosodic-word dependency analysis unit 123 analyzes the dependency relations within the sentence from the dependency relations between all prosodic words, from the first to the last, in the chain formed by the prosodic word forming unit 121.

[0022] In this analysis, the dependency relations between prosodic words are determined, so as to represent the sentence structure, using the ambiguity-reducing principles illustrated in FIG. 3. The principles are as follows.

[0023]
(Principle 1) Dependency links do not cross (see FIG. 3(a)).
(Principle 2) Modification of a preceding prosodic word is prohibited (see FIG. 3(b)).
(Principle 3) A single prosodic word does not depend on two or more prosodic words at the same time (see FIG. 3(c)).
(Principle 4) A stronger dependency strength between prosodic words takes priority (see FIG. 3(d)).
(Principle 5) Among the candidates depending on a given prosodic word, the nearest takes priority (see FIG. 3(e)).

FIG. 4 shows an example of the result of dependency analysis by the inter-prosodic-word dependency analysis unit 123 when the input mixed kanji-kana sentence (input text information) is "あらゆる現実をすべて自分のほうへねじ曲げたのだ。". The words "あらゆる", "現実を", "すべて", "自分の", "ほうへ", and "ねじ曲げたのだ", numbered consecutively 1 to 6, are prosodic words. FIG. 4 shows, for example, that "あらゆる" depends on "現実を", numbered 2, with a dependency strength of 60.
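
Principles 1 to 3 are structural constraints and can be checked mechanically once a candidate analysis is written down. The sketch below assumes a representation that is not in the patent: `heads[i]` holds the index of the prosodic word that word `i` depends on, or `None` for the sentence-final word. With one head per word, Principle 3 holds by construction. Principles 4 and 5 are preferences among candidates and are not modeled here.

```python
def satisfies_principles(heads):
    """Check Principles 1 and 2 for a candidate dependency analysis.

    heads[i] is the index of the word that word i depends on, or None.
    """
    # Principle 2: a word may only depend on a later word.
    for i, head in enumerate(heads):
        if head is not None and head <= i:
            return False

    # Principle 1: no two links (a -> b) and (c -> d) may cross,
    # i.e. there must be no a < c < b < d.
    links = [(i, head) for i, head in enumerate(heads) if head is not None]
    for a, b in links:
        for c, d in links:
            if a < c < b < d:
                return False
    return True
```

The pairwise crossing test is quadratic in the number of links, which is adequate for single sentences.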

[0024] The prosodic word reception count determining unit 124 determines the reception count of each prosodic word from the dependency relations in the sentence analyzed by the inter-prosodic-word dependency analysis unit 123. The reception count of a prosodic word is the number of dependency links that arrive at it. FIG. 4 also shows examples of reception counts.
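
Counting incoming links is straightforward once the dependency relations are stored as one head index per word. In this sketch the `heads` representation and the example links are assumptions; the text states only the first link of FIG. 4 (word 1 depends on word 2), so the remaining links are hypothetical.

```python
from collections import Counter


def reception_counts(heads):
    """Return, for each prosodic word, the number of dependency links arriving at it.

    heads[i] is the index of the word that word i depends on, or None for the
    sentence-final word.
    """
    incoming = Counter(head for head in heads if head is not None)
    return [incoming[i] for i in range(len(heads))]


# Hypothetical analysis of a six-word chain (0-based indices).
example_heads = [1, 5, 5, 4, 5, None]
example_counts = reception_counts(example_heads)
```

With these hypothetical links, the last word receives three links and the second and fifth words receive one each.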

[0025] The adjacent-prosodic-word coupling strength determining unit 125 determines the coupling strength between adjacent prosodic words based on three pieces of information: the dependency strength between the two prosodic words obtained by the inter-prosodic-word dependency strength determining unit 122, the boundary depth in the dependency structure derived from the dependency relations obtained by the inter-prosodic-word dependency analysis unit 123, and the reception count obtained by the prosodic word reception count determining unit 124. Here, the boundary depth is the difference between the number of prosodic words through which a prosodic word in the sentence under analysis (the input text information) depends up to the end of the sentence and the corresponding number for the following prosodic word. FIG. 4 also shows examples of boundary depths.

[0026] The coupling strength between adjacent prosodic words expresses the strength of the dependency at the boundary between prosodic words, based mainly on linguistic information and taking syntactic information into account. It is classified into the following five strength classes, (a1) to (a5).

[0027] (a1) Special boundary: relations with extra-linguistic items (items not uttered as speech), such as symbols, parentheses, and quotations, fall into this class. Here it corresponds to an inter-prosodic-word dependency strength of 20 in the interdependence table. A relation in which the following prosodic word has a reception count of 0 also falls into this class.

[0028] (a2) Independent boundary: relations between a noun marked by a particle such as "は" (wa), "も" (mo), or "について" (ni tsuite) and a predicate, relations between a conjunctive particle such as "から" (kara), "たり" (tari), or "し" (shi) and a predicate, and the dependency between topic and comment fall into this class. Here it corresponds to an inter-prosodic-word dependency strength of 40 in the interdependence table. A comma, and a relation in which the boundary depth is 2 or more and the following prosodic word has a reception count of 1 or more, also fall into this class.

[0029] (a3) Concatenation boundary: relations between a circumstantial complement and a predicate, between a conjunctive particle such as "ので" (node) or "だから" (dakara) and a predicate, between the continuative form of a predicate and a predicate, and between an adverb expressing intention or the like and a predicate fall into this class. Here it corresponds to an inter-prosodic-word dependency strength of 60 in the interdependence table. A relation with a boundary depth of 0 (one in which the adjacent prosodic words are in a dependency relation) also falls into this class.

[0030] (a4) Linking boundary: relations between a subject, object, or complement and a predicate, between the conjunctive particles "ながら" (nagara) or "つつ" (tsutsu) and a predicate, between an adverb of time, place, or the like and a predicate, and between an adnominal word and a noun fall into this class. Here it corresponds to an inter-prosodic-word dependency strength of 80 in the interdependence table. A relation in which the boundary depth is 1 and the following prosodic word has a reception count of 1 or more also falls into this class.

[0031] (a5) Joining boundary: relations between a prefix or suffix that forms part of a word and a noun or predicate, and chains of a predicate with auxiliary verbs or auxiliary predicates, fall into this class. Here it corresponds to an inter-prosodic-word dependency strength of 100 in the interdependence table (the words connect to form a single bunsetsu).

[0032] Among the coupling strengths between adjacent prosodic words classified as (a1) to (a5), the special boundary (a1) is the weakest, the strength increases in the order (a2), (a3), (a4), and the joining boundary (a5) is the strongest. As described above, the adjacent-prosodic-word coupling strength determining unit 125 determines this coupling strength (class) from the dependency strength between the two prosodic words obtained by the inter-prosodic-word dependency strength determining unit 122, the boundary depth in the dependency structure derived from the dependency relations obtained by the inter-prosodic-word dependency analysis unit 123, and the reception count obtained by the prosodic word reception count determining unit 124. The coupling strengths for the prosodic word chain determined by the adjacent-prosodic-word coupling strength determining unit 125 (in the language processing unit 12) are passed to the phoneme processing unit 13.
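
The ordering of the five classes and their correspondence to the table values 20 to 100 can be sketched with an ordered enumeration. This sketch deliberately covers only the table-value part of the classification. The structural conditions (boundary depth, reception count, commas) are left out because the passage does not state how they take precedence over the table value when the two disagree. The English class names are translations chosen here, not official terms.

```python
from enum import IntEnum


class Coupling(IntEnum):
    """Coupling strength classes, ordered from weakest to strongest."""
    SPECIAL = 1        # (a1) weakest
    INDEPENDENT = 2    # (a2)
    CONCATENATION = 3  # (a3)
    LINKING = 4        # (a4)
    JOINING = 5        # (a5) strongest


# Table-value part of the classification only (see the caveat above).
STRENGTH_TO_CLASS = {
    20: Coupling.SPECIAL,
    40: Coupling.INDEPENDENT,
    60: Coupling.CONCATENATION,
    80: Coupling.LINKING,
    100: Coupling.JOINING,
}


def coupling_from_strength(strength):
    """Map an interdependence-table strength to its coupling class."""
    return STRENGTH_TO_CLASS[strength]
```

Using `IntEnum` makes comparisons such as `Coupling.SPECIAL < Coupling.JOINING` follow the weakest-to-strongest order given in the text.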

[0033] Based on the coupling strengths between adjacent prosodic words in the prosodic word chain, the phoneme processing unit 13 determines the boundary strengths of prosodic phrases and obtains prosodic information (prosody control information) for generating a pitch pattern, such as phrase onset positions, phrase command magnitudes, pause positions, and pause lengths. The phoneme processing unit 13 also generates a prosodic symbol string and a phoneme symbol string from the obtained prosodic information and other data. The prosodic symbol string and phoneme symbol string generated by the phoneme processing unit 13 are passed to the acoustic parameter generation unit 14.

[0034] Based on the prosodic symbol string and phoneme symbol string passed from the phoneme processing unit 13, the acoustic parameter generation unit 14 generates an acoustic parameter sequence for driving a speech synthesizer (not shown) in the speech waveform generation unit 15. This sequence is sent to the speech waveform generation unit 15 and output through the output unit 16.

[0035] The phoneme processing unit 13 is now described in detail. It comprises a prosodic phrase boundary strength determining unit 131, a prosodic information generation unit 132, and a phoneme symbol / prosodic symbol generation unit 133.

[0036] The prosodic phrase boundary strength determining unit 131 forms prosodic phrases from the prosodic word chain and determines the strength at each prosodic phrase boundary. A prosodic phrase consists of a chain of one or more prosodic words. The prosodic phrase boundary strength determining unit 131 decides which prosodic words to join by referring to the coupling strengths between adjacent prosodic words passed from the language processing unit 12 to the phoneme processing unit 13. Here, two prosodic words whose coupling strength is at least as strong as the predetermined value of 100 (that is, a coupling strength whose value is 100) are merged to produce a prosodic phrase. The predetermined value may instead be a value specified as appropriate.
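
The merging rule in this paragraph amounts to a single left-to-right pass over the chain. The sketch below assumes numeric coupling strengths on the same 20 to 100 scale as the interdependence table and reads "stronger than the predetermined value 100" as "at or above the threshold", since 100 is the maximum level. The function and parameter names are hypothetical.

```python
def form_prosodic_phrases(words, couplings, threshold=100):
    """Group a prosodic word chain into prosodic phrases.

    couplings[i] is the coupling strength between words[i] and words[i + 1].
    Adjacent words whose coupling reaches the threshold are merged into the
    same phrase; any weaker coupling starts a new phrase.
    """
    if not words:
        return []
    if len(couplings) != len(words) - 1:
        raise ValueError("need exactly one coupling strength per adjacent pair")

    phrases = [[words[0]]]
    for word, strength in zip(words[1:], couplings):
        if strength >= threshold:
            phrases[-1].append(word)   # strong coupling: same prosodic phrase
        else:
            phrases.append([word])     # weaker coupling: phrase boundary
    return phrases
```

Exposing `threshold` as a parameter reflects the remark that the predetermined value may be specified as appropriate.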

[0037] At each boundary of the generated prosodic phrases, the prosodic phrase boundary strength determining unit 131 refers to the corresponding coupling strength between adjacent prosodic words passed from the language processing unit 12 and determines the prosodic phrase boundary strength described below. This corresponds to the information in the prosodic information that is used to generate the pitch pattern.

[0038] The prosodic phrase boundary strength indicates the strength of a prosodic boundary and is classified into the following four strength classes, (b1) to (b4).
(b1) Independent boundary: corresponds to a boundary at which, mainly, a large pause is inserted or a large phrase onset is made.

[0039] (b2) Concatenation boundary: corresponds to a boundary at which, mainly, a small pause is inserted or a phrase onset is made.

[0040] (b3) Linking boundary: corresponds to a boundary at which, mainly, a phrase is added.
(b4) None: corresponds to a boundary at which neither a pause nor a phrase is added (it does not become a phrase boundary).

[0041] The prosodic phrase boundary strengths (b1) to (b4) correspond to the coupling strengths between adjacent prosodic words (a1) to (a5) as follows.
(b1) Independent boundary ←→ (a1) Special boundary, (a2) Independent boundary
(b2) Concatenation boundary ←→ (a3) Concatenation boundary
(b3) Linking boundary ←→ (a4) Linking boundary
(b4) None ←→ (a5) Joining boundary

When determining the strength at the prosodic phrase boundaries, the prosodic phrase boundary strength determining unit 131 performs accent processing on each prosodic word constituting a prosodic phrase and determines, for the prosodic word chain constituting the phrase, the reading of the whole prosodic phrase and its mora count. The results determined by the prosodic phrase boundary strength determining unit 131 are passed to the prosodic information generation unit 132.
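
The correspondence between the (a) and (b) classes is a fixed many-to-one mapping and can be written as a lookup table. In this sketch the string keys and the function name are hypothetical; the mapping itself and the short descriptions follow the text.

```python
# Coupling class between adjacent prosodic words -> prosodic phrase boundary class.
A_TO_B = {
    "a1": "b1",  # special boundary       -> independent boundary
    "a2": "b1",  # independent boundary   -> independent boundary
    "a3": "b2",  # concatenation boundary -> concatenation boundary
    "a4": "b3",  # linking boundary       -> linking boundary
    "a5": "b4",  # joining boundary       -> none (not a phrase boundary)
}

# What each boundary class mainly corresponds to, paraphrased from the text.
B_EFFECT = {
    "b1": "large pause or large phrase onset",
    "b2": "small pause or phrase onset",
    "b3": "phrase added",
    "b4": "no pause and no phrase",
}


def phrase_boundary_class(coupling_class):
    """Return the prosodic phrase boundary class for a coupling class."""
    return A_TO_B[coupling_class]
```

Both of the two weakest coupling classes collapse onto (b1), so the boundary class alone cannot distinguish a special boundary from an independent one.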

[0042] For the input chain of prosodic phrases, the prosodic information generation unit 132 refers to a predetermined number of morae allowed in one breath (the specified mora count) and to the mora count already set on each prosodic phrase (the mora count of the prosodic phrase). From the prosodic phrase boundary strength it then judges what prosody control command to give (what prosody control operation to apply) at the boundary between that prosodic phrase and the following one.

[0043] That is, for a prosodic phrase whose mora count is at least the specified mora count, the prosodic information generation unit 132 refers to the coupling strengths between adjacent prosodic words within the chain constituting the phrase and splits the phrase between prosodic words. Conversely, a prosodic phrase with a small mora count is merged, by referring to the coupling strength between adjacent prosodic words at the following prosodic phrase, into one breath group. An appropriate prosody control command is then given at each phrase boundary.
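
The splitting half of this step can be sketched as below. Several points are assumptions rather than the patent's method: the passage says only that the coupling strengths inside the phrase are consulted, so cutting at the weakest internal coupling (the first one on ties) and cutting only once are illustrative choices, and the algorithm of FIGS. 5 and 6 is not reproduced in the text. Merging of short phrases is not shown.

```python
def split_long_phrase(word_moras, couplings, limit):
    """Split one prosodic phrase in two if its mora count reaches the limit.

    word_moras[i] is the mora count of the i-th prosodic word in the phrase,
    couplings[i] is the coupling strength between word i and word i + 1, and
    limit is the specified mora count allowed in one breath.
    Returns a list of groups, each a list of word indices.
    """
    indices = list(range(len(word_moras)))
    if len(word_moras) < 2 or sum(word_moras) < limit:
        return [indices]  # short enough, or nothing to split

    # Assumed policy: cut at the weakest coupling inside the phrase.
    cut = min(range(len(couplings)), key=lambda i: couplings[i])
    return [indices[:cut + 1], indices[cut + 1:]]
```

A fuller version would re-check each half against the limit, in line with the statement that the unit recalculates as it proceeds through the phrase boundaries.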

[0044] In this way the prosodic information generation unit 132 continues, recalculating as it goes, to obtain the prosody control commands for successive phrase boundaries until the last prosodic phrase has been processed. FIGS. 5 and 6 show the algorithm for generating the prosodic information. The predetermined values used here may instead be values specified as needed.

【００４５】上記韻律情報は、各句境界における韻律制
御指令の概算的レベルを示す。この韻律制御指令の概算
的レベルには、フレーズ立ち上げ指令の大きさの概算的
レベルと、ポーズ（休止区間長）挿入指令の大きさの概
算的レベルと、モーラ長指令値（規定モーラ数）の概算
的レベルとがある。The prosody information indicates the approximate level of the prosody control command at each phrase boundary. These approximate levels comprise the approximate level of the magnitude of the phrase start command, the approximate level of the magnitude of the pause (pause section length) insertion command, and the approximate level of the mora length command value (the prescribed number of moras).

【００４６】音韻記号・韻律記号生成部１３３における
上記韻律記号列の生成処理では、上記韻律制御指令（フ
レーズ立ち上げ指令、ポーズ挿入指令）の位置情報（フ
レーズ立ち上げ位置、ポーズ挿入位置）と概算的レベル
（各指令の大きさ）が記号に変換される。また、予め規
定している１呼気に許されるモーラ数などのモーラ長指
令値も同様に概算的レベルで示される。これは発声速度
などにより適時変更・指定してもよい。In generating the prosodic symbol string, the phoneme symbol / prosodic symbol generation unit 133 converts into symbols the position information of the prosody control commands (the phrase start position and pause insertion position for the phrase start command and pause insertion command) and their approximate levels (the magnitude of each command). Mora length command values, such as the predefined number of moras permitted in one exhalation, are likewise expressed as approximate levels. These values may be changed or specified as needed, for example according to the speaking rate.

【００４７】本実施形態において、フレーズ立ち上げ指
令の大きさの概算的レベルは、（フ１），（フ２），
（フ３）の３レベルが設定されるようになっており、そ
の大小関係は（フ１）＞（フ２）＞（フ３）である。In the present embodiment, the approximate level of the magnitude of the phrase start command takes one of three levels, (F1), (F2), and (F3), whose magnitudes satisfy (F1) > (F2) > (F3).

【００４８】また、ポーズ挿入指令の大きさの概算的レ
ベルは（ポ１），（ポ２），（ポ３）の３レベルが設定
されるようになっており、その大小関係は（ポ１）＞
（ポ２）＞（ポ３）である。Likewise, the approximate level of the magnitude of the pause insertion command takes one of three levels, (Po1), (Po2), and (Po3), whose magnitudes satisfy (Po1) > (Po2) > (Po3).

【００４９】また、モーラ長指令値の概算的レベルは
Ｓ，Ｔ，Ｕの３レベルが設定されるようになっており、
その大小関係はＵ＞Ｔ＞Ｓである。なお、１文における
先頭には（フ１）、文末には（ポ１）を挿入するものと
する。また読点には（フ１）、（ポ１）を挿入するもの
とする。The approximate level of the mora length command value takes one of three levels, S, T, and U, whose magnitudes satisfy U > T > S. In each sentence, (F1) is inserted at the beginning of the sentence and (Po1) at the end of the sentence. (F1) and (Po1) are also inserted at each comma (touten).
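
The level orderings and the fixed insertion rule above can be written down as a small data model. The following is only a sketch under stated assumptions: the names `PhraseCmd`, `PauseCmd`, and `fixed_commands`, the boundary indexing, and the integer values behind each level are hypothetical and do not come from the specification; only the orderings, the rule for sentence start, sentence end, and commas, and the example values S = 4, T = 15, U = 35 (given later in the description) are taken from the text.

```python
from enum import IntEnum


class PhraseCmd(IntEnum):
    """Approximate phrase start command levels; a larger value means a larger command."""
    F3 = 1
    F2 = 2
    F1 = 3


class PauseCmd(IntEnum):
    """Approximate pause insertion command levels; a larger value means a larger command."""
    PO3 = 1
    PO2 = 2
    PO1 = 3


# Mora length command values, using the example figures from the description.
S, T, U = 4, 15, 35


def fixed_commands(num_phrases, comma_after):
    """Return the commands that are fixed by rule, keyed by boundary index.

    Assumed indexing (not from the specification): boundary 0 is the sentence
    start, boundary num_phrases is the sentence end, and boundary k lies
    between phrase k and phrase k + 1. comma_after holds the 1-based numbers
    of the phrases that are followed by a comma.
    """
    cmds = {0: (PhraseCmd.F1, None), num_phrases: (None, PauseCmd.PO1)}
    for k in comma_after:
        cmds[k] = (PhraseCmd.F1, PauseCmd.PO1)
    return cmds
```

For a six-phrase sentence with a comma after the second phrase, `fixed_commands(6, {2})` would place (F1) at boundary 0, (F1) and (Po1) at boundary 2, and (Po1) at boundary 6.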

【００５０】さて本実施形態では、文中における韻律句
の連鎖には文頭から連続する番号ｎ（ｎ＝１〜Ｎ、Ｎは
末尾の最終韻律句）が付与されている。韻律情報生成部
１３２での各韻律句境界における韻律情報の生成は、１
番目の韻律句から処理が始まり、Ｎ番目の韻律句の処理
が終了するまで、図５に示すアルゴリズムに従って行な
われる。以下、ｎ番目の韻律句の処理を図５に示すアル
ゴリズムに従って説明する。In the present embodiment, the prosodic phrases in a sentence are numbered consecutively from the beginning of the sentence with a number n (n = 1 to N, where N is the final prosodic phrase at the end of the sentence). The prosody information generating unit 132 generates the prosody information for each prosodic phrase boundary according to the algorithm shown in FIG. 5, starting with the first prosodic phrase and continuing until the N-th prosodic phrase has been processed. The processing of the n-th prosodic phrase is described below with reference to the algorithm shown in FIG. 5.

【００５１】（処理１）まずｎ番目の韻律句において、
後続する韻律句との境界における境界強度（の区分）を
韻律句境界強度決定部１３１の結果から取得する処理を
行なう（ステップＳ２）。なお、ｎの初期値は１である
（ステップＳ１）。(Process 1) First, for the n-th prosodic phrase, the boundary strength (class) at the boundary with the following prosodic phrase is acquired from the result of the prosodic phrase boundary strength determining unit 131 (step S2). The initial value of n is 1 (step S1).

【００５２】（処理２）次に、ステップＳ２の結果、ｎ
番目の韻律句において、後続する韻律句との境界（句境
界）における境界強度が独立境界であったならば（ステ
ップＳ３）、当該句境界に（フ２）、（ポ２）を挿入す
る（ステップＳ４〜Ｓ６）。(Process 2) Next, if the result of step S2 shows that the boundary strength at the boundary (phrase boundary) between the n-th prosodic phrase and the following prosodic phrase is an independent boundary (step S3), (F2) and (Po2) are inserted at that phrase boundary (steps S4 to S6).

【００５３】但し、直前の句境界（フ１）または（フ
２）からの距離がＳモーラ以下ならば（ステップＳ
４）、当該句境界に（フ３）、（ポ２）を挿入する（ス
テップＳ７）。However, if the distance from the immediately preceding phrase boundary carrying (F1) or (F2) is S moras or less (step S4), (F3) and (Po2) are inserted at that phrase boundary instead (step S7).

【００５４】また直後の文末までの距離がＴモーラ以下
の場合は、当該句境界に（フ３）のみを挿入する（ステ
ップＳ８）。この場合、直前の独立境界からの距離がＵ
モーラ以上離れているならば（ステップＳ９）、休止を
伴わない連接境界において、（ポ３）を挿入する（ステ
ップＳ１０）。If the distance to the end of the sentence that follows is T moras or less, only (F3) is inserted at that phrase boundary (step S8). In this case, if the distance from the immediately preceding independent boundary is U moras or more (step S9), (Po3) is inserted at a junction boundary (連接境界) that is not accompanied by a pause (step S10).

【００５５】次にｎがＮに一致しないならば、当該ｎを
１インクリメントし（ステップＳ１１，Ｓ１２）、上記
（処理１の）ステップＳ２に戻り処理を続ける。（処理３）一方、ｎ番目の韻律句において、後続する韻
律句との境界における境界強度（の区分）が（独立境界
ではなくて）連結境界であったならば（ステップＳ１
３）、当該句境界に（フ３）を挿入する（ステップＳ１
４，Ｓ１５）。Next, if n is not equal to N, n is incremented by 1 (steps S11 and S12), and processing returns to step S2 (of Process 1) and continues.

(Process 3) On the other hand, if the boundary strength (class) at the boundary between the n-th prosodic phrase and the following prosodic phrase is a connection boundary (rather than an independent boundary) (step S13), (F3) is inserted at that phrase boundary (steps S14 and S15).

【００５６】但し、直前の句境界（フ１）、（フ２）ま
たは（フ３）からの距離がＳモーラ以下ならば（ステッ
プＳ１４）、何も挿入しない。このことは、後続の韻律
句と融合して１つの呼気段落とすることを意味する。次
にｎを１インクリメントし（ステップＳ１２）、上記
（処理１の）ステップＳ２に戻り処理を続ける。However, if the distance from the immediately preceding phrase boundary carrying (F1), (F2), or (F3) is S moras or less (step S14), nothing is inserted. This means that the prosodic phrase is merged with the following prosodic phrase into a single breath group. Next, n is incremented by 1 (step S12), and processing returns to step S2 (of Process 1) and continues.

【００５７】（処理４）また、直前の連結境界からの距
離がＴモーラ以上の連結境界の場合（ステップＳ１
６）、連接境界間における連結境界において、隣接韻律
語結合強度の最も弱い境界に（フ３）を挿入する（ステ
ップＳ１７，Ｓ１８）。但し、各連接境界における隣接
韻律語結合強度がすべて等しい場合、連結境界間におけ
る全ての韻律句連鎖がＴモーラ以下になるように均等に
連接境界に（フ３）を挿入する（ステップＳ１９）。(Process 4) In addition, for a connection boundary whose distance from the immediately preceding connection boundary is T moras or more (step S16), (F3) is inserted, among the boundaries lying between the junction boundaries (連接境界), at the one where the connection strength between adjacent prosodic words is weakest (steps S17 and S18). However, if the connection strengths between adjacent prosodic words at the junction boundaries are all equal, (F3) is inserted at evenly spaced junction boundaries so that every chain of prosodic phrases between connection boundaries is T moras or less (step S19).
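
Processes 1 to 3 above can be sketched as a single pass over the phrase boundaries. This is a minimal illustration under stated assumptions, not the flowchart of FIGS. 5 and 6: the names `Phrase` and `assign_commands` are hypothetical, the near-sentence-end check (step S8) is assumed to take precedence over the short-distance check (step S7), distances are measured in moras from the sentence start (which carries (F1)), and the (Po3) rule of steps S9 and S10 and the splitting of Process 4 are left out. The mora counts in the example are computed from the readings of the sample sentence, and the boundary classes are invented for illustration.

```python
from dataclasses import dataclass

S, T = 4, 15  # example mora thresholds from the description (U is not used here)


@dataclass
class Phrase:
    morae: int     # mora count of the prosodic phrase
    boundary: str  # class of the boundary to the next phrase: "independent" or "connection"


def assign_commands(phrases):
    """Return one tuple of command labels per boundary after phrases 1..N-1."""
    total = sum(p.morae for p in phrases)
    consumed = 0    # moras from the sentence start up to the current boundary
    last_major = 0  # position of the last boundary carrying F1 or F2
    last_any = 0    # position of the last boundary carrying F1, F2, or F3
    out = []
    for p in phrases[:-1]:
        consumed += p.morae
        if p.boundary == "independent":
            if total - consumed <= T:
                cmd = ("F3",)        # close to the sentence end: F3 only (step S8)
            elif consumed - last_major <= S:
                cmd = ("F3", "Po2")  # too close to the previous F1/F2 (step S7)
            else:
                cmd = ("F2", "Po2")  # ordinary independent boundary (steps S4 to S6)
        elif consumed - last_any <= S:
            cmd = ()                 # merge with the following phrase (step S14)
        else:
            cmd = ("F3",)            # ordinary connection boundary (step S15)
        if "F2" in cmd:
            last_major = consumed
        if cmd:
            last_any = consumed
        out.append(cmd)
    return out


# Hypothetical input: six phrases with invented boundary classes.
example = assign_commands([
    Phrase(4, "connection"), Phrase(5, "independent"), Phrase(3, "connection"),
    Phrase(4, "connection"), Phrase(3, "connection"), Phrase(7, "connection"),
])
```

With these invented inputs, the first boundary receives nothing (merge), the second receives (F2) and (Po2), and the fourth receives (F3). The result is not meant to reproduce FIG. 8(c).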

【００５８】韻律情報生成部１３２では、上記した一連
の処理がｎ＝１からｎ＝ＮまでＮ回繰り返し実行される
（ステップＳ１１）。韻律情報生成部１３２での処理結
果は音韻記号・韻律記号生成部１３３に渡される。音韻
記号・韻律記号生成部１３３は、韻律情報生成部１３２
での処理結果から得られる、テキスト情報を構成する語
彙のそれぞれの読み、及び読みに対する語音の表層にお
ける音調変化を基に音韻記号列を生成すると共に、韻律
句境界の音調性質と韻律情報を基に韻律記号列を生成す
る。ここでは、語音に対応する音節音の記号、アクセン
トを指示する記号、休止区間長を指定する記号が生成さ
れることになる。In the prosody information generating unit 132, the above series of processing is repeated N times, from n = 1 to n = N (step S11). The processing result of the prosody information generating unit 132 is passed to the phoneme symbol / prosodic symbol generation unit 133. From this result, the phoneme symbol / prosodic symbol generation unit 133 generates a phoneme symbol string based on the reading of each word making up the text information and on the surface-level tonal changes of the speech sounds for those readings, and generates a prosodic symbol string based on the tonal properties of the prosodic phrase boundaries and the prosody information. Here, symbols for the syllable sounds corresponding to the speech sounds, symbols indicating accent, and symbols specifying pause section length are generated.

【００５９】この音韻記号列と韻律記号列は、音響パラ
メータ生成部１４に渡される。音響パラメータ生成部１
４は、この音韻記号列と韻律記号列から音声合成器を駆
動するための音響パラメータ系列を生成する。この系列
は、音声波形生成部１５に送られて、出力部１６を通じ
て出力される。The phoneme symbol string and the prosodic symbol string are passed to the acoustic parameter generation unit 14, which generates from them the acoustic parameter sequence for driving the speech synthesizer. This sequence is sent to the speech waveform generation unit 15 and output through the output unit 16.

【００６０】以上、図１の構成の音声合成装置における
各構成要素の基本機能について説明した。次に、当該音
声合成装置における動作の具体例を、図７及び図８を参
照して、図７（ａ）に示す入力テキスト情報（図４
（ａ）に示したのと同一の漢字仮名交じり文）に対する
韻律処理の場合につき説明する。The basic function of each component of the speech synthesizer configured as shown in FIG. 1 has been described above. Next, a specific example of the operation of the speech synthesizer is described with reference to FIGS. 7 and 8, taking the case of prosody processing for the input text information shown in FIG. 7(a) (the same mixed kanji-kana sentence as shown in FIG. 4(a)).

【００６１】まず、入力部１１から送られた漢字仮名交
じり文「あらゆる現実をすべて自分のほうへねじ曲げた
のだ。」は、言語処理部１２（内の韻律語形成部１２
１）における形態素解析処理に供され、その形態素解析
結果から、自律・他律韻律素の連鎖において、少なくと
も１の自律韻律素もしくは語基に所属する他律韻律素自
体により、或いは少なくとも１つの自律韻律素もしくは
語基に所属する他律韻律素と、それに後続する１個以上
の助辞、接辞に所属する他律韻律素を結合することで、
韻律語が構成される。First, the mixed kanji-kana sentence 「あらゆる現実をすべて自分のほうへねじ曲げたのだ。」 (roughly, "[He] twisted every reality entirely toward himself.") sent from the input unit 11 undergoes morphological analysis in the language processing unit 12 (specifically, its prosodic word forming unit 121). From the morphological analysis result, within the chain of autonomous and heteronomous prosodic elements, each prosodic word is formed either from at least one autonomous prosodic element, or heteronomous prosodic element belonging to a word base, on its own, or by joining at least one such element with one or more following heteronomous prosodic elements belonging to particles or affixes.

【００６２】このようにして、上記漢字仮名交じり文の
形態素解析結果から形成される韻律語の連鎖は、次のよ
うになる。［あらゆる］［現実｜を］［すべて］［自分｜の］［ほ
う｜へ］［ねじ曲げた｜のだ］ここで、“｜”で区切られたものは自律・他律韻律素
を、“［”と“］”とで囲まれたものは韻律語を示す。The chain of prosodic words formed in this way from the morphological analysis result of the above mixed kanji-kana sentence is as follows: [arayuru] [genjitsu | o] [subete] [jibun | no] [hō | e] [nejimageta | noda]. Here, "|" separates autonomous and heteronomous prosodic elements, and each span enclosed in "[" and "]" is one prosodic word.
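
The bracket notation above maps naturally onto a nested list. The snippet below is a small sketch for reading that notation back into data: the function name `parse_chain` is hypothetical, and the parser accepts both the full-width delimiters used in the Japanese text and their ASCII equivalents.

```python
import re


def parse_chain(text):
    """Split '[a|b][c]' style notation into prosodic words, each a list of prosodic elements."""
    # Capture the contents of every bracketed span (ASCII or full-width brackets).
    words = re.findall(r"[\[［]([^\]］]*)[\]］]", text)
    # Within a prosodic word, split on the element separator (ASCII or full-width bar).
    return [re.split(r"[|｜]", w) for w in words]


chain = parse_chain("［あらゆる］［現実｜を］［すべて］［自分｜の］［ほう｜へ］［ねじ曲げた｜のだ］")
```

Here `chain` holds six prosodic words, the second of which is the two-element list `['現実', 'を']`.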

【００６３】次に言語処理部１２（内の韻律語間依存強
度決定部１２２）では、（韻律語形成部１２１により形
成された）上記韻律語の連鎖を対象に図２の相互依存関
係テーブルを参照することで、韻律語を単位として、韻
律語間の依存強度が決定される。ここでは、（末尾の韻
律語を除く）各韻律語について、後続するすべての韻律
語間の組合わせにおいての依存関係（依存強度）が図７
（ｂ）のように得られる。依存強度は、前記したように
５段階の数値で示される。０以外は依存関係があること
を示し、数値が大きいほど依存関係が強いことを示す。Next, the language processing unit 12 (specifically, its inter-prosodic-word dependency strength determining unit 122) takes the chain of prosodic words (formed by the prosodic word forming unit 121) and, by referring to the interdependency table of FIG. 2, determines the dependency strength between prosodic words, with the prosodic word as the unit. Here, for each prosodic word (except the last), the dependency relation (dependency strength) with every following prosodic word is obtained as shown in FIG. 7(b). As described earlier, the dependency strength is expressed as a value on a five-level scale. A value other than 0 indicates that a dependency exists, and a larger value indicates a stronger dependency.

【００６４】次に言語処理部１２（内の韻律語間係り受
け解析部１２３）では、図７（ｂ）に示したような韻律
語間の依存強度を基に、係り受け関係が求められる。こ
こでは、文構造を近似するために、曖昧性を削減するた
めの原則を適用することにより、図７（ｃ）のような係
り受けの構造が得られる。Next, the language processing unit 12 (specifically, its inter-prosodic-word dependency analysis unit 123) obtains the dependency relations from the dependency strengths between prosodic words shown in FIG. 7(b). Here, principles for reducing ambiguity are applied in order to approximate the sentence structure, which yields the dependency structure shown in FIG. 7(c).

【００６５】次に言語処理部１２（内の韻律語受け数決
定部１２４）では、図７（ｃ）に示した係り受けの構造
から、図７（ｄ）に示すような各韻律語の受け数が求め
られる。この韻律語の受け数は、上記の係り受けの構造
上で、前方の韻律語からどの程度係られているかを示す
ものである。Next, the language processing unit 12 (specifically, its prosodic word receiving number determining unit 124) obtains the receiving number of each prosodic word, as shown in FIG. 7(d), from the dependency structure shown in FIG. 7(c). The receiving number of a prosodic word indicates the extent to which preceding prosodic words depend on it in that dependency structure.
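
One simple reading of the receiving number is a count of the preceding prosodic words that depend on a given word. The sketch below follows that reading, which is an interpretation rather than a definition confirmed by the figures. The function name `receiving_counts`, the head-index representation, and the example dependency structure are all hypothetical; in particular, the example heads are a plausible parse of the sample sentence and are not taken from FIG. 7(c) or FIG. 7(d).

```python
def receiving_counts(heads):
    """Count, for each prosodic word, how many earlier words depend on it.

    heads[i] is the 0-based index of the word that word i depends on, or None
    for the final word. Dependencies are assumed to point only to later words.
    """
    counts = [0] * len(heads)
    for i, head in enumerate(heads):
        if head is None:
            continue
        if head <= i:
            raise ValueError("a prosodic word may only depend on a later word")
        counts[head] += 1
    return counts


# Hypothetical structure for: arayuru / genjitsu-o / subete / jibun-no / hō-e / nejimageta-noda
heads = [1, 5, 5, 4, 5, None]
counts = receiving_counts(heads)
```

Under this assumed structure the final prosodic word receives three dependencies, while the first, third, and fourth receive none.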

【００６６】次に言語処理部１２（内の隣接韻律語間結
合強度決定部１２５）では、それまでに（韻律語間依存
強度決定部１２２、韻律語間係り受け解析部１２３及び
韻律語受け数決定部１２４により）得られている、韻律
語間の依存強度、係り受け構造（から求められる境界の
深さ）、及び受け数の情報を用いて、図８（ａ）に示す
ような隣接する韻律語間の結合強度（の区分）が決定さ
れる。Next, the language processing unit 12 (specifically, its adjacent prosodic word connection strength determining unit 125) uses the information obtained so far (by the inter-prosodic-word dependency strength determining unit 122, the inter-prosodic-word dependency analysis unit 123, and the prosodic word receiving number determining unit 124), namely the dependency strength between prosodic words, the dependency structure (and the boundary depth derived from it), and the receiving numbers, to determine the connection strength (class) between adjacent prosodic words as shown in FIG. 8(a).

【００６７】音韻処理部１３では、言語処理部１２にて
得られた韻律語連鎖における隣接韻律語間結合強度（の
区分）から、以下に述べるように韻律情報を生成する。
まず音韻処理部１３（内の韻律句境界強度決定部１３
１）では、図８（ａ）に示した韻律語連鎖における隣接
韻律語間結合強度を基に、韻律句における境界強度（の
区分）が求められる。この例では、韻律語連鎖における
各々の韻律語が韻律句となり、各韻律句の境界強度（の
区分）は図８（ｂ）に示すようになる。ここでは、韻律
句を構成する韻律語連鎖における韻律句全体の読み、並
びにモーラ数も決定される。このモーラ数については図
４（ｂ）を参照されたい。The phoneme processing unit 13 generates prosody information, as described below, from the connection strength (class) between adjacent prosodic words in the prosodic word chain obtained by the language processing unit 12. First, the phoneme processing unit 13 (specifically, its prosodic phrase boundary strength determining unit 131) obtains the boundary strength (class) of the prosodic phrases from the connection strength between adjacent prosodic words in the prosodic word chain shown in FIG. 8(a). In this example, each prosodic word in the chain becomes a prosodic phrase, and the boundary strength (class) of each prosodic phrase is as shown in FIG. 8(b). At this stage, the reading of each whole prosodic phrase, over the prosodic word chain that makes it up, and its mora count are also determined. See FIG. 4(b) for the mora counts.
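
Since the mora count of each phrase drives the later decisions, it may help to see how a count can be derived from a kana reading. The sketch below uses the usual convention that every kana is one mora except the small glide and vowel kana, which attach to the preceding kana; the small っ, ん, and the long-vowel mark each count as one mora. The function name `count_morae` is hypothetical, the input is assumed to be a kana-only reading, and the counts shown are computed from the readings of the sample sentence rather than copied from FIG. 4(b).

```python
# Small kana that combine with the preceding kana and so add no mora of their own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana):
    """Count the moras in a kana-only reading."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


# Readings of the six prosodic phrases of the sample sentence.
readings = ["あらゆる", "げんじつを", "すべて", "じぶんの", "ほうへ", "ねじまげたのだ"]
mora_counts = [count_morae(r) for r in readings]
```

For these readings the counts come to 4, 5, 3, 4, 3, and 7 moras, 26 in total.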

【００６８】次に音韻処理部１３（内の韻律情報生成部
１３２）では、上記韻律句における境界強度（の区分）
とモーラ数の情報を基に、韻律情報が生成される。ここ
では、図５及び図６に示すアルゴリズムに従って、図８
（ｃ）に示すような概算的レベルで表される韻律制御指
令（フレーズ立ち上げ指令、ポーズ挿入指令）が与えら
れる。なお、Ｓ，Ｔ，Ｕの示すモーラ数は、Ｓ＝４，Ｔ
＝１５，Ｕ＝３５であるものとする。Next, the phoneme processing unit 13 (specifically, its prosody information generating unit 132) generates the prosody information from the boundary strength (class) of the prosodic phrases and the mora counts. Here, following the algorithm shown in FIGS. 5 and 6, prosody control commands (phrase start commands and pause insertion commands) expressed as approximate levels are assigned as shown in FIG. 8(c). The mora counts denoted by S, T, and U are assumed to be S = 4, T = 15, and U = 35.

【００６９】このように、フレーズ立ち上げ位置やフレ
ーズ指令の大きさ、ポーズ位置やポーズ長などの韻律情
報が求められると、音韻処理部１３（内の音韻記号・韻
律記号生成部１３３）では、テキスト情報を構成する語
彙のそれぞれの読み、読みに対する語音の表層における
音調変化、韻律句境界の音調性質と韻律情報を基に、音
節音の記号（からなる音韻記号列）、及びアクセントを
指示する記号、休止区間長を指定する記号（からなる韻
律記号列）が生成される。Once the prosody information, such as the phrase start positions, the magnitudes of the phrase commands, the pause positions, and the pause lengths, has been obtained in this way, the phoneme processing unit 13 (specifically, its phoneme symbol / prosodic symbol generation unit 133) generates the syllable sound symbols (which make up the phoneme symbol string) and the symbols indicating accent and specifying pause section length (which make up the prosodic symbol string). These are generated based on the reading of each word making up the text information, the surface-level tonal changes of the speech sounds for those readings, the tonal properties of the prosodic phrase boundaries, and the prosody information.

【００７０】すると音響パラメータ生成部１４では、音
韻処理部１３（内の音韻記号・韻律記号生成部１３３）
で求められた韻律記号列と音韻記号列を基にして、音声
合成器を駆動するための音響パラメータ系列が生成され
る。音声波形生成部１５では、この音響パラメータ系列
から音声波形が生成され、出力部１６を通じて音声出力
される。The acoustic parameter generation unit 14 then generates the acoustic parameter sequence for driving the speech synthesizer from the prosodic symbol string and the phoneme symbol string obtained by the phoneme processing unit 13 (specifically, its phoneme symbol / prosodic symbol generation unit 133). The speech waveform generation unit 15 generates a speech waveform from this acoustic parameter sequence, and the waveform is output as speech through the output unit 16.

【００７１】このように本実施形態においては、文章中
の韻律語から、韻律語間の依存強度、係り受け構造、及
び受け数の（うちの少なくとも１つの）情報を用いて、
隣接する韻律語間の結合強度を決定して韻律句間の境界
強度を求め、この境界強度を用いて韻律制御を行なうこ
とにより、韻律語を制御単位とした簡便な処理で自然な
韻律制御が可能となる。As described above, in the present embodiment, the connection strength between adjacent prosodic words is determined from the prosodic words in a sentence using (at least one of) the dependency strength between prosodic words, the dependency structure, and the receiving numbers. The boundary strength between prosodic phrases is obtained from that connection strength, and prosody control is performed using this boundary strength. This makes natural prosody control possible with simple processing that uses the prosodic word as the unit of control.

【００７２】以上に述べた図１の構成の音声合成装置
は、コンピュータ、例えば図９に示すパーソナルコンピ
ュータ９０を、入力部１１、言語処理部１２、音韻処理
部１３、音響パラメータ生成部１４、音声波形生成部１
５、及び出力部１６として機能させるためのプログラム
を記録した記録媒体、例えばフロッピーディスク（Ｆ
Ｄ）９１を用い、当該フロッピーディスク９１をパーソ
ナルコンピュータ９０に装着して、当該フロッピーディ
スク９１に記録されているプログラムをパーソナルコン
ピュータで９０で読み取り実行させることにより実現さ
れる。The speech synthesizer configured as shown in FIG. 1 and described above is realized using a recording medium, for example a floppy disk (FD) 91, on which is recorded a program for causing a computer, for example the personal computer 90 shown in FIG. 9, to function as the input unit 11, the language processing unit 12, the phoneme processing unit 13, the acoustic parameter generation unit 14, the speech waveform generation unit 15, and the output unit 16. The floppy disk 91 is loaded into the personal computer 90, which reads and executes the program recorded on the floppy disk 91.

【００７３】[0073]

【発明の効果】以上説明したように、本発明によれば、
韻律語を構成単位としてテキスト情報を解析し、韻律語
間の依存関係、係り受け関係、受け数により隣接韻律語
間の結合強度を設定することで、それに基づいて韻律句
の結合強度を求め、句境界における韻律情報を生成する
ことで、文全体の自然な韻律制御を可能とすることがで
きる。As described above, according to the present invention, text information is analyzed with the prosodic word as the constituent unit, and the connection strength between adjacent prosodic words is set from the dependence between prosodic words, their dependency relations, and their receiving numbers. The connection strength of the prosodic phrases is obtained on that basis, and prosody information at the phrase boundaries is generated. This makes natural prosody control over the whole sentence possible.

[Brief description of the drawings]

【図１】本発明の一実施形態に係る音声合成装置の構成
を示すブロック図。FIG. 1 is a block diagram showing a configuration of a speech synthesizer according to an embodiment of the present invention.

【図２】韻律語間の依存関係設定時に参照される相互依
存関係テーブルの内容例を示す図。FIG. 2 is a view showing an example of the contents of an interdependency table which is referred to when setting a dependency between prosodic words.

【図３】韻律語間の係り受け関係の解析において適用さ
れる、文構造の曖昧性を削減するための原則例を示す
図。FIG. 3 is a diagram showing an example of principles applied to the analysis of dependency relations between prosodic words for reducing ambiguity in sentence structure.

【図４】入力テキスト情報の例と、その入力テキスト情
報に対する言語処理部１２での言語処理結果である、韻
律語間の依存強度、係り受け関係、受け数、各韻律語の
モーラ数、境界の深さ、隣接韻律語間の結合強度の例を
対比して示す図。FIG. 4 is a diagram showing an example of input text information side by side with examples of the results of language processing of that input text information by the language processing unit 12: the dependency strength between prosodic words, the dependency relations, the receiving numbers, the mora count of each prosodic word, the boundary depths, and the connection strength between adjacent prosodic words.

【図５】韻律情報生成部１３２における韻律情報生成の
アルゴリズムの一部を示す図。FIG. 5 is a diagram showing a part of an algorithm for generating prosody information in a prosody information generating unit 132;

【図６】韻律情報生成部１３２における韻律情報生成の
アルゴリズムの残りを示す図。FIG. 6 is a diagram showing the rest of the algorithm for generating prosody information in the prosody information generation unit 132.

【図７】図１の構成の音声合成装置における動作の具体
例を説明するための図。FIG. 7 is a diagram for explaining a specific example of the operation of the speech synthesizer having the configuration of FIG. 1;

【図８】図１の構成の音声合成装置における動作の具体
例を説明するための図。FIG. 8 is a diagram for explaining a specific example of an operation in the speech synthesis device having the configuration of FIG. 1;

【図９】図１の構成の音声合成装置の各部の機能を実現
するためのプログラムを記録したフロッピーディスクが
装着されるパーソナルコンピュータの外観を示す図。FIG. 9 is a diagram showing the external appearance of a personal computer into which is loaded a floppy disk storing a program for realizing the functions of each unit of the speech synthesizer configured as shown in FIG. 1.

[Explanation of symbols]

１１…入力部、１２…言語処理部、１３…音韻処理部、１４…音響パラメータ生成部、１５…音声波形生成部、１６…出力部、１２１…韻律語形成部、１２２…韻律語間依存強度決定部、１２３…韻律語間係り受け解析部、１２４…韻律語受け数決定部、１２５…隣接韻律語間結合強度決定部、１３１…韻律句境界強度決定部、１３２…韻律情報生成部、１３３…音韻記号・韻律記号生成部。 11: input unit, 12: language processing unit, 13: phoneme processing unit, 14: acoustic parameter generation unit, 15: speech waveform generation unit, 16: output unit, 121: prosodic word forming unit, 122: inter-prosodic-word dependency strength determining unit, 123: inter-prosodic-word dependency analysis unit, 124: prosodic word receiving number determining unit, 125: adjacent prosodic word connection strength determining unit, 131: prosodic phrase boundary strength determining unit, 132: prosody information generating unit, 133: phoneme symbol / prosodic symbol generation unit.

フロントページの続き (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 13/08 Continued from the front page: (58) Field searched (Int. Cl.⁷, DB name): G10L 13/08

Claims

(57) [Claims]

1. A speech synthesizer that generates prosody information and phoneme information from input text information, generates acoustic parameters necessary for speech synthesis from the prosody information and phoneme information, and drives a synthesizer according to the acoustic parameters to generate and output a speech waveform, the speech synthesizer comprising: prosodic word forming means for performing morphological analysis on the input text information and forming, from the analysis result, a chain of prosodic words, each being the minimum unit that determines one accent type; inter-prosodic-word dependency strength determining means for obtaining the dependency strength between the prosodic words in the chain of prosodic words formed by the prosodic word forming means; inter-prosodic-word dependency analysis means for obtaining the dependency relations between prosodic words based on the dependency strengths between prosodic words in the chain obtained by the inter-prosodic-word dependency strength determining means; prosodic word receiving number determining means for determining the receiving number of each prosodic word in the chain from the dependency relations between prosodic words obtained by the inter-prosodic-word dependency analysis means; adjacent prosodic word connection strength determining means for determining the connection strength between adjacent prosodic words in the chain using at least one of the dependency strength between prosodic words, the dependency relations, and the receiving numbers obtained respectively by the inter-prosodic-word dependency strength determining means, the inter-prosodic-word dependency analysis means, and the prosodic word receiving number determining means; prosodic phrase boundary strength determining means for determining the boundary strength between prosodic phrases from the connection strength between adjacent prosodic words obtained by the adjacent prosodic word connection strength determining means; and prosody information generating means for generating prosody information of the text information in consideration of the prosodic phrase boundary strength obtained by the prosodic phrase boundary strength determining means.

2. The speech synthesizer according to claim 1, further comprising an interdependency table indicating the interdependence between grammatical attributes of prosodic words, wherein the inter-prosodic-word dependency strength determining means takes as input the chain of prosodic words formed by the prosodic word forming means and obtains the dependency strength between prosodic words by referring to the interdependency table based on the grammatical attribute information of the prosodic words.

3. The speech synthesizer according to claim 1, wherein the dependency relation used by the adjacent prosodic word connection strength determining means to determine the connection strength between adjacent prosodic words is a boundary depth indicating the difference between the number of prosodic words through which the target prosodic word in the text information reaches the end of the sentence and the number of prosodic words through which the following prosodic word reaches the end of the sentence.

4. The speech synthesizer according to claim 1, wherein, for a given chain of prosodic phrases, the prosody information generating means determines the prosody control command to be set at the boundary between a prosodic phrase and the following prosodic phrase based on a predetermined prescribed number of moras, the mora count of the prosodic phrase, and the prosodic phrase boundary strength.

5. The speech synthesizer according to claim 4, wherein the prosody information generating means, for a prosodic phrase having a mora count equal to or greater than the prescribed number of moras, separates the prosodic words within that prosodic phrase based on the corresponding connection strength between adjacent prosodic words, and, for a prosodic phrase having a mora count smaller than the prescribed number of moras, merges it with the following prosodic phrase into one breath group based on the connection strength between adjacent prosodic words of the following prosodic phrase.

6. A prosody information generating method applied to a speech synthesizer that generates prosody information and phoneme information from input text information, generates acoustic parameters necessary for speech synthesis from the prosody information and phoneme information, and drives a synthesizer according to the acoustic parameters to generate and output a speech waveform, the method comprising: a first step of performing morphological analysis on the input text information and forming, from the analysis result, a chain of prosodic words, each being the minimum unit that determines one accent type; a second step of determining the dependency strength between the prosodic words in the chain of prosodic words; a third step of obtaining the dependency relations between prosodic words based on the dependency strengths; a fourth step of determining the receiving number of each prosodic word in the chain from the dependency relations between the prosodic words; a fifth step of determining the connection strength between adjacent prosodic words in the chain using at least one of the dependency strength between prosodic words, the dependency relations, and the receiving numbers; a sixth step of determining the boundary strength of the prosodic phrases based on the connection strength between adjacent prosodic words; and a seventh step of generating prosody information of the text information in consideration of the prosodic phrase boundary strength.