JPH1078795A

JPH1078795A - Speech synthesizing device

Info

Publication number: JPH1078795A
Application number: JP8233879A
Authority: JP
Inventors: Masayo Ikeno; 雅代池野
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 1996-09-04
Filing date: 1996-09-04
Publication date: 1998-03-24

Abstract

PROBLEM TO BE SOLVED: To obtain rules that accurately determine phoneme durations for mora phonemes, such as the moraic nasal and geminate consonants, and for phoneme sequences such as vowel-devoicing phoneme sequences, and thereby to readily obtain more natural synthesized speech. SOLUTION: A section dividing unit 40 divides the phoneme symbol string at the voiced vowels to define phoneme sections. A vowel-part energy centroid interval setting unit 42 uses rhythm rule (A) 44 to determine, for each phoneme section, the time length DG between vowel-part energy centroids, with the type of the phoneme sequence lying between the vowel parts at both ends of the section and the speech rate as parameters. A phoneme duration setting unit 46 divides each phoneme section into its phoneme parts and uses rhythm rule (B) 48 to determine the phoneme duration of each part, with DG and the type of the phoneme sequence as parameters. The types of phoneme sequence are distinguished by, for example, the type of consonant following a moraic nasal, the type of consonant forming a geminate, and the vowel-devoicing phoneme sequence.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesizer that synthesizes speech using phoneme durations.

[0002]

[Prior Art] There are conventional speech synthesizers that synthesize speech by controlling phoneme durations with the mora as the unit. Here, a mora is a prosodic unit of Japanese speech that corresponds to a single syllable; roughly speaking, one kana character corresponds to one mora. The phoneme duration is the length of time assigned to each phoneme when speech is synthesized.

[0003] In a conventional speech synthesizer, the rule for controlling phoneme durations took two adjacent morae as a unit and determined the phoneme durations from the VCV (vowel/consonant/vowel) phoneme sequence contained in them. Specifically, the rule used the center of gravity of the time integral of the speech waveform energy of the vowel part (V) of each mora (CEGV: the Center of Energy Gravity of Vowels). The time length between these two CEGVs, that is, the time length DG between vowel-part energy centroids, was obtained according to the type of the consonant part (C) lying between the two vowel parts (V), and the phoneme durations of each vowel and consonant were then determined (Japanese Patent Laid-Open Nos. 5-281993, 6-266391, and 6-274195).
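The centroid-based measure described above can be illustrated numerically. The following is a minimal sketch, not the method of the cited publications: it assumes that energy is the squared sample amplitude and that the centroid is the energy-weighted mean time over a vowel segment. The function names and the sampling setup are hypothetical.

```python
from typing import Sequence


def energy_centroid_time(samples: Sequence[float], sample_rate: float) -> float:
    """Energy-weighted mean time (seconds) within one vowel segment.

    Assumption: energy is the squared amplitude of each sample.
    """
    energies = [s * s for s in samples]
    total = sum(energies)
    if total == 0.0:
        raise ValueError("segment has no energy")
    weighted = sum(i * e for i, e in enumerate(energies))
    return weighted / total / sample_rate


def centroid_interval(v1: Sequence[float], start1: float,
                      v2: Sequence[float], start2: float,
                      sample_rate: float) -> float:
    """Time between the energy centroids of two vowel segments (a DG-like value).

    start1 and start2 are the segment start times in seconds.
    """
    c1 = start1 + energy_centroid_time(v1, sample_rate)
    c2 = start2 + energy_centroid_time(v2, sample_rate)
    return c2 - c1
```

With two identical vowel segments whose onsets are one second apart, the interval between their centroids is also one second, whatever the segment shape.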

[0004]

[Problem to Be Solved by the Invention] Japanese has a special class of morae called "mora phonemes" (Miyaji, ed.: Japanese Language and Japanese Language Education, Vol. 1, Essentials of Japanese Linguistics (1989), Meiji Shoin): the moraic nasal "ん" (hatsuon), the geminate consonant "っ" (sokuon), and the long vowel "ー" (chōon). In terms of phoneme duration they have the same standing as other morae, but they cannot form a syllable by themselves and always remain an adjunct component of another syllable.

[0005] In a conventional speech synthesizer, when such a mora phoneme lay between vowel parts, it was treated as an ordinary vowel or consonant when the time length DG between vowel-part energy centroids and the phoneme durations were obtained. As a result, some characteristics of the mora phoneme were abstracted away, and the two time lengths could not adequately represent the mora phoneme.

[0006] Japanese also has phoneme sequences in which the voicing that characterizes ordinary vowels is lost, that is, phoneme sequences that cause vowel devoicing. This happens when the speaker forms the vocal tract shape for the vowel with the mouth, throat, and so on but does not vibrate the vocal cords, or does not even form that vocal tract shape. In this specification, a phoneme sequence that causes vowel devoicing is called a vowel-devoicing phoneme sequence. Conventional speech synthesizers gave no special consideration to phoneme sequences containing such devoiced vowels: the VCV-type phoneme sequence pattern was applied just as for ordinary voiced vowels, and a vowel-part energy centroid CEGV was forcibly set on a vowel that should have been devoiced. Consequently, the time length DG between vowel-part energy centroids and the phoneme durations obtained in this way could not adequately represent vowel devoicing.

[0007] Consequently, synthesizing speech from text that contains mora phonemes or vowel-devoicing phoneme sequences produced unnatural-sounding speech.

[0008] To solve the above problems, an object of the present invention is to provide a speech synthesizer that assigns accurate phoneme durations to phoneme sequences having a mora phoneme (moraic nasal, geminate consonant, or long vowel) between vowel parts and to phoneme sequences that cause vowel devoicing, and thereby generates natural synthesized Japanese speech.

[0009]

[Means for Solving the Problem] A speech synthesizer according to a first invention comprises: phoneme symbol string generating means for generating, from text, a phoneme symbol string containing vowel parts, consonant parts, and mora phonemes; section dividing means for dividing the phoneme symbol string into phoneme sections at the vowel parts; means for generating the time length between vowel-part energy centroids, which obtains, according to the types of the consonant parts and mora phonemes in a phoneme section, the time length between vowel-part energy centroids, that is, the interval between the energy centroids of the vowel parts located at both ends of that phoneme section; and phoneme duration generating means for determining the phoneme duration of each phoneme in the phoneme section according to the section's time length between vowel-part energy centroids and the types of the consonant parts and mora phonemes in the section. The speech synthesizer synthesizes speech using these phoneme durations.

[0010] According to the present invention, phoneme duration control rules are constructed on a principle newly discovered by the present inventor: "In the transition from a vowel having a vowel-part energy centroid CEGV to the next vowel, the ease or difficulty of uttering the non-vowel phoneme sequence that must be realized determines the time length between the vowel-part energy centroids." This principle removes the restriction, assumed by the principle used in the prior art, that "the phonemes between vowels are consonants." That is, the present invention recognizes in principle that not only consonants but also mora phonemes (the moraic nasal, geminate consonant, and long vowel) can lie between vowel parts, and it therefore generates rules that take into account the types of mora phonemes as well as the types of consonants. Specifically, the phoneme symbol string generating means first generates, from text, a phoneme symbol string in which vowel parts, consonant parts, and mora phonemes are distinguished. The section dividing means then defines the span from one vowel part to the next vowel part as one phoneme section. A phoneme section therefore begins and ends with a vowel part, and consonant parts and mora phonemes may be sandwiched between them. The means for generating the time length between vowel-part energy centroids generates that time length taking into account the types of mora phonemes as well as the consonants lying between the vowel parts. Likewise, the phoneme duration generating means determines the phoneme duration of each phoneme in the phoneme section taking into account the types of mora phonemes as well as the consonants lying between the vowel parts. Here, the type of a mora phoneme is given by, for example, a classification based on the combination of the mora phoneme and the consonant that follows it.

[0011] A speech synthesizer according to a second invention comprises: phoneme symbol string generating means for generating, from text, a phoneme symbol string containing vowel parts, consonant parts, and vowel-devoicing phoneme sequences; section dividing means for dividing the phoneme symbol string into phoneme sections at the vowel parts; means for generating the time length between vowel-part energy centroids, which obtains, according to the types of the consonant parts and vowel-devoicing phoneme sequences in a phoneme section, the time length between vowel-part energy centroids, that is, the interval between the energy centroids of the vowel parts located at both ends of that phoneme section; and phoneme duration generating means for determining the phoneme duration of each phoneme in the phoneme section according to the section's time length between vowel-part energy centroids and the types of the consonant parts and vowel-devoicing phoneme sequences in the section. The speech synthesizer synthesizes speech using these phoneme durations.

[0012] According to the present invention, a vowel that is devoiced is distinguished from a vowel part and treated as a vowel-devoicing phoneme sequence. On the basis of the principle described above, when a vowel-devoicing phoneme sequence lies between voiced vowel parts, its type is taken into account in addition to the types of the consonants between the vowel parts. That is, the present invention recognizes in principle that not only consonants but also vowel-devoicing phoneme sequences can lie between vowel parts, and it therefore generates phoneme duration control rules that take into account the types of vowel-devoicing phoneme sequences as well as the types of consonants. Specifically, the phoneme symbol string generating means first generates, from text, a phoneme symbol string in which vowel parts, consonant parts, and vowel-devoicing phoneme sequences are distinguished. The section dividing means then defines phoneme sections that each begin and end with a vowel part and may contain consonant parts and vowel-devoicing phoneme sequences between them. Because the phoneme symbol string generating means distinguishes vowel-devoicing phoneme sequences from ordinary voiced vowel parts, the section dividing means never sets, on a devoiced vowel in a vowel-devoicing phoneme sequence, a vowel-part energy centroid that would begin or end a phoneme section. In other words, the section dividing means sets one vowel-part energy centroid for each mora that contains a vowel uttered with voicing. The means for generating the time length between vowel-part energy centroids generates that time length taking into account the types of vowel-devoicing phoneme sequences as well as the consonants lying between the vowel parts. Likewise, the phoneme duration generating means determines the phoneme duration of each phoneme in the phoneme section taking into account the types of vowel-devoicing phoneme sequences as well as the consonants lying between the vowel parts.

[0013] A speech synthesizer according to a third invention comprises: phoneme symbol string generating means for generating, from text, a phoneme symbol string containing vowel parts, consonant parts, mora phonemes, and vowel-devoicing phoneme sequences; section dividing means for dividing the phoneme symbol string into phoneme sections at the vowel parts; means for generating the time length between vowel-part energy centroids, which obtains, according to the types of the consonant parts, mora phonemes, and vowel-devoicing phoneme sequences in a phoneme section, the time length between vowel-part energy centroids, that is, the interval between the energy centroids of the vowel parts located at both ends of that phoneme section; and phoneme duration generating means for determining the phoneme duration of each phoneme in the phoneme section according to the section's time length between vowel-part energy centroids and the types of the consonant parts, mora phonemes, and vowel-devoicing phoneme sequences in the section. The speech synthesizer synthesizes speech using these phoneme durations.

[0014] According to the present invention, on the basis of the principle described above, when mora phonemes or vowel-devoicing phoneme sequences lie between vowel parts, their types are taken into account in addition to the types of the consonants between the vowels. That is, the present invention recognizes in principle that not only consonants but also mora phonemes and vowel-devoicing phoneme sequences can lie between vowel parts, and it therefore generates phoneme duration control rules that take into account the types of mora phonemes and vowel-devoicing phoneme sequences as well as the types of consonants. Specifically, the phoneme symbol string generating means first generates, from text, a phoneme symbol string in which vowel parts, consonant parts, mora phonemes, and vowel-devoicing phoneme sequences are distinguished. The section dividing means then defines phoneme sections that each begin and end with a vowel part and may contain consonant parts, mora phonemes, and vowel-devoicing phoneme sequences between them. Because the phoneme symbol string generating means distinguishes vowel-devoicing phoneme sequences from vowel parts, the section dividing means never sets, on a devoiced vowel in a vowel-devoicing phoneme sequence, a vowel-part energy centroid that would begin or end a phoneme section. The means for generating the time length between vowel-part energy centroids generates that time length taking into account the types of mora phonemes and vowel-devoicing phoneme sequences as well as the consonants lying between the vowels. Likewise, the phoneme duration generating means determines the phoneme duration of each phoneme in the phoneme section taking into account the types of mora phonemes and vowel-devoicing phoneme sequences as well as the consonants lying between the vowels.

[0015] A speech synthesizer according to a fourth invention is the speech synthesizer of the first invention, wherein the means for generating the time length between vowel-part energy centroids determines that time length as a linear function of the reciprocal of the speech rate, and the coefficient and constant term of the linear function are determined according to the types of the consonant parts and mora phonemes. A speech synthesizer according to a fifth invention is the speech synthesizer of the second invention, wherein the means for generating the time length between vowel-part energy centroids determines that time length as a linear function of the reciprocal of the speech rate, and the coefficient and constant term of the linear function are determined according to the types of the consonant parts and vowel-devoicing phoneme sequences. Further, a speech synthesizer according to a sixth invention is the speech synthesizer of the third invention, wherein the means for generating the time length between vowel-part energy centroids determines that time length as a linear function of the reciprocal of the speech rate, and the coefficient and constant term of the linear function are determined according to the types of the consonant parts, mora phonemes, and vowel-devoicing phoneme sequences.

[0016] According to these inventions, the means for generating the time length between vowel-part energy centroids generates a time length that depends on the reciprocal of the speech rate, regardless of which phonemes are sandwiched between the vowel parts of the phoneme section. Moreover, this time length is expressed as a linear function of the reciprocal of the speech rate, and the proportionality coefficient of the reciprocal and the constant term are determined for each type of phoneme sandwiched between the voiced vowel parts of the phoneme section. Here, the type of phoneme means, in addition to the type of consonant part, the type of mora phoneme in the speech synthesizer of the fourth invention, the vowel-devoicing phoneme sequence in the speech synthesizer of the fifth invention, and the type of mora phoneme and the vowel-devoicing phoneme sequence in the speech synthesizer of the sixth invention. The time length between vowel-part energy centroids is thereby determined quantitatively and accurately, and natural synthesized Japanese speech is achieved.
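The linear relationship described above, DG = a / SR + b with a and b chosen per type of inter-vowel phoneme sequence, can be sketched as a table lookup. This is an illustration only: the type keys, the coefficient values, and the unit assumed for SR (morae per second) are invented here and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearRule:
    slope: float      # coefficient applied to 1/SR
    intercept: float  # constant term, in seconds


# Hypothetical coefficients for illustration; the real values are not given here.
RULE_TABLE = {
    "consonant:k": LinearRule(slope=0.9, intercept=0.02),
    "moraic_nasal+k": LinearRule(slope=1.8, intercept=0.03),
    "devoiced+t": LinearRule(slope=1.5, intercept=0.02),
}


def dg_from_rule(kind: str, speech_rate: float) -> float:
    """Return DG (seconds) as a linear function of the reciprocal of the speech rate."""
    if speech_rate <= 0.0:
        raise ValueError("speech rate must be positive")
    rule = RULE_TABLE[kind]
    return rule.slope / speech_rate + rule.intercept
```

Because the slope multiplies 1/SR, a faster speech rate shortens DG for every type, while the per-type constants let a section containing a moraic nasal stay longer than a plain consonant section at the same rate.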

[0017]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the system configuration of a Japanese speech synthesizer according to an embodiment of the present invention. The text to be synthesized is input to this Japanese speech synthesis system from the keyboard or the like of a terminal 2. A text analysis unit 4 analyzes the structure and other properties of this text as a Japanese sentence, generates a phoneme symbol string annotated with the pronunciation information required for speech synthesis, such as accent information, pauses, and vowel devoicing, and outputs it to a phoneme duration generation unit 6. The phoneme duration generation unit 6 assigns phoneme durations to the phoneme symbol string using rhythm rules 8, described later.

[0018] The phoneme symbol string with phoneme durations assigned is input to a sound source amplitude pattern generation unit 10, a pitch pattern generation unit 12, and a spectral pattern generation unit 14. These units determine, respectively, the three elements of speech: intensity, fundamental frequency (pitch), and timbre. The sound source amplitude pattern generation unit 10 determines the power envelope of the speech on the basis of power rules 16 while preserving the time length DG between vowel-part energy centroids and the phoneme durations given by the rhythm rules 8 of the phoneme duration generation unit 6. The pitch pattern generation unit 12 determines a point pitch pattern for each accent phrase from prosody control rules 18 and interpolates these patterns to generate a continuous pitch pattern that matches the time length DG and the phoneme durations given by the rhythm rules 8. The spectral pattern generation unit 14 determines the phoneme types, such as vowel or consonant, on the basis of phonemic quality improvement rules 20 and phoneme combination rules 22, and combines the frequency-spectrum envelope patterns of the phonemes in accordance with the time length DG and the phoneme durations given by the rhythm rules 8 to generate a spectral pattern.

[0019] A sound source generation unit 24 generates sound source information from the power pattern output by the sound source amplitude pattern generation unit 10 and the continuous pitch pattern output by the pitch pattern generation unit 12. A speech synthesizer 26 modulates the sound source information from the sound source generation unit 24 with the spectral pattern input from the spectral pattern generation unit 14, thereby adding timbre, and generates synthesized speech.

[0020] FIG. 2 is a block diagram showing the configuration of the phoneme duration generation unit 6 of FIG. 1. In the phoneme duration generation unit 6, a section dividing unit 40 first divides the phoneme symbol string generated by the text analysis unit 4 into phoneme sections of the form "/vowel part ~ non-vowel phonemes ~ vowel part/". A vowel-part energy centroid interval setting unit 42 serves as the means for generating the time length between vowel-part energy centroids. From rhythm rule (A) 44, which is stored with the type of the non-vowel phoneme sequence in each phoneme section and the speech rate SR as parameters, it calculates the time length DG between vowel-part energy centroids at the desired speech rate and sets it for each section. A phoneme duration setting unit 46 serves as the phoneme duration generating means. From rhythm rule (B) 48, which is stored with the DG obtained by the setting unit 42 and the type of the non-vowel phoneme sequence in each phoneme section as parameters, it determines the duration of each phoneme contained in each phoneme section and outputs the result to the sound source amplitude pattern generation unit 10, the pitch pattern generation unit 12, and the spectral pattern generation unit 14.
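The second stage, in which the phoneme duration setting unit 46 derives per-phoneme durations from DG and the type of the non-vowel phoneme sequence, might look like the sketch below. The form of rhythm rule (B) is not given in this passage, so the proportional split used here is purely an assumption, and every label and share value is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rhythm rule (B): for each type, the share of DG given to each
# part lying between the two vowel energy centroids. Shares sum to 1.
RHYTHM_RULE_B: Dict[str, List[Tuple[str, float]]] = {
    "consonant:k": [("V1_tail", 0.35), ("k", 0.30), ("V2_head", 0.35)],
    "moraic_nasal+k": [("V1_tail", 0.20), ("N", 0.35), ("k", 0.25), ("V2_head", 0.20)],
}


def split_durations(kind: str, dg: float) -> List[Tuple[str, float]]:
    """Distribute DG (seconds) over the parts of one phoneme section.

    Assumption: durations are fixed proportions of DG selected by type.
    """
    if dg <= 0.0:
        raise ValueError("DG must be positive")
    return [(label, dg * share) for label, share in RHYTHM_RULE_B[kind]]
```

Under this assumed form, the parts of a section always add up to DG, so the centroid spacing set in the first stage is preserved when the individual phoneme durations are handed to the pattern generation units.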

[0021] Next, the section dividing unit 40, rhythm rule (A) 44, and rhythm rule (B) 48, which are used to generate the phoneme durations in the phoneme duration generation unit 6 and constitute the characteristic part of the present invention, are described in detail.

[0022] On the basis of the phoneme symbol string output from the text analysis unit 4, the section dividing unit 40 sets one vowel-part energy centroid CEGV for each mora in which a vowel is clearly present, and defines the interval from any vowel-part energy centroid CEGV to the next vowel-part energy centroid CEGV as one phoneme section. The interior of each phoneme section is further divided into, and labeled as, the parts that are clearly vowels and the individual non-vowel phoneme parts. This division of phonemes within a phoneme section is described below.
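The division into phoneme sections can be sketched as follows. This is a simplified illustration, not the implementation of the section dividing unit 40: it works on labeled symbols rather than on energy centroids, and the kind labels ("vowel", "consonant", "moraic_nasal", "devoiced_vowel") and the romanized example words are assumptions introduced here.

```python
from typing import List, Optional, Tuple

Phoneme = Tuple[str, str]  # (symbol, kind)


def divide_into_sections(phonemes: List[Phoneme]) -> List[List[Phoneme]]:
    """Split a phoneme string into sections bounded by voiced vowels.

    Each section runs from one voiced vowel to the next, inclusive at both
    ends. Devoiced vowels and mora phonemes never act as boundaries.
    Phonemes before the first or after the last voiced vowel are left out.
    """
    sections: List[List[Phoneme]] = []
    current: Optional[List[Phoneme]] = None
    for ph in phonemes:
        if current is not None:
            current.append(ph)
        if ph[1] == "vowel":
            if current is not None:
                sections.append(current)  # close the section at this vowel
            current = [ph]                # the same vowel opens the next one
    return sections


# Illustrative input: "kanko" with a moraic nasal between the two vowels.
kanko = [("k", "consonant"), ("a", "vowel"), ("N", "moraic_nasal"),
         ("k", "consonant"), ("o", "vowel")]
# Illustrative input: "ashita" with the /i/ marked as devoiced.
ashita = [("a", "vowel"), ("sh", "consonant"), ("i", "devoiced_vowel"),
          ("t", "consonant"), ("a", "vowel")]
```

In both examples the result is a single section: the moraic nasal and the devoiced vowel stay inside the section instead of splitting it, which is the behavior the passage above attributes to setting a centroid only on morae with a clearly present vowel.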

[0023] The phoneme duration control rules based on the time length DG between vowel-part energy centroids rest on the hypothesis that "physical constraints of the speech organs determine time lengths." The conventional rules took this hypothesis to mean "the ease or difficulty of uttering the consonant lying between vowel-part energy centroids CEGV determines the time length DG between vowel-part energy centroids" and were built on that basis. As already noted, however, this could lead to large phoneme duration errors for mora phonemes and vowel-devoicing phoneme sequences. The present invention therefore makes the hypothesis concrete as follows: "the ease or difficulty of uttering the non-vowel phoneme sequence that must be realized in the transition between vowels, each having a vowel-part energy centroid CEGV, determines the time length DG between vowel-part energy centroids." This formulation leaves room to incorporate various factors, beyond the mere attribute of being a consonant, in realizing the phonemes that lie between vowel parts. In other words, the present invention re-examines the conventional starting point, which jumped to the conclusion that the phonemes between vowel parts are consonants, takes one step back from it, and sets out in a different, more correct direction.

[0024] For example, a VCV-type phoneme sequence has conventionally been labeled /vowel ~ consonant ~ vowel/, with the non-vowel phoneme regarded as consisting of a single consonant. Under the above formulation of the hypothesis, however, even a single consonant is realized through various state transitions. As a specific example, realizing the plosive /k/ between vowel parts always requires passing through the sequence of states "transition out of the vowel part + (silent part + burst part) + transition into the next vowel part." Similarly, /ky/ requires passing through the sequence of states "transition out of the vowel part + (silent part + burst part + palatalized sound (yōon)) + transition into the next vowel part."

【００２５】また、ＶＣＶ形音韻系列以外の音韻系列で
あっても、母音部以外の音韻系列部分は、ある母音部か
ら次の母音部へ遷移する際の様々な状態遷移の集合体で
あると考えることができる。Phoneme sequences other than the VCV type can be viewed in the same way: the portion other than the vowel parts is a collection of the various state transitions that occur in moving from one vowel part to the next.

【００２６】例えば、一般に撥音として扱われる音韻
は、後続の子音と同質の調音点をとる鼻音として実現さ
れるという共通点を有した様々な音をまとめた集合体で
ある。すなわち撥音を含む音韻系列は“母音部からのわ
たり＋後続子音で決まる鼻音状態＋次の子音＋次の母音
部へのわたり”という一連の状態を成すと考えられる。
促音を含む音韻系列も後続の子音で多様な変化を伴うも
のであり、“母音部からのわたり＋後続子音で決まる促
音状態＋次の子音＋次の母音部へのわたり”という状態
の集合から成り立つと考えられる。また、母音無声化音
韻系列は、“母音部からのわたり＋無声化した状態＋次
の子音＋次の母音部へのわたり”という状態の集合から
成り立つと考えられる。さらに、異なる母音が連続する
音韻系列の場合も、その遷移部について、“母音部から
のわたり＋次の母音部へのわたり”と考えられる。For example, the phoneme generally treated as the syllabic nasal is a collection of various sounds that share one feature: each is realized as a nasal with the same place of articulation as the following consonant. A phoneme sequence containing a syllabic nasal can therefore be regarded as the sequence of states "transition from the vowel part + nasal state determined by the following consonant + next consonant + transition to the next vowel part." A phoneme sequence containing a geminate consonant likewise varies with the following consonant and can be regarded as the set of states "transition from the vowel part + geminate state determined by the following consonant + next consonant + transition to the next vowel part." A devoiced-vowel phoneme sequence can be regarded as the set of states "transition from the vowel part + devoiced state + next consonant + transition to the next vowel part." Finally, where different vowels follow one another, the transition portion can be regarded as "transition from the vowel part + transition to the next vowel part."

【００２７】以上の考察をもとに、区画区分部４０から
出力される音韻区画の例を図３、図４に示した。図３は
一般のＶＣＶ形音韻系列の音韻区画の模式図である。図
は水平右向きに時間軸をとって音韻系列を示したもの
で、楕円形状はパワー包絡を模式的に示したものであ
る。この場合の音韻区画はｉ番目のモーラの母音部のエ
ネルギー重心点ＣＥＧＶから、（ｉ＋１）番目のモーラ
の母音部のエネルギー重心点ＣＥＧＶまでであり、ｉ番
目、（ｉ＋１）番目のモーラの母音部の間に、（ｉ＋
１）番目のモーラの子音部が存在する。図４は、撥音、
促音及び長音といったモーラ音素や母音無声化音韻系列
の音韻区画の模式図であり、図３と同様の表現方法によ
って示されている。この場合、モーラ音素や母音の無声
化を含んだモーラがそれぞれ１モーラを構成するので、
音韻区画はｉ番目のモーラの母音部のエネルギー重心点
ＣＥＧＶから、（ｉ＋２）番目のモーラの母音部のエネ
ルギー重心点ＣＥＧＶまでである。つまり、撥音や促
音、長音、母音の無声化を生じた音韻系列の場合には、
次に有声で発声される母音と判定されるモーラまでが１
つの音韻区画として区分される。そしてｉ番目、（ｉ＋
２）番目のモーラ母音部の間に、（ｉ＋１）番目のモー
ラであるモーラ音素や母音の無声化を含んだモーラ、
（ｉ＋２）番目のモーラの子音部及びそれらの間の遷移
部が存在することになる。Based on the above considerations, FIGS. 3 and 4 show examples of the phoneme blocks output by the block segmentation unit 40. FIG. 3 is a schematic diagram of the phoneme block for a general VCV-type phoneme sequence. The phoneme sequence is drawn along a time axis running horizontally to the right, and the ellipses schematically represent the power envelope. In this case the phoneme block extends from the energy centroid CEGV of the vowel part of the i-th mora to the energy centroid CEGV of the vowel part of the (i+1)-th mora, and the consonant part of the (i+1)-th mora lies between the two vowel parts. FIG. 4 is a schematic diagram, drawn in the same manner as FIG. 3, of the phoneme block for mora phonemes (syllabic nasal, geminate consonant, and long vowel) and for devoiced-vowel phoneme sequences. Because a mora phoneme or a mora with a devoiced vowel itself counts as one mora, the phoneme block here extends from the energy centroid CEGV of the vowel part of the i-th mora to that of the (i+2)-th mora. That is, for a phoneme sequence containing a syllabic nasal, geminate consonant, long vowel, or devoiced vowel, the span up to the next mora judged to contain a voiced vowel is treated as a single phoneme block. Between the vowel parts of the i-th and (i+2)-th moras lie the (i+1)-th mora (the mora phoneme or devoiced-vowel mora), the consonant part of the (i+2)-th mora, and the transitions between them.
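The block segmentation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Mora` record, its `has_voiced_vowel` flag, and the function name `segment_blocks` are hypothetical and do not appear in the specification, and deciding whether a mora "clearly" contains a voiced vowel is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mora:
    label: str              # hypothetical label, e.g. "ba", "N" (syllabic nasal)
    has_voiced_vowel: bool  # False for mora phonemes and devoiced-vowel moras


def segment_blocks(moras: List[Mora]) -> List[Tuple[int, int]]:
    """Return (start, end) mora indices of each phoneme block.

    A block runs from the vowel energy centroid of mora `start` to the
    centroid of the next mora that has a voiced vowel, so moras without
    a voiced vowel are absorbed into the block that spans them.
    """
    anchors = [i for i, m in enumerate(moras) if m.has_voiced_vowel]
    return list(zip(anchors, anchors[1:]))


# Nonsense word "kobanbamenkai": ko-ba-N-ba-me-N-ka-i
word = [Mora("ko", True), Mora("ba", True), Mora("N", False), Mora("ba", True),
        Mora("me", True), Mora("N", False), Mora("ka", True), Mora("i", True)]
print(segment_blocks(word))  # [(0, 1), (1, 3), (3, 4), (4, 6), (6, 7)]
```

The blocks (1, 3) and (4, 6) each span a syllabic nasal, which corresponds to the i-th to (i+2)-th case of FIG. 4, while the remaining blocks correspond to the adjacent-mora case of FIG. 3.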

【００２８】また区画区分部４０は、母音部の取り扱い
に関し、上述した「明らかに母音が存在するモーラに対
してそれぞれ母音部エネルギー重心点ＣＥＧＶを１つず
つ設定する。」という母音部エネルギー重心点ＣＥＧＶ
の存在場所についての制限とともに、「母音部エネルギ
ー重心点ＣＥＧＶは、母音と明らかに判定できる部分を
用いて算出する。」という母音部エネルギー重心点ＣＥ
ＧＶの算出に用いる母音部の扱いについての制限を課
す。この措置により、長音以外の母音の連続の場合に、
母音から次の母音へと遷移する区間が母音部エネルギー
重心点ＣＥＧＶの算出からはずされ、母音部エネルギー
重心点ＣＥＧＶがより安定して求められることになる。
その結果、母音部エネルギー重心点ＣＥＧＶの分析結果
を用いて導出される以下に説明する規則の誤差が減少す
る。以上、区画区分部４０について説明した。With regard to vowel parts, the block segmentation unit 40 imposes two restrictions. The first, mentioned above, concerns where centroids are placed: "one vowel-part energy centroid CEGV is set for each mora in which a vowel is clearly present." The second concerns the portion used in the calculation: "the vowel-part energy centroid CEGV is calculated using only the portion that can clearly be judged to be a vowel." As a result, when vowels other than a long vowel follow one another, the interval in which one vowel changes into the next is excluded from the calculation, and CEGV is obtained more stably. This in turn reduces the error of the rules described below, which are derived from the analysis of CEGV. This concludes the description of the block segmentation unit 40.

【００２９】次に、リズム規則（Ａ）４４と母音部エネ
ルギー重心点間時間長設定部４２について説明する。発
話速度ＳＲを一秒間に発声するモーラ数として定義す
る。ＶＣＶ形音韻系列の場合、母音部エネルギー重心点
間時間長ＤG（秒）は、発話速度ＳＲ（モーラ数／秒）
の逆数の一次関数である以下の近似式で表すことができ
る（特開平６−２６６３９１号公報）。ここで、係数Ａ
cons及び定数項Ｂconsは、当該母音部間に存在する子音
部の種類ごとに定められる。Next, rhythm rule (A) 44 and the vowel-part energy inter-centroid time length setting unit 42 are described. The speech rate SR is defined as the number of moras uttered per second. For a VCV-type phoneme sequence, the inter-centroid time length DG (seconds) can be expressed by the following approximation, which is a linear function of the reciprocal of the speech rate SR (moras per second) (Japanese Patent Laid-Open No. 6-266391). The coefficient Acons and the constant term Bcons are determined for each type of consonant part lying between the vowel parts.
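As a minimal sketch of this relationship, the function below evaluates DG as a linear function of 1/SR with one coefficient and one constant term. The function name and the numeric values in the usage lines are hypothetical placeholders chosen for easy arithmetic; they are not measured coefficients from the specification.

```python
def inter_centroid_length(sr: float, a_cons: float, b_cons: float) -> float:
    """Return DG in seconds as a linear function of the reciprocal of SR.

    sr      -- speech rate in moras per second (must be positive)
    a_cons  -- coefficient for the phoneme sequence between the vowel parts
    b_cons  -- constant term (seconds) for the same sequence
    """
    if sr <= 0:
        raise ValueError("speech rate must be positive")
    return a_cons / sr + b_cons


# Placeholder coefficients: with A = 1 and B = 0, DG equals one mora period.
print(inter_centroid_length(8.0, 1.0, 0.0))  # 0.125
```

A larger constant term lengthens DG by the same amount at every speech rate, whereas a larger coefficient lengthens it more at slow rates than at fast ones.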

【００３０】[0030]

【数１】実音声の分析結果を用いて、リズム規則（Ａ）を順次説
明する。まず、一般のＶＣＶ形音韻系列のＳＲとＤGと
の関係の分析には、７モーラの無意味単語を含む「それ
は、“こ○○めんかい”です」という一文を用いた。無
意味単語部中の○部分には、それぞれｂＸ₁とＣＸ₂が入
る。ここでＸ₁、Ｘ₂は任意の母音、Ｃは任意の子音を示
す。このようにして、無意味単語中の第２と第３モーラ
を様々な音韻系列に変化させ、各々の場合における母音
部エネルギー重心点間時間長ＤGを分析した。なお、発
話速度ＳＲの違いについては、おおよその速度で高速／
普通の速度／低速の３段階に分け、各１０回ずつ発声し
た。発声者は、女性話者一名である。図５は、ＶＣＶ形
音韻系列のＳＲとＤGとの関係についての分析結果の一
例を示すグラフである。このグラフには、子音Ｃを“ｔ
ｙ”とし、ｂＸ₁、ＣＸ₂の各母音Ｘ₁、Ｘ₂について３通
りの組み合わせが示され、各組み合わせについての測定
点と、各組み合わせについての関係式（１）による近似
曲線とが同図中に示されている。この図から、ＳＲとＤ
Gとの関係は、ＶＣＶ形音韻系列の両端の母音の組み合
わせにはほとんど依存せず、上で述べた子音部の種類ご
とに定められる係数Ａcons及び定数項Ｂconsが子音部を
挟む母音部の種類には依存しないことが分かる。つま
り、子音部の種類ごとに定められる係数Ａcons及び定数
項Ｂconsを用いた関数式（１）で、ＳＲとＤGの関係が
近似できることが分かる。(Equation 1)

$$D_G = \frac{A_{cons}}{SR} + B_{cons} \qquad (1)$$

Rhythm rule (A) is now explained step by step using analysis results from real speech. To analyze the relationship between SR and DG for general VCV-type phoneme sequences, the carrier sentence "Sore wa 'ko○○menkai' desu" ("It is 'ko○○menkai'"), which contains a seven-mora nonsense word, was used. The two ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The second and third moras of the nonsense word were thus varied over a range of phoneme sequences, and the inter-centroid time length DG was analyzed in each case. To vary the speech rate SR, the sentence was uttered ten times at each of three approximate rates: fast, normal, and slow. The speaker was one female speaker. FIG. 5 is a graph showing one example of the analysis results for VCV-type phoneme sequences. It covers the consonant C = "ty" with three combinations of the vowels X₁ and X₂ of bX₁ and CX₂, and shows the measured points for each combination together with the approximating curve given by relation (1). The figure shows that the relationship between SR and DG hardly depends on the combination of vowels at the two ends of the VCV-type sequence; that is, the coefficient Acons and constant term Bcons defined per consonant type do not depend on the vowels on either side of the consonant part. The relationship between SR and DG can therefore be approximated by relation (1) using a coefficient Acons and a constant term Bcons determined for each type of consonant part.

【００３１】以上の枠組みを、区画区分部４０から出力
される図４に示した新しい各区画にも拡張する。すなわ
ち、撥音や促音、長音、母音が無声化するような音韻系
列が母音部間に挟まれる場合であり、係数Ａcons、定数
項Ｂconsは当該母音部間に存在する母音部以外の音韻系
列の種類により決まる係数とする。The same framework is extended to the new phoneme blocks shown in FIG. 4 that are output by the block segmentation unit 40, namely those in which a phoneme sequence containing a syllabic nasal, geminate consonant, long vowel, or devoiced vowel lies between vowel parts. In these cases the coefficient Acons and the constant term Bcons are determined by the type of the non-vowel phoneme sequence lying between the vowel parts.

【００３２】撥音を含む音韻系列のＳＲとＤGの関係に
は、８モーラの無意味単語を含む「それは、“こ○ん○
めんかい”です」という一文を用いた。無意味単語部の
○部分には、それぞれｂＸ₁とＣＸ₂が入る。ここで
Ｘ₁、Ｘ₂は任意の母音、Ｃは任意の子音を示す。このよ
うにして、無意味単語中の第２〜４モーラ間を様々な音
韻系列に変化させ、各々の場合における母音部エネルギ
ー重心点間時間長ＤGを分析した。なお、発声者、速
度、回数に関する条件については、上記ＶＣＶ形音韻系
列の場合と同じである。図６は、撥音を含む音韻系列の
ＳＲとＤGとの関係についての分析結果の一例を示すグ
ラフである。このグラフには、第２モーラ“ｂＸ₁”の
母音Ｘ₁を“ａ”とし、第３モーラ“ＣＸ₂”を“ｂａ”
とした無意味単語部“こばんばめんかい”についての測
定点と、これに対する関係式（１）による近似曲線とが
同図中に示されている。この近似曲線を特徴づける
（１）式の係数Ａcons、定数項Ｂconsは、ＤGを定義す
る両端の母音部の種類にはあまり依存しないという測定
結果が得られた。一方、撥音を示す音韻“Ｎ”に続く子
音Ｃを変えると、係数Ａcons、定数項Ｂconsに有為な違
いが現れるという測定結果が得られた。この音韻“Ｎ
Ｃ”のＣの種類によるＤGの変化は、ＶＣＶ形音韻系列
について得られるＣの種類による変化とは別個のもので
ある。つまり撥音を含む音韻系列についての今回の測定
結果であるＤGは、単純にＶＣＶ形音韻系列の各Ｃにつ
いてのＤGにそれぞれ一定時間加えることによっては近
似できない。To analyze the relationship between SR and DG for phoneme sequences containing a syllabic nasal, the carrier sentence "Sore wa 'ko○N○menkai' desu" ("It is 'ko○N○menkai'"; N stands for the syllabic nasal ん), which contains an eight-mora nonsense word, was used. The ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The span from the second to the fourth mora of the nonsense word was varied over a range of phoneme sequences, and the inter-centroid time length DG was analyzed in each case. The speaker, speech rates, and number of repetitions were the same as for the VCV-type sequences. FIG. 6 is a graph showing one example of the analysis results. It shows the measured points for the nonsense word "kobanbamenkai," in which the vowel X₁ of the second mora "bX₁" is "a" and the third mora "CX₂" is "ba," together with the approximating curve given by relation (1). The coefficient Acons and constant term Bcons of relation (1) that characterize this curve were found to depend little on the types of the vowel parts at the two ends that define DG. By contrast, changing the consonant C that follows the syllabic nasal "N" produced significant differences in Acons and Bcons. This variation of DG with the type of C in "NC" is distinct from the variation with C obtained for VCV-type sequences. That is, the DG measured here for sequences containing a syllabic nasal cannot be approximated simply by adding a fixed time to the DG of the VCV-type sequence for each C.

【００３３】促音を含む音韻系列のＳＲとＤGの関係に
は、８モーラの無意味単語を含む「それは、“こ○っ○
めんかい”です」という一文を用いた。無意味単語部の
○部分には、それぞれｂＸ₁とＣＸ₂が入る。ここで
Ｘ₁、Ｘ₂は任意の母音、Ｃは任意の子音を示す。このよ
うにして、無意味単語中の第２〜４モーラ間を様々な音
韻系列に変化させ、各々の場合における母音部エネルギ
ー重心点間時間長ＤGを分析した。なお、発声者、速
度、回数に関する条件については、上記ＶＣＶ形音韻系
列の場合と同じである。図７は、促音を含む音韻系列の
ＳＲとＤGとの関係についての分析結果の一例を示すグ
ラフである。このグラフには、第２モーラ“ｂＸ₁”の
母音Ｘ₁を“ａ”とし、第３モーラ“ＣＸ₂”を“ｋａ”
とした無意味単語部“こばっかめんかい”についての測
定点と、これに対する関係式（１）による近似曲線とが
同図中に示されている。この近似曲線を特徴づける
（１）式の係数Ａcons、定数項Ｂconsは、ＤGを定義す
る両端の母音部の種類にはあまり依存しないという測定
結果が得られた。一方、促音を示す音韻“ＣＣ”（グラ
フに示す例では“ｋｋ”）の子音Ｃの種類を変えると、
係数Ａcons、定数項Ｂconsに有為な違いが現れるという
測定結果が得られた。この音韻“ＣＣ”のＣの種類によ
るＤGの変化は、ＶＣＶ形音韻系列について得られるＣ
の種類による変化とは別個のものである。つまり促音を
含む音韻系列についての今回の測定結果であるＤGは、
ＶＣＶ形音韻系列の各ＣについてのＤGにそれぞれ一定
時間加えるという従来の方法では近似できない。To analyze the relationship between SR and DG for phoneme sequences containing a geminate consonant, the carrier sentence "Sore wa 'ko○Q○menkai' desu" ("It is 'ko○Q○menkai'"; Q stands for the geminate mora っ), which contains an eight-mora nonsense word, was used. The ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The span from the second to the fourth mora of the nonsense word was varied over a range of phoneme sequences, and the inter-centroid time length DG was analyzed in each case. The speaker, speech rates, and number of repetitions were the same as for the VCV-type sequences. FIG. 7 is a graph showing one example of the analysis results. It shows the measured points for the nonsense word "kobakkamenkai," in which the vowel X₁ of the second mora "bX₁" is "a" and the third mora "CX₂" is "ka," together with the approximating curve given by relation (1). The coefficient Acons and constant term Bcons of relation (1) that characterize this curve were found to depend little on the types of the vowel parts at the two ends that define DG. By contrast, changing the type of the consonant C in the geminate "CC" ("kk" in the example in the graph) produced significant differences in Acons and Bcons. This variation of DG with the type of C in "CC" is distinct from the variation with C obtained for VCV-type sequences. That is, the DG measured here for sequences containing a geminate consonant cannot be approximated by the conventional method of adding a fixed time to the DG of the VCV-type sequence for each C.

【００３４】母音無声化音韻系列のＳＲとＤGの関係に
は、８モーラの無意味単語を含む「それは、“こ○○○
めんかい”です」という一文を用いた。無意味単語部の
○部分には、それぞれｂＸ₁とＣ₁Ｘ₂とＣ₂Ｘ₃が入る。
ここでＸ₁、Ｘ₃は任意の母音、Ｘ₂は“／ｉ／”又は
“／ｕ／”、そしてＣ₁、Ｃ₂は、間に挟まれた母音Ｘ₂
が無声化を起こす任意の子音の組を示す。このようにし
て、無意味単語中の第３モーラの母音Ｘ₂に無声化を生
じるような様々な音韻系列を作成し、第２〜４モーラ間
の母音部エネルギー重心点間時間長ＤGを分析した。な
お、発声者、速度、回数に関する条件については、上記
ＶＣＶ形音韻系列の場合と同じである。図８は、母音無
声化音韻系列のＳＲとＤGの関係についての分析結果の
一例を示すグラフである。このグラフには、第２モーラ
“ｂＸ₁”の母音Ｘ₁を“ａ”とし、第３モーラ“Ｃ
₁Ｘ₂”を“ｋｉ”、第４モーラ“Ｃ₂Ｘ₃”を“ｃｈａ”
とした無意味単語部“こばきちゃめんかい”についての
測定点と、これに対する関係式（１）による近似曲線と
が同図中に示されている。この近似曲線を特徴づける
（１）式の係数Ａcons、定数項Ｂconsは、ＤGを定義す
る両端の母音部の種類にはあまり依存しないという測定
結果が得られた。一方、母音無声化音韻系列（グラフに
示す例では“ｋｉｃｈ”）の種類を変えると、係数Ａco
ns、定数項Ｂconsに有為な違いが現れるという測定結果
が得られた。To analyze the relationship between SR and DG for devoiced-vowel phoneme sequences, the carrier sentence "Sore wa 'ko○○○menkai' desu" ("It is 'ko○○○menkai'"), which contains an eight-mora nonsense word, was used. The three ○ slots are filled with bX₁, C₁X₂, and C₂X₃. Here X₁ and X₃ are arbitrary vowels, X₂ is /i/ or /u/, and C₁ and C₂ are any pair of consonants that causes the vowel X₂ between them to be devoiced. In this way various phoneme sequences were created in which the vowel X₂ of the third mora is devoiced, and the inter-centroid time length DG between the second and fourth moras was analyzed. The speaker, speech rates, and number of repetitions were the same as for the VCV-type sequences. FIG. 8 is a graph showing one example of the analysis results. It shows the measured points for the nonsense word "kobakichamenkai," in which the vowel X₁ of the second mora "bX₁" is "a," the third mora "C₁X₂" is "ki," and the fourth mora "C₂X₃" is "cha," together with the approximating curve given by relation (1). The coefficient Acons and constant term Bcons of relation (1) that characterize this curve were found to depend little on the types of the vowel parts at the two ends that define DG. By contrast, changing the type of the devoiced-vowel phoneme sequence ("kich" in the example in the graph) produced significant differences in Acons and Bcons.

【００３５】その他の音韻系列についての図示は省略す
るが、本リズム規則（Ａ）では、上記音韻系列で述べた
ように、母音間に存在する母音部以外の音韻系列の種類
の数だけ係数Ａcons、定数項Ｂconsの組が用意される。
なお、本リズム規則（Ａ）では、関係式（１）を近似式
として用いているが、特にこの関係式に限らず、特性が
示せる近似式であれば、同様に用いることができる。ま
た、このリズム規則（Ａ）では、合成音声を作成すると
きの目的に応じて母音間に存在する母音部以外の音韻系
列の種類をグルーピングすることにより、用いる近似式
を簡略化できる。以上、リズム規則（Ａ）４４と母音部
エネルギー重心点間時間長設定部４２について説明し
た。Illustrations for other phoneme sequences are omitted, but in rhythm rule (A), as described for the sequences above, one pair of coefficient Acons and constant term Bcons is prepared for each type of non-vowel phoneme sequence that can lie between vowels. Rhythm rule (A) uses relation (1) as the approximation, but it is not limited to this relation; any approximation that captures the characteristic can be used in the same way. The approximation can also be simplified by grouping the types of non-vowel phoneme sequences according to the purpose for which the synthesized speech is created. This concludes the description of rhythm rule (A) 44 and the vowel-part energy inter-centroid time length setting unit 42.
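One way to picture rhythm rule (A) is as a lookup table keyed by the type of non-vowel phoneme sequence, with an optional grouping step that maps several types onto one shared entry. The sketch below assumes exactly that layout. The table contents, the key strings, the grouping of "Nd" and "Ng" with "Nb", and the function names are all hypothetical; the numbers are placeholders, not values from the specification.

```python
from typing import Dict, Tuple

# Hypothetical (A_cons, B_cons) pairs, one per type of non-vowel sequence.
RULE_A: Dict[str, Tuple[float, float]] = {
    "k": (0.90, 0.010),     # plain consonant in a VCV sequence
    "Nb": (1.80, 0.020),    # syllabic nasal followed by b
    "kk": (1.85, 0.015),    # geminate k
    "kich": (1.75, 0.030),  # devoiced-vowel sequence
}

# Hypothetical grouping: sequence types that share a representative entry.
GROUPS: Dict[str, str] = {"Nd": "Nb", "Ng": "Nb"}


def lookup_rule_a(seq_type: str) -> Tuple[float, float]:
    """Return (A_cons, B_cons) for a sequence type, applying the grouping."""
    return RULE_A[GROUPS.get(seq_type, seq_type)]


def dg_for(seq_type: str, sr: float) -> float:
    """Return DG in seconds for a sequence type at speech rate sr (moras/s)."""
    a_cons, b_cons = lookup_rule_a(seq_type)
    return a_cons / sr + b_cons
```

Grouping trades accuracy for a smaller table, so how coarse the groups are would depend on the purpose of the synthesized speech.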

【００３６】次に、リズム規則（Ｂ）４８と各音韻継続
時間長設定部４６について説明する。ＶＣＶ形音韻系列
の場合、図９に示す如く、当該母音部エネルギー重心点
ＣＥＧＶ間は先行母音部後半部、子音部、後続母音部前
半部に３分割され、これら各部分の継続時間長Ｄvp、Ｄ
c、Ｄvsは以下のＤGの一次式で近似される（特開平６−
２６６３９１号公報、特開平６−２７４１９５号公
報）。Next, rhythm rule (B) 48 and the phoneme duration setting unit 46 are described. For a VCV-type phoneme sequence, as shown in FIG. 9, the interval between the vowel-part energy centroids CEGV is divided into three parts: the latter half of the preceding vowel part, the consonant part, and the first half of the succeeding vowel part. The durations Dvp, Dc, and Dvs of these parts are approximated by the following linear expressions in DG (Japanese Patent Laid-Open Nos. 6-266391 and 6-274195).

【００３７】[0037]

【数２】ここで、Ａvp、Ｂvpは先行母音に関するＤGの係数及び
定数項、Ａc、Ｂcは子音に関するＤGの係数及び定数
項、Ａvs、Ｂvsは後続母音に関するＤGの係数及び定数
項である。これら係数Ａvp、Ａc、Ａvs、及び定数項Ｂv
p、Ｂc、Ｂvsは母音部間に存在する子音部の種類ごとに
定められる。(Equation 2)

$$D_{vp} = A_{vp} D_G + B_{vp}, \qquad D_c = A_c D_G + B_c, \qquad D_{vs} = A_{vs} D_G + B_{vs} \qquad (2)$$

Here, Avp and Bvp are the coefficient of DG and the constant term for the preceding vowel, Ac and Bc are those for the consonant, and Avs and Bvs are those for the succeeding vowel. The coefficients Avp, Ac, and Avs and the constant terms Bvp, Bc, and Bvs are determined for each type of consonant part lying between the vowel parts.
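The three linear expressions for a VCV-type block can be evaluated as in the following sketch. The `Linear` tuple and the function name are hypothetical helpers introduced for illustration, and the coefficient values in the usage lines are placeholders rather than measured data.

```python
from typing import NamedTuple, Tuple


class Linear(NamedTuple):
    a: float  # coefficient of DG
    b: float  # constant term in seconds


def vcv_durations(dg: float, vp: Linear, c: Linear, vs: Linear) -> Tuple[float, float, float]:
    """Return (Dvp, Dc, Dvs): each duration is a linear function of DG."""
    d_vp = vp.a * dg + vp.b  # latter half of the preceding vowel part
    d_c = c.a * dg + c.b     # consonant part
    d_vs = vs.a * dg + vs.b  # first half of the succeeding vowel part
    return d_vp, d_c, d_vs


# Placeholder coefficients for one consonant type, with DG = 0.2 s.
d_vp, d_c, d_vs = vcv_durations(0.2, Linear(0.3, 0.01), Linear(0.4, -0.01), Linear(0.3, 0.0))
```

In this placeholder set the coefficients sum to 1 and the constant terms sum to 0, so the three durations add up to DG; the text does not state that measured coefficients satisfy this.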

【００３８】実音声の分析結果を用いて、リズム規則
（Ｂ）を順次説明する。一般のＶＣＶ形音韻系列のＤv
p、Ｄc、ＤvsとＤGとの関係の分析には、７モーラの無
意味単語を含む「それは、“こ○○めんかい”です」と
いう一文を用いた。無意味単語部中の○部分には、それ
ぞれｂＸ₁とＣＸ₂が入る。ここでＸ₁、Ｘ₂は任意の母
音、Ｃは任意の子音を示す。このようにして、無意味単
語中の第２と第３モーラを様々な音韻系列に変化させ、
各々の場合におけるＤvp、Ｄc、Ｄvs、とＤGの関係を分
析した。発声者は、女性話者一名である。図１０は、Ｖ
ＣＶ形音韻系列のＤvp、Ｄc、ＤvsとＤGとの関係の分析
結果の一例を示すグラフである。この図では、横軸がＤ
G値、縦軸がＤvp、Ｄc、Ｄvsの値を示し、発話速度を変
化させて得られた各ＤG値についてＤvp、Ｄc、Ｄvsの測
定値がプロットされている。Ｄvp、Ｄc、Ｄvsそれぞれ
の測定値に対して最小自乗法により求めた（２）式の直
線も同図中に示されている。かくして、Ｄvp、Ｄc、Ｄv
sとＤGとの関係は、一次関数である上記（２）式で良く
近似されることが分かる。Rhythm rule (B) is now explained step by step using analysis results from real speech. To analyze the relationship of Dvp, Dc, and Dvs to DG for general VCV-type phoneme sequences, the carrier sentence "Sore wa 'ko○○menkai' desu" ("It is 'ko○○menkai'"), which contains a seven-mora nonsense word, was used. The two ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The second and third moras of the nonsense word were varied over a range of phoneme sequences, and the relationship of Dvp, Dc, and Dvs to DG was analyzed in each case. The speaker was one female speaker. FIG. 10 is a graph showing one example of the analysis results for a VCV-type phoneme sequence. The horizontal axis is the DG value and the vertical axis is the value of Dvp, Dc, or Dvs; the measured Dvp, Dc, and Dvs are plotted for each DG value obtained by varying the speech rate. The straight lines of expression (2), obtained by the least-squares method from the measured values of Dvp, Dc, and Dvs, are also shown in the figure. The relationship of Dvp, Dc, and Dvs to DG is thus well approximated by the linear expression (2).
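The least-squares fit mentioned above can be sketched with the closed-form solution for a straight line. The function name `fit_line` and the sample points are hypothetical; the points were made up to lie exactly on a line and are not measurements from FIG. 10.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up (DG, Dc) pairs in seconds that lie on Dc = 0.4 * DG + 0.01.
a_c, b_c = fit_line([0.1, 0.2, 0.3], [0.05, 0.09, 0.13])
```

Running the same fit separately on the measured Dvp, Dc, and Dvs values for one consonant type would yield the three coefficient and constant-term pairs of expression (2).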

【００３９】以上の枠組みを、区画区分部４０から出力
される音韻区画のうち、撥音や促音、長音、母音無声化
音韻系列を含んだ各音韻区画へと拡張する。撥音や促
音、長音、母音が無声化するような音韻系列の場合、次
に有声の母音部と判定されるモーラまでを１つの音韻区
画とし、各音韻区画に含まれる各音韻に対応する各部分
の音韻継続時間長を（２）式と同様の式で近似する。ま
た、異なる母音が連続する場合には、母音部と遷移部と
を明確に区分して（２）式で近似する。The same framework is extended to those phoneme blocks output by the block segmentation unit 40 that contain a syllabic nasal, geminate consonant, long vowel, or devoiced-vowel phoneme sequence. For such sequences, the span up to the next mora judged to have a voiced vowel part is taken as one phoneme block, and the duration of each part corresponding to each phoneme in the block is approximated by an expression of the same form as expression (2). Where different vowels follow one another, the vowel parts and the transition part are clearly separated and approximated by expression (2).

【００４０】撥音や促音、長音、母音無声化音韻系列の
場合の、各音韻部分の区分の例について図１１〜図１４
に示す。なお、各音韻部分の区分は、例に示した区分の
方法に限らず、その他の別の規準を用いて区分しても構
わない。FIGS. 11 to 14 show examples of how the phoneme parts are divided for syllabic nasal, geminate consonant, long vowel, and devoiced-vowel phoneme sequences. The division is not limited to the method shown in these examples; other criteria may be used instead.

【００４１】図１１は、撥音を含んだ音韻系列の音韻区
画内における音韻部分の区分の例を示す模式図である。
この音韻区画は、図１１に示す如く、母音部エネルギー
重心点ＣＥＧＶ間を順に、先行母音部後半部、撥音部、
後続子音部、後続母音部前半部に４分割され、これら各
音韻部分の継続時間長をそれぞれＤvp、ＤN、Ｄc、Ｄvs
とする。Ｄvp、ＤN、Ｄc、ＤvsとＤGとの関係の分析に
は、８モーラの無意味単語を含む「それは、“こ○ん○
めんかい”です」という一文を用いた。無意味単語部中
の○部分には、それぞれｂＸ₁とＣＸ₂が入る。ここでＸ
₁、Ｘ₂は任意の母音、Ｃは任意の子音を示す。このよう
にして、無意味単語中の第２〜４モーラを様々な音韻系
列に変化させ、各々の場合におけるＤvp、ＤN、Ｄc、Ｄ
vsとＤGの関係を分析した。発声者は、女性話者一名で
ある。図１５は、撥音を含む音韻系列の各音韻部分の継
続時間長Ｄvp、ＤN、Ｄc、ＤvsとＤGとの関係の分析結
果の一例を示すグラフである。この例では無意味単語は
“こばんばめんかい”である。この図では、横軸がＤG
値、縦軸がＤvp、ＤN、Ｄc、Ｄvsの値を示し、発話速度
を変化させて得られた各ＤG値についてＤvp、ＤN、Ｄ
c、Ｄvsの測定値がプロットされている。Ｄvp、ＤN、Ｄ
c、Ｄvsそれぞれの測定値に対して最小自乗法により求
めたＤGの一次式で表される直線も同図中に示されてい
る。かくして、Ｄvp、ＤN、Ｄc、ＤvsとＤGとの関係
は、上記（２）式同様の一次式で良く近似されることが
分かる。FIG. 11 is a schematic diagram showing an example of how the phoneme block of a sequence containing a syllabic nasal is divided into phoneme parts. As shown in FIG. 11, the interval between the vowel-part energy centroids CEGV is divided, in order, into four parts: the latter half of the preceding vowel part, the syllabic nasal part, the following consonant part, and the first half of the succeeding vowel part. The durations of these parts are denoted Dvp, DN, Dc, and Dvs, respectively. To analyze the relationship of Dvp, DN, Dc, and Dvs to DG, the carrier sentence "Sore wa 'ko○N○menkai' desu" ("It is 'ko○N○menkai'"; N stands for the syllabic nasal ん), which contains an eight-mora nonsense word, was used. The ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The second to fourth moras of the nonsense word were varied over a range of phoneme sequences, and the relationship of Dvp, DN, Dc, and Dvs to DG was analyzed in each case. The speaker was one female speaker. FIG. 15 is a graph showing one example of the analysis results for a phoneme sequence containing a syllabic nasal. In this example the nonsense word is "kobanbamenkai." The horizontal axis is the DG value and the vertical axis is the value of Dvp, DN, Dc, or Dvs; the measured values are plotted for each DG value obtained by varying the speech rate. The straight lines given by linear expressions in DG, obtained by the least-squares method from the measured values of Dvp, DN, Dc, and Dvs, are also shown in the figure. The relationship of Dvp, DN, Dc, and Dvs to DG is thus well approximated by linear expressions of the same form as expression (2).

【００４２】ここで、Ａvp、ＢvpをＤvpに関する一次式
におけるＤGの係数及び定数項、ＡN、ＢNをＤNに関する
一次式におけるＤGの係数及び定数項、Ａc、ＢcをＤcに
関する一次式におけるＤGの係数及び定数項、Ａvs、Ｂv
sをＤvsに関する一次式におけるＤGの係数及び定数項と
すると、これらＡvp、ＡN、Ａc、Ａvs、Ｂvp、ＢN、Ｂ
c、Ｂvsの組は母音部間に存在する撥音“Ｎ”に続く子
音“Ｃ”の種類ごとに定められる。Here, let Avp and Bvp be the coefficient of DG and the constant term in the linear expression for Dvp, AN and BN those in the expression for DN, Ac and Bc those in the expression for Dc, and Avs and Bvs those in the expression for Dvs. The set Avp, AN, Ac, Avs, Bvp, BN, Bc, Bvs is determined for each type of consonant "C" that follows the syllabic nasal "N" lying between the vowel parts.
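For the syllabic-nasal case, the four durations can be computed from a coefficient set selected by the consonant that follows the nasal. The sketch below assumes a dictionary layout for rhythm rule (B); the table name, its keys, and all numeric values are hypothetical placeholders, not coefficients from the specification.

```python
from typing import Dict, Tuple

# Hypothetical rule (B) entries for blocks containing a syllabic nasal.
# Outer key: consonant following "N". Inner key: phoneme part.
# Each value is (coefficient of DG, constant term in seconds).
NASAL_RULE_B: Dict[str, Dict[str, Tuple[float, float]]] = {
    "b": {"vp": (0.20, 0.00), "N": (0.35, 0.01), "c": (0.20, 0.00), "vs": (0.25, -0.01)},
    "k": {"vp": (0.20, 0.00), "N": (0.30, 0.01), "c": (0.25, 0.00), "vs": (0.25, -0.01)},
}


def nasal_part_durations(following_consonant: str, dg: float) -> Dict[str, float]:
    """Return Dvp, DN, Dc, Dvs (keys "vp", "N", "c", "vs") for one block."""
    coeffs = NASAL_RULE_B[following_consonant]  # KeyError if the type is not tabulated
    return {part: a * dg + b for part, (a, b) in coeffs.items()}


# Placeholder example: block "a-N-b-a" with DG = 0.3 s.
durations = nasal_part_durations("b", 0.3)
```

The geminate and devoiced-vowel cases would use the same pattern with their own part lists and with keys for the geminate consonant or the devoiced-vowel sequence type.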

【００４３】図１２は、促音を含んだ音韻系列の音韻区
画内における音韻部分の区分の例を示す模式図である。
この音韻区画は、図１２に示す如く、母音部エネルギー
重心点ＣＥＧＶ間を先行母音部後半部、促音部、後続母
音部前半部に３分割され、これら各音韻部分の継続時間
長をそれぞれＤvp、Ｄsok、Ｄvsとする。Ｄvp、Ｄsok、
ＤvsとＤGとの関係の分析には、８モーラの無意味単語
を含む「それは、“こ○っ○めんかい”です」という一
文を用いた。無意味単語部中の○部分には、それぞれｂ
Ｘ₁とＣＸ₂が入る。ここでＸ₁、Ｘ₂は任意の母音、Ｃは
任意の子音を示す。このようにして、無意味単語中の第
２〜４モーラを様々な音韻系列に変化させ、各々の場合
におけるＤvp、Ｄsok、ＤvsとＤGの関係を分析した。発
声者は、女性話者一名である。図１６は、促音を含む音
韻系列の各音韻部分の継続時間長Ｄvp、Ｄsok、Ｄvsと
ＤGとの関係の分析結果の一例を示すグラフである。こ
の例では無意味単語は“こばっかめんかい”である。こ
の図では、横軸がＤG値、縦軸がＤvp、Ｄsok、Ｄvsの値
を示し、発話速度を変化させて得られた各ＤG値につい
てＤvp、Ｄsok、Ｄvsの測定値がプロットされている。
Ｄvp、Ｄsok、Ｄvsそれぞれの測定値に対して最小自乗
法により求めたＤGの一次式で表される直線も同図中に
示されている。かくして、Ｄvp、Ｄsok、ＤvsとＤGとの
関係は、上記（２）式同様の一次式で良く近似されるこ
とが分かる。FIG. 12 is a schematic diagram showing an example of how the phoneme block of a sequence containing a geminate consonant is divided into phoneme parts. As shown in FIG. 12, the interval between the vowel-part energy centroids CEGV is divided into three parts: the latter half of the preceding vowel part, the geminate part, and the first half of the succeeding vowel part. The durations of these parts are denoted Dvp, Dsok, and Dvs, respectively. To analyze the relationship of Dvp, Dsok, and Dvs to DG, the carrier sentence "Sore wa 'ko○Q○menkai' desu" ("It is 'ko○Q○menkai'"; Q stands for the geminate mora っ), which contains an eight-mora nonsense word, was used. The ○ slots are filled with bX₁ and CX₂, where X₁ and X₂ are arbitrary vowels and C is an arbitrary consonant. The second to fourth moras of the nonsense word were varied over a range of phoneme sequences, and the relationship of Dvp, Dsok, and Dvs to DG was analyzed in each case. The speaker was one female speaker. FIG. 16 is a graph showing one example of the analysis results for a phoneme sequence containing a geminate consonant. In this example the nonsense word is "kobakkamenkai." The horizontal axis is the DG value and the vertical axis is the value of Dvp, Dsok, or Dvs; the measured values are plotted for each DG value obtained by varying the speech rate. The straight lines given by linear expressions in DG, obtained by the least-squares method from the measured values of Dvp, Dsok, and Dvs, are also shown in the figure. The relationship of Dvp, Dsok, and Dvs to DG is thus well approximated by linear expressions of the same form as expression (2).

【００４４】ここで、Ａvp、ＢvpをＤvpに関する一次式
におけるＤGの係数及び定数項、Ａsok、ＢsokをＤsokに
関する一次式におけるＤGの係数及び定数項、Ａvs、Ｂv
sをＤvsに関する一次式におけるＤGの係数及び定数項と
すると、これらＡvp、Ａsok、Ａvs、Ｂvp、Ｂsok、Ｂvs
の組は母音部間に存在する促音“ＣＣ”を構成する子音
Ｃの種類ごとに定められる。Here, let Avp and Bvp be the coefficient of DG and the constant term in the linear expression for Dvp, Asok and Bsok those in the expression for Dsok, and Avs and Bvs those in the expression for Dvs. The set Avp, Asok, Avs, Bvp, Bsok, Bvs is determined for each type of consonant C that forms the geminate "CC" lying between the vowel parts.

【００４５】図１３は、母音無声化音韻系列の音韻区画
内における音韻部分の区分の例を示す模式図である。こ
の音韻区画は、図１３に示す如く、母音部エネルギー重
心点ＣＥＧＶ間を先行母音部後半部、母音の無声化を生
じたモーラ部、後続子音部、後続母音部前半部に４分割
され、これら各音韻部分の継続時間長をそれぞれＤvp、
Ｄmusei、Ｄc、Ｄvsとする。Ｄvp、Ｄmusei、Ｄc、Ｄvs
とＤGとの関係の分析には、８モーラの無意味単語を含
む「それは、“こ○○○めんかい”です」という一文を
用いた。無意味単語部中の○部分には、それぞれｂＸ₁
とＣ₁Ｘ₂とＣ₂Ｘ₃が入る。ここでＸ₁、Ｘ₃は任意の母
音、Ｘ₂は“／ｉ／”又は“／ｕ／”、そしてＣ₁、Ｃ₂
は無声化を起こす任意の子音の組を示す。このようにし
て、無意味単語中の第２〜４モーラを様々な音韻系列に
変化させ、各々の場合におけるＤvp、Ｄmusei、Ｄc、Ｄ
vsとＤGの関係を分析した。発声者は、女性話者一名で
ある。図１７は、母音無声化音韻系列の各音韻部分の継
続時間長Ｄvp、Ｄmusei、Ｄc、ＤvsとＤGとの関係の分
析結果の一例を示すグラフである。この例では無意味単
語は“こばきちゃめんかい”である。この図では、横軸
がＤG値、縦軸がＤvp、Ｄmusei、Ｄc、Ｄvsの値を示
FIG. 13 is a schematic diagram showing an example of how a phoneme section of a devoiced-vowel phoneme sequence is divided into phoneme parts. As shown in FIG. 13, the interval between the vowel-part energy centroids CEGV in this phoneme section is divided into four parts: the latter half of the preceding vowel part, the mora with the devoiced vowel, the following consonant part, and the first half of the following vowel part. The durations of these phoneme parts are denoted Dvp, Dmusei, Dc, and Dvs, respectively. To analyze the relationship between Dvp, Dmusei, Dc, Dvs and DG, a carrier sentence of the form "Sore wa 'ko○○○menkai' desu" ("That is 'ko○○○menkai'"), which contains an 8-mora nonsense word, was used. The ○ positions in the nonsense word are filled with bX₁, C₁X₂, and C₂X₃, respectively. Here X₁ and X₃ are arbitrary vowels, X₂ is /i/ or /u/, and C₁ and C₂ are any pair of consonants that causes devoicing. In this way, the second to fourth morae of the nonsense word were varied over different phoneme sequences, and the relationship between Dvp, Dmusei, Dc, Dvs and DG was analyzed in each case. The utterances were produced by one female speaker. FIG. 17 is a graph showing an example of the analysis results for the relationship between the durations Dvp, Dmusei, Dc, Dvs of the phoneme parts of a devoiced-vowel phoneme sequence and DG. In this example the nonsense word is "kobakichamenkai" (こばきちゃめんかい). In the figure, the horizontal axis shows DG and the vertical axis shows Dvp, Dmusei, Dc, and Dvs; the measured values of Dvp, Dmusei, Dc, and Dvs are plotted for each DG value obtained by varying the speech rate. The figure also shows, for each of Dvp, Dmusei, Dc, and Dvs, the straight line given by a linear expression in DG fitted to the measured values by the least-squares method. Thus, the relationships between Dvp, Dmusei, Dc, Dvs and DG are well approximated by linear expressions of the same form as expression (2) above.
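The least-squares line fit mentioned above can be sketched as follows. This is only an illustration: the function name `fit_line` and the sample values are assumptions introduced here, and the data are synthetic rather than measurements from the patent.

```python
from statistics import mean


def fit_line(dg_values, durations):
    """Ordinary least-squares fit of duration = a * DG + b.

    Returns (a, b): the coefficient of DG and the constant term.
    """
    if len(dg_values) != len(durations) or len(dg_values) < 2:
        raise ValueError("need at least two paired measurements")
    dg_mean = mean(dg_values)
    d_mean = mean(durations)
    # Sum of squared deviations of DG and the DG/duration cross term.
    sxx = sum((x - dg_mean) ** 2 for x in dg_values)
    if sxx == 0:
        raise ValueError("DG values must not all be identical")
    sxy = sum((x - dg_mean) * (y - d_mean)
              for x, y in zip(dg_values, durations))
    a = sxy / sxx
    b = d_mean - a * dg_mean
    return a, b


# Synthetic, exactly linear data in milliseconds (hypothetical values):
# DG measured at four speech rates, with Dc generated as 0.25 * DG + 10.
dg_ms = [200.0, 250.0, 300.0, 350.0]
dc_ms = [0.25 * x + 10.0 for x in dg_ms]
a_c, b_c = fit_line(dg_ms, dc_ms)
```

In practice one such fit would be run per phoneme part (Dvp, Dmusei, Dc, Dvs) and per phoneme-sequence type, using the measured values plotted in FIG. 17.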

[0046] Here, let Avp and Bvp be the coefficient of DG and the constant term in the linear expression for Dvp, Amusei and Bmusei the coefficient of DG and the constant term in the linear expression for Dmusei, Ac and Bc the coefficient of DG and the constant term in the linear expression for Dc, and Avs and Bvs the coefficient of DG and the constant term in the linear expression for Dvs. The set of Avp, Amusei, Ac, Avs, Bvp, Bmusei, Bc, and Bvs is then determined for each type of devoiced-vowel phoneme sequence present between the vowel parts.
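Written out, the four linear expressions for a devoiced-vowel phoneme sequence would take the following form. This assumes that expression (2), which is not reproduced in this passage, has the same shape of a coefficient times DG plus a constant term.

```latex
\begin{align*}
D_{vp}    &= A_{vp}\,D_G + B_{vp} \\
D_{musei} &= A_{musei}\,D_G + B_{musei} \\
D_{c}     &= A_{c}\,D_G + B_{c} \\
D_{vs}    &= A_{vs}\,D_G + B_{vs}
\end{align*}
```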

[0047] FIG. 14 is a schematic diagram showing an example of how a phoneme section of a phoneme sequence containing a succession of different vowels is divided into phoneme parts. As shown in FIG. 14, the interval between the vowel-part energy centroids CEGV in this phoneme section is divided into three parts: the latter half of the preceding vowel part, the transition part, and the first half of the following vowel part. The durations of these phoneme parts are denoted Dvp, DseNi, and Dvs, respectively. To analyze the relationship between Dvp, DseNi, Dvs and DG, the carrier sentence "Sore wa 'ko○○menkai' desu" (「それは、“こ○○めんかい”です」, "That is 'ko○○menkai'"), which contains a 7-mora nonsense word, was used. The ○ positions in the nonsense word are filled with bX₁ and X₂, respectively, where X₁ and X₂ are arbitrary vowels that differ from each other. In this way, the second and third morae of the nonsense word were varied over different phoneme sequences, and the relationship between Dvp, DseNi, Dvs and DG was analyzed in each case. The utterances were produced by one female speaker. FIG. 18 is a graph showing an example of the analysis results for the relationship between the durations Dvp, DseNi, Dvs of the phoneme parts of a phoneme sequence containing a succession of different vowels and DG. In this example the nonsense word is "kobaimenkai" (こばいめんかい). In the figure, the horizontal axis shows DG and the vertical axis shows Dvp, DseNi, and Dvs; the measured values of Dvp, DseNi, and Dvs are plotted for each DG value obtained by varying the speech rate. The figure also shows, for each of Dvp, DseNi, and Dvs, the straight line given by a linear expression in DG fitted to the measured values by the least-squares method. Thus, the relationships between Dvp, DseNi, Dvs and DG are well approximated by linear expressions of the same form as expression (2) above.

[0048] Here, let Avp and Bvp be the coefficient of DG and the constant term in the linear expression for Dvp, AseNi and BseNi the coefficient of DG and the constant term in the linear expression for DseNi, and Avs and Bvs the coefficient of DG and the constant term in the linear expression for Dvs. The set of Avp, AseNi, Avs, Bvp, BseNi, and Bvs is then determined for each pair of different vowels.
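A per-vowel-pair rule table of this kind could be organized as in the sketch below. The names `Line`, `VOWEL_PAIR_RULES`, and `part_durations`, the keying by (preceding vowel, following vowel), and all numeric coefficients are assumptions made for illustration; the patent does not give these values in this passage.

```python
from typing import Dict, NamedTuple, Tuple


class Line(NamedTuple):
    a: float  # coefficient of DG
    b: float  # constant term, in milliseconds


# Hypothetical coefficients, NOT taken from the patent. One entry per
# ordered pair (preceding vowel, following vowel).
VOWEL_PAIR_RULES: Dict[Tuple[str, str], Dict[str, Line]] = {
    ("a", "i"): {
        "Dvp": Line(0.30, 5.0),
        "DseNi": Line(0.35, -2.0),
        "Dvs": Line(0.35, -3.0),
    },
}


def part_durations(prev_vowel: str, next_vowel: str, dg_ms: float) -> Dict[str, float]:
    """Evaluate each part's linear expression in DG for one vowel pair."""
    if prev_vowel == next_vowel:
        raise ValueError("this rule set covers successions of different vowels only")
    rules = VOWEL_PAIR_RULES[(prev_vowel, next_vowel)]
    return {name: line.a * dg_ms + line.b for name, line in rules.items()}


durations = part_durations("a", "i", 200.0)
```

The example coefficients were chosen so that the three part durations add up to DG, which matches the description of DG as the sum of the part durations.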

[0049] Illustration of the other phoneme sequences is omitted. In rhythm rule (B), as described above for the individual phoneme sequences, as many sets of coefficients and constant terms of linear expressions corresponding to expression (2) are prepared as there are types of non-vowel phoneme sequences that occur between vowels. Although rhythm rule (B) uses a linear expression such as relation (2) as the approximation, it is not limited to this relation; any approximation that captures the observed characteristics can be used in the same way. In rhythm rule (B), the approximations can also be simplified by grouping the types of phoneme sequences according to the purpose for which the synthesized speech is created. Furthermore, because DG is the sum of the durations of the parts of each phoneme sequence, the number of DG coefficients and the number of constant terms that must be prepared can each be reduced to (the number of phoneme parts in the phoneme section − 1).
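The reduction to one fewer stored coefficient and constant term can be seen from a short derivation. It is sketched here under the assumption that the linear approximations hold for every DG and that the n part durations sum exactly to DG; the index i and the count n are notation introduced for this illustration.

```latex
\begin{align*}
D_i &= A_i D_G + B_i, \qquad i = 1, \dots, n, \\
\sum_{i=1}^{n} D_i = D_G
  &\;\Longrightarrow\; \Bigl(\sum_{i=1}^{n} A_i\Bigr) D_G + \sum_{i=1}^{n} B_i = D_G
  \quad \text{for all } D_G, \\
  &\;\Longrightarrow\; \sum_{i=1}^{n} A_i = 1, \qquad \sum_{i=1}^{n} B_i = 0, \\
A_n &= 1 - \sum_{i=1}^{n-1} A_i, \qquad B_n = -\sum_{i=1}^{n-1} B_i .
\end{align*}
```

Under these assumptions the last part's coefficient and constant term follow from the other n − 1, so only n − 1 of each need to be stored.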

[0050] Thus, in order to synthesize Japanese speech that matches natural speech more closely, the present invention increases the degrees of freedom of the model, identifies new rules within that larger space, and realizes them as an apparatus. Depending on the purpose of the speech synthesizer, the precise rules described above may be simplified as needed when implemented.

[0051]

[Effects of the Invention]

The speech synthesizer of the present invention is based on the principle that "in the transition from a vowel containing a vowel-part energy centroid CEGV to the next vowel, the ease or difficulty of uttering the non-vowel phoneme sequence that must be realized determines the time length between the vowel-part energy centroids." In addition, for the location of the vowel-part energy centroids CEGV that define the phoneme sections, it uses the condition that "one vowel-part energy centroid is set for each mora in which an uttered vowel exists." This yields rules that accurately determine the phoneme durations of mora phonemes such as the syllabic nasal and the geminate consonant, and of phoneme sequences such as devoiced-vowel phoneme sequences, which conventional methods could not fully express. As a result, more natural synthesized speech can be obtained easily.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the system configuration of a Japanese speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the phoneme duration generation unit.

FIG. 3 is a schematic diagram of a phoneme section of a general VCV-type phoneme sequence.

FIG. 4 is a schematic diagram of phoneme sections of phoneme sequences containing mora phonemes such as the syllabic nasal, the geminate consonant, and the long vowel, or a mora with a devoiced vowel.

FIG. 5 is a graph showing an example of the analysis results for the relationship between speech rate and the time length between vowel-part energy centroids for a VCV-type phoneme sequence.

FIG. 6 is a graph showing an example of the analysis results for the relationship between speech rate and the time length between vowel-part energy centroids for a phoneme sequence containing a syllabic nasal.

FIG. 7 is a graph showing an example of the analysis results for the relationship between speech rate and the time length between vowel-part energy centroids for a phoneme sequence containing a geminate consonant.

FIG. 8 is a graph showing an example of the analysis results for the relationship between speech rate and the time length between vowel-part energy centroids for a phoneme sequence containing a mora with a devoiced vowel.

FIG. 9 is a schematic diagram showing the division within a phoneme section of a VCV-type phoneme sequence.

FIG. 10 is a graph showing the relationship between Dvp, Dc, Dvs and DG in the analysis results for the nonsense word "kobakamenkai" (こばかめんかい), an example of a VCV-type phoneme sequence.

FIG. 11 is a schematic diagram showing the division into phoneme parts within a phoneme section of a phoneme sequence containing a syllabic nasal.

FIG. 12 is a schematic diagram showing the division into phoneme parts within a phoneme section of a phoneme sequence containing a geminate consonant.

FIG. 13 is a schematic diagram showing the division into phoneme parts within a phoneme section of a phoneme sequence containing a mora with a devoiced vowel.

FIG. 14 is a schematic diagram showing the division into phoneme parts within a phoneme section of a phoneme sequence containing a succession of different vowels.

FIG. 15 is a graph showing the relationship between Dvp, DN, Dc, Dvs and DG in the analysis results for the nonsense word "kobanbamenkai" (こばんばめんかい), an example of a phoneme sequence containing a syllabic nasal.

FIG. 16 is a graph showing the relationship between Dvp, Dsok, Dvs and DG in the analysis results for the nonsense word "kobakkamenkai" (こばっかめんかい), an example of a phoneme sequence containing a geminate consonant.

FIG. 17 is a graph showing the relationship between Dvp, Dmusei, Dc, Dvs and DG in the analysis results for the nonsense word "kobakichamenkai" (こばきちゃめんかい), an example of a phoneme sequence containing a mora with a devoiced vowel.

FIG. 18 is a graph showing the relationship between Dvp, DseNi, Dvs and DG in the analysis results for "kobaimenkai" (こばいめんかい), an example of a phoneme sequence containing a succession of different vowels.

[Explanation of symbols]

4: text analysis unit; 6: phoneme duration generation unit; 8: rhythm rules; 10: source amplitude pattern generation unit; 12: pitch pattern generation unit; 14: spectrum pattern generation unit; 24: sound source generation unit; 26: speech synthesizer; 40: section division unit; 42: vowel-part energy centroid interval setting unit; 44: rhythm rule (A); 46: individual phoneme duration setting unit; 48: rhythm rule (B).

Claims

[Claims]

1. A speech synthesizer that synthesizes speech from text by rule, comprising: phoneme symbol string generation means for generating, from the text, a phoneme symbol string including vowel parts, consonant parts, and mora phonemes; section division means for dividing the phoneme symbol string into phoneme sections delimited by the vowel parts; vowel-part energy centroid interval generation means for obtaining, according to the types of the consonant part and the mora phoneme in the phoneme section, a time length between vowel-part energy centroids, which is the interval between the energy centroids of the vowel parts located at both ends of the phoneme section; and phoneme duration generation means for determining the phoneme duration of each phoneme in the phoneme section according to the time length between vowel-part energy centroids of the phoneme section and the types of the consonant part and the mora phoneme in the phoneme section, wherein speech synthesis is performed using the phoneme durations.

2. A speech synthesizer that synthesizes speech from text by rule, comprising: phoneme symbol string generation means for generating, from the text, a phoneme symbol string including vowel parts, consonant parts, and devoiced-vowel phoneme sequences; section division means for dividing the phoneme symbol string into phoneme sections delimited by the vowel parts; vowel-part energy centroid interval generation means for obtaining, according to the types of the consonant part and the devoiced-vowel phoneme sequence in the phoneme section, a time length between vowel-part energy centroids, which is the interval between the energy centroids of the vowel parts located at both ends of the phoneme section; and phoneme duration generation means for determining the phoneme duration of each phoneme in the phoneme section according to the time length between vowel-part energy centroids of the phoneme section and the types of the consonant part and the devoiced-vowel phoneme sequence in the phoneme section, wherein speech synthesis is performed using the phoneme durations.

3. A speech synthesizer that synthesizes speech from text by rule, comprising: phoneme symbol string generation means for generating, from the text, a phoneme symbol string including vowel parts, consonant parts, mora phonemes, and devoiced-vowel phoneme sequences; section division means for dividing the phoneme symbol string into phoneme sections delimited by the vowel parts; vowel-part energy centroid interval generation means for obtaining, according to the types of the consonant part, the mora phoneme, and the devoiced-vowel phoneme sequence in the phoneme section, a time length between vowel-part energy centroids, which is the interval between the energy centroids of the vowel parts located at both ends of the phoneme section; and phoneme duration generation means for determining the phoneme duration of each phoneme in the phoneme section according to the time length between vowel-part energy centroids of the phoneme section and the types of the consonant part, the mora phoneme, and the devoiced-vowel phoneme sequence in the phoneme section, wherein speech synthesis is performed using the phoneme durations.

4. The speech synthesizer according to claim 1, wherein the vowel-part energy centroid interval generation means determines the time length between vowel-part energy centroids as a linear function of the reciprocal of the speech rate, and the coefficient and the constant term of the linear function are determined according to the types of the consonant part and the mora phoneme.

5. The speech synthesizer according to claim 2, wherein the vowel-part energy centroid interval generation means determines the time length between vowel-part energy centroids as a linear function of the reciprocal of the speech rate, and the coefficient and the constant term of the linear function are determined according to the types of the consonant part and the devoiced-vowel phoneme sequence.

6. The speech synthesizer according to claim 3, wherein the vowel-part energy centroid interval generation means determines the time length between vowel-part energy centroids as a linear function of the reciprocal of the speech rate, and the coefficient and the constant term of the linear function are determined according to the types of the consonant part, the mora phoneme, and the devoiced-vowel phoneme sequence.