JP3278486B2

JP3278486B2 - Japanese speech synthesis system

Info

Publication number: JP3278486B2
Application number: JP06207693A
Authority: JP
Inventors: 雅代加藤; 新一郎橋本
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 1993-03-22
Filing date: 1993-03-22
Publication date: 2002-04-30
Anticipated expiration: 2017-04-30
Also published as: JPH06274195A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a method for synthesizing moras, the prosodic units of Japanese speech that correspond to single syllables. In particular, the invention aims to adjust the time length between moras and thereby make natural-sounding synthesis of Japanese speech easier.

【０００２】[0002]

[Prior Art] A known technique in this field is described in "Prototype of a Text-to-Speech Board Using the COC (Context Oriented Clustering) Method" (IEICE Technical Report, November 30, 1990). The COC method described in that document uses, as synthesis units, phonemes with many variations that are generated by statistical methods from a database of natural speech.

【０００３】[0003]

[Problems to Be Solved by the Invention] Conventional Japanese speech synthesis systems have the following problem. FIG. 15 illustrates conventional control of phoneme duration. As the figure shows, the conventional approach assigns a duration to each phoneme individually, and analysis of human speech shows that the duration of each phoneme varies a great deal. When phoneme durations are controlled this way, the factors that must be considered include the type of the phoneme itself, the type of the preceding phoneme, the type of the following phoneme, the type of the phoneme two positions before, the type of the phoneme two positions after, the position within the word, the position within the sentence, and the presence or absence of an accent (Umeki (梅木) et al., "Setting of Phoneme Duration in Sentence Speech," IEICE Technical Report SP90-2 (April 1990)). These durations, which vary with the types of phonemes and how they are concatenated, are what is called the rhythm of speech, and synthesizing natural speech requires giving each phoneme an accurate duration. That is, because the rhythm of a given phoneme differs according to the phonemes adjacent to it, many kinds of phoneme data must be stored and the required portions cut out of the data for use. However, the combinations of just the factors listed here are astronomical in number, and it is impossible to account for every case. It is also already known that these factors alone are insufficient. Consequently, an enormous amount of data has to be stored and complex processing is required, which makes it difficult to improve the quality of synthesized Japanese speech.

[0004] Accordingly, in view of the above problem, an object of the present invention is to provide a Japanese speech synthesis system that can easily synthesize natural-sounding speech.

【０００５】[0005]

[Means for Solving the Problems] To solve the above problem, the present invention provides a Japanese speech synthesis system that synthesizes moras, the prosodic units corresponding to single syllables. Let dvp denote the duration of the second half of the preceding vowel, dc the duration of the consonant part between the two moras, dvs the duration of the first half of the succeeding vowel between the two moras, Avp and Bvp constants determined by the preceding vowel, Ac and Bc constants determined by the consonant, and Avs and Bvs constants determined by the succeeding vowel. For the time interval Dcegv between the energy centroid points of the vowel parts of two adjacent moras, the system comprises means for determining dvp, dc, and dvs by applying the following equation.

【数３】 (Equation 3)

The system also comprises means for determining the second-half duration dvp of the preceding vowel part, the duration dc of the consonant part, and the first-half duration dvs of the succeeding vowel part by applying the following equation to the time interval Dcegv.

【数４】 (Equation 4)

Furthermore, for the constant Avp determined by the preceding vowel and the constant Bvp determined by the succeeding vowel, the vowel types may be divided into a plurality of groups and a single fixed value used within each group.

【０００６】[0006]

[Operation] In the Japanese speech synthesis system of the present invention, which forms vowel-length and consonant-length rules between vowel-part energy centroid points, the energy of the speech waveform of any two adjacent moras is obtained, the time integral of the energy of the vowel part of that waveform is taken, and the mora interval is obtained as the time length between the vowel-part energy centroid positions, that is, between the centroid points of the energy time integrals of the two moras. This time length hardly changes when the preceding or succeeding vowel changes, and it also hardly changes when the preceding consonant changes. In addition, the vowel lengths and consonant length that make up the moras are determined using the time length between the vowel-part energy centroid positions and the consonant length as parameters. The mora interval and the vowel and consonant lengths that make up the moras are therefore obtained quantitatively and accurately. Specifically, the vowel and consonant lengths are determined quantitatively as linear functions of the time length between the vowel-part energy centroid positions, and quantifying the mora interval is simplified by dividing the vowels into several groups and forming the linear function for each group. Consequently, adjusting the phoneme durations of the text to be synthesized according to this mora interval makes it easy to obtain natural-sounding synthesized Japanese speech.

【０００７】[0007]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows the configuration of a Japanese speech synthesis system according to the embodiment. The system shown in the figure includes a text analysis unit 1 which, when text to be synthesized is input, analyzes the Japanese sentence and converts it into a phoneme symbol string annotated with the pronunciation information needed for synthesis, such as accent information, pauses, and vowel devoicing; a phoneme duration generation unit 2 which controls the phoneme durations of the phoneme symbol string generated by the text analysis unit 1 according to rhythm rules described later; an acoustic amplitude pattern generation unit 3 which determines the power pattern of the speech by power rules so as to preserve the vowel-part energy centroid interval Dcegv (the Center of Energy Gravity of Vowels) given by the rhythm rules of the phoneme duration generation unit 2; a pitch pattern generation unit 4 which, using prosody control rules, determines point pitch patterns from the number of moras (the prosodic units corresponding to single syllables) contained in each accent phrase and from the accent type, and interpolates them to generate a continuous pitch pattern; a sound source generation unit 5 which generates a sound source from the power pattern and the continuous pitch pattern; a spectral pattern generation unit 6 which, using phonemic-quality improvement rules and phoneme concatenation rules, determines the phoneme types such as vowels and consonants, concatenates the spectra of the phonemes, and creates a formant pattern; and a speech synthesizer 7 which creates synthesized speech from the obtained sound source information and the formant pattern.

[0008] FIG. 2 is a block diagram showing the configuration of the phoneme duration generation unit 2 of FIG. 1. The phoneme duration generation unit 2 shown in the figure comprises a VCV segmentation unit 21 which divides the phoneme symbol string generated by the text analysis unit 1 into VCV (vowel/consonant/vowel) segments; a vowel-part energy centroid interval setting unit 22 which calculates the vowel-part energy centroid interval Dcegv from stored rhythm rules, using as parameters the consonant in each segment produced by the segmentation unit 21 and the speech rate SR, and sets it for each segment; and an intra-Dcegv vowel/consonant length setting unit 23 which, once Dcegv has been obtained, determines the duration of each vowel and consonant lying within Dcegv and outputs the result to the acoustic amplitude pattern generation unit 3, the pitch pattern generation unit 4, and the sound source generation unit 5.
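The segmentation performed by the VCV segmentation unit 21 can be pictured with a short Python sketch. This is only an illustration under simplifying assumptions: the function name `vcv_units`, the one-symbol-per-phoneme input, and the restriction to the five plain vowels are all hypothetical and are not taken from the patent, which does not say how special moras such as the moraic nasal are handled.

```python
# Assumption: only the five plain vowels count as vowels in this sketch.
VOWELS = {"a", "i", "u", "e", "o"}


def vcv_units(phonemes):
    """Split a phoneme list into overlapping (vowel, consonant, vowel) units.

    Consecutive units share a vowel: the succeeding vowel of one unit is the
    preceding vowel of the next. Consonants before the first vowel are ignored.
    """
    units = []
    prev_vowel = None
    consonants = []
    for p in phonemes:
        if p in VOWELS:
            if prev_vowel is not None:
                units.append((prev_vowel, "".join(consonants), p))
            prev_vowel = p
            consonants = []
        elif prev_vowel is not None:
            consonants.append(p)
    return units


# Hypothetical input: the first three moras of a nonsense word, "ko-ba-ka".
units = vcv_units(["k", "o", "b", "a", "k", "a"])
```

Each resulting unit would then receive its own Dcegv value from unit 22 and its own internal split from unit 23.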

[0009] Next, the rhythm rules that supply the vowel-part energy centroid interval Dcegv to the phoneme duration generation unit 2, which is the characteristic part of the present invention, are described in detail. The inventors considered that, to synthesize high-quality Japanese speech easily, it is essential to discover a law of rhythm in Japanese and apply it as a prosodic rule, because no rhythm rule had been found despite the efforts of many researchers. Japanese can be structured in CV (consonant/vowel), VCV (vowel/consonant/vowel), CVC (consonant/vowel/consonant), and other forms. As explained below, it has become clear that the rhythm of Japanese speech is given as the time length between the vowel-part energy centroid points of the VCV (vowel/consonant/vowel) form, and that within a certain range of tempo this time length is determined almost uniquely by the type of consonant lying between the vowels.

[0010] The explanation starts from the following hypothesis about Japanese rhythm. Even quite young children can correctly produce the sense of rhythm peculiar to Japanese, and people can pronounce any sequence of moras with a natural rhythm. These observations suggest that the rhythm rule of Japanese must be very simple. On this point, Kindaichi (Studies in Japanese Phonology (日本語音韻の研究), Tokyodo Shuppan, 1967) points out that the following law holds.

[0011] [Law] Japanese rhythm is kept in units of moras and is basically governed by the single concept of isochrony. Considering slowly spoken speech, one finds that the rhythm is kept on the vowel part of each mora. Moreover, listeners are thought to be hearing the energy of that vowel part. Physical rhythm can therefore be defined as follows.

[0012] [Definition] Japanese rhythm is given by the time length between the vowel-part energy centroid points of the moras. Japanese rhythm is basically isochronous, but the largest factor disturbing this isochrony is thought to be the physical constraints imposed incidentally by the structure of the vocal organs. These constraints are clearly larger when producing consonants than when producing vowels, so the time length between vowel-part energy centroid points is expected to depend strongly on which of the consonants (27 types) is involved.

[0013] FIG. 3 illustrates duration control based on the time length between vowel-part energy centroid points. As the speech waveform in the figure shows, a Japanese mora generally consists of a consonant part and a vowel part. Because, as stated above, rhythmic timing is kept on the vowel part of the mora, attention is paid in particular to the energy of the vowel part. That is, the energy of the vowel part is obtained, and the instant that gives the centroid of that energy is defined as the vowel-part energy centroid point. One vowel-part energy centroid point is determined for each syllable, so taking the time lengths between these points as t1, t2, t3, and t4 as shown in the figure gives the time lengths between moras.

[0014] Because this duration control method, which focuses on the vowel-part energy centroid points, takes the energy of the waveform into account, differences in vowel type need not be considered. Now consider the consonant part lying between the vowel-part energy centroid points. A consonant has less energy than a vowel, but it imposes a large physical constraint in terms of how easy or difficult it is to pronounce. As stated earlier, the rhythm law of Japanese is based on isochrony in units of energy. The time lengths between vowel-part energy centroid points should in principle be equal, but this isochrony is greatly disturbed by the ease or difficulty of producing the intervening consonant, so that the intervals actually appear as a variety of time lengths.

[0015] The following hypothesis is therefore set up and verified through analysis of real speech. [Hypothesis] Below a certain tempo, the time length between vowel-part energy centroid points (the rhythm) is determined solely by the type of consonant between the vowels concerned and does not depend on the types of the preceding and succeeding vowels. Next, the speech material and the analysis method are described. The speech material was the single sentence 「それは、“こ○○めんかい”です」 ("Sore wa 'ko○○menkai' desu"), which contains a 7-mora nonsense word. There were two speakers, one male and one female.

[0016] Let Ci be the i-th consonant and Vi the i-th vowel; the two ○ positions in the nonsense word can then be written C2V2 and C3V3. To test the hypothesis for the interval between the second and third moras, the time length between vowel-part energy centroid points was analyzed in four cases: [Analysis 1] the preceding vowel V2 differs; [Analysis 2] the succeeding vowel V3 differs; [Analysis 3] the preceding consonant C2 differs; [Analysis 4] the intervening consonant C3 differs. Each phoneme type was uttered 10 times, and the mean and standard deviation of the time length between vowel-part energy centroid positions were obtained. To absorb the slight differences in tempo that arise from utterance to utterance, the duration of the nonsense word, that is, the speech rate, was normalized to 7 moras/second by linear expansion or contraction.
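The tempo normalization described above amounts to a single linear scale factor per utterance. The following Python sketch shows one way to read it; the function name and parameters are hypothetical, and the patent does not say whether the scaling was applied to the waveform or to the measured intervals, so applying it to intervals is an assumption.

```python
def normalize_intervals(intervals_ms, word_duration_ms, n_moras=7, target_rate=7.0):
    """Linearly rescale measured intervals to a common speech rate.

    intervals_ms     -- measured centroid-to-centroid intervals (ms)
    word_duration_ms -- measured duration of the nonsense word (ms)
    n_moras          -- number of moras in the nonsense word (7 in the text)
    target_rate      -- target speech rate in moras/second (7 in the text)
    """
    # At 7 moras/second, a 7-mora word lasts 1000 ms.
    target_duration_ms = 1000.0 * n_moras / target_rate
    scale = target_duration_ms / word_duration_ms
    return [t * scale for t in intervals_ms]


# Hypothetical measurement: the word was spoken in 2000 ms, so every
# interval is compressed by a factor of 0.5.
normalized = normalize_intervals([150.0, 300.0], word_duration_ms=2000.0)
```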

[0017] Because the results for the male and female speakers show broadly the same tendencies, only one speaker (the female voice) is described below. FIG. 4 shows the case in which the preceding vowel differs, and FIG. 5 shows its effect on the time length between vowel-part energy centroid points. As shown in FIG. 3, consonant 1, vowel 1, consonant 2, and vowel 2 are arranged in this order; consonants 1 and 2 and vowel 2 are fixed while vowel 1 is varied, and the time length between vowel-part energy centroid points is measured. In FIG. 5, ◆ marks the mean time length between vowel-part energy centroid points and the line segments extending above and below it show the standard deviation; the horizontal axis shows the phonemes of the second and third moras, for example "baba, biba, buba, beba, boba, Nba". FIG. 5 shows the interval when the preceding vowel V2 differs: the mean is 139 to 145 ms, distributed over a width of about ±3.5 ms. Given that the discrimination threshold for phoneme duration is about 10 to 20 ms (see Hashimoto et al., 7th International Congress on Acoustics, 129-132 (1971)), the difference in the preceding vowel V2 can be said to have almost no effect.

[0018] FIG. 6 shows the case in which the succeeding vowel differs, and FIG. 7 shows its effect on the time length between vowel-part energy centroid points. As above, and as shown in FIG. 6, consonants 1 and 2 and vowel 1 are fixed while vowel 2 is varied, for example as "baba, babi, babu, babe, babo, baN de" shown in FIG. 7, and the time length between vowel-part energy centroid points is measured. As FIG. 7 shows, even when the succeeding vowel V2 differs, the mean interval is distributed over 140 to 154 ms (about ±7 ms); as with the preceding vowel, differences in the succeeding vowel V3 have almost no effect.

[0019] FIG. 8 shows the case in which the preceding consonant differs, and FIG. 9 shows its effect on the time length between vowel-part energy centroid points. As above, and as shown in FIG. 7, vowels 1 and 2 and consonant 2 are fixed while consonant 1 is varied, for example as "byaba, baba, gyaba, ..., naba" shown in FIG. 9, and the time length between vowel-part energy centroid points is measured. As FIG. 8 shows, even when the consonant C2 differs, the mean interval is distributed over 121 to 153 ms (about ±16 ms); measured against the discrimination threshold (10 to 20 ms), the preceding consonant has an effect, though a small one.

[0020] FIG. 10 shows the case in which the intervening consonant differs, and FIG. 11 shows its effect on the time length between vowel-part energy centroid points. As above, and as shown in FIG. 10, vowels 1 and 2 and consonant 1 are fixed while consonant 2 is varied, for example as "bara, bada, bawa, ..., bahya", and the time length between vowel-part energy centroid points is measured. As FIG. 11 shows, when consonant 2, which lies between the vowels under analysis, differs, the mean interval is distributed over 113 to 195 ms (about ±41 ms), clearly exceeding the discrimination threshold (10 to 20 ms).

[0021] From these results, for both the effect of the preceding vowel type (FIG. 5) and the effect of the succeeding vowel type (FIG. 7), the mean intervals cluster around 140 ms. There is some scatter among vowel types, but given that humans cannot perceive changes in sound length below the discrimination threshold of about 10 to 20 ms, the effect of vowel type can be concluded to be negligible. In contrast, with differing intervening consonants (FIG. 11) the intervals are spread widely over about 110 to 190 ms, confirming the large effect of consonant type.
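The comparison made in the four analyses can be restated as a small Python check. The ranges below are the ones quoted in the text; comparing the half-width of each range against the 10 to 20 ms threshold, and the three labels used, are assumptions made for this sketch rather than a procedure stated in the patent.

```python
# Ranges of the mean centroid interval quoted in the text (ms).
REPORTED_RANGES_MS = {
    "preceding vowel": (139, 145),
    "succeeding vowel": (140, 154),
    "preceding consonant": (121, 153),
    "intervening consonant": (113, 195),
}

# Discrimination threshold for phoneme duration quoted in the text (ms).
LIMEN_MS = (10, 20)


def classify(low_ms, high_ms, limen=LIMEN_MS):
    """Label a range by comparing its half-width with the threshold band."""
    half_width = (high_ms - low_ms) / 2
    if half_width < limen[0]:
        return "negligible"   # below the whole threshold band
    if half_width <= limen[1]:
        return "borderline"   # inside the threshold band
    return "audible"          # above the whole threshold band


labels = {name: classify(lo, hi) for name, (lo, hi) in REPORTED_RANGES_MS.items()}
```

Under this reading, both vowel cases fall below the threshold, the preceding consonant lies inside the threshold band, and only the intervening consonant clearly exceeds it, which agrees with the conclusions drawn in the text.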

[0022] FIG. 12 illustrates how the time length between vowel-part energy centroid points is derived. For the speech waveforms of the i-th and (i+1)-th moras, each consisting of a consonant part and a vowel part as shown in part (a) of the figure, the speech energy E, which represents the waveform as energy, is obtained as shown in part (b), and the running sum (integral) of the speech energy is taken as shown in part (c). Let the total energies of the i-th and (i+1)-th moras be Etotal(i) and Etotal(i+1). The times at which the running sum reaches (1/2)Etotal(i) and (1/2)Etotal(i+1) are taken as the vowel-part energy centroid points of the i-th and (i+1)-th moras, respectively, and the interval between them is the time length between vowel-part energy centroid points.
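The derivation of FIG. 12 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes that energy is the squared sample amplitude, that the mora boundaries are already known as sample indices, and that the half-energy point is taken at the first sample where the running sum reaches half of the total. All function and parameter names are hypothetical.

```python
def half_energy_time(samples, sample_rate, start, end):
    """Return the time (s) at which the running energy sum of
    samples[start:end] first reaches half of its total."""
    energy = [s * s for s in samples[start:end]]  # assumption: energy = amplitude squared
    total = sum(energy)
    if total == 0:
        raise ValueError("segment has no energy")
    running = 0.0
    for offset, e in enumerate(energy):
        running += e
        if running >= total / 2:
            return (start + offset) / sample_rate
    raise AssertionError("unreachable: running sum always reaches total / 2")


def centroid_interval_ms(samples, sample_rate, mora_i, mora_next):
    """Interval (ms) between the half-energy points of two adjacent moras.

    mora_i and mora_next are (start, end) sample-index pairs.
    """
    t_i = half_energy_time(samples, sample_rate, *mora_i)
    t_next = half_energy_time(samples, sample_rate, *mora_next)
    return (t_next - t_i) * 1000.0


# Toy signal (hypothetical): two 6-sample "moras" sampled at 1 kHz.
signal = [0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 0]
interval = centroid_interval_ms(signal, 1000, (0, 6), (6, 12))
```

Because the measure depends only on where the energy is concentrated, a low-energy consonant at the start of a mora shifts the half-energy point very little, which is consistent with the text's focus on the vowel part.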

[0023] The time lengths between vowel-part energy centroid points derived by the above method for the various consonants are shown in the table below. The rhythm of Japanese speech has so far been defined and formulated as rules in terms of the vowel-part energy centroid interval (Dcegv), but no rule had been established for how vowel length and consonant length are apportioned within Dcegv, and individual vowel and consonant lengths could not previously be determined by rule. Based on the following analysis of vowel and consonant lengths within Dcegv, such a rule is stored in the rhythm rules of FIG. 2, and the intra-Dcegv vowel/consonant length setting unit 23 sets the individual vowel and consonant lengths.

[0024] FIG. 13 illustrates the method of setting durations within Dcegv. As the figure shows, Dcegv is divided into three parts: the second half of the preceding vowel, dvp; the consonant part, dc; and the first half of the succeeding vowel, dvs. The values of dvp, dc, and dvs were obtained for the various Dcegv values produced by varying the speech rate. As before, the speech material was the single sentence 「それは、“こ-/bx/-/kx/-めんかい”です．」 (where X is any vowel), which contains a 7-mora nonsense word, and the interval between the second and third moras of the nonsense word was analyzed. The speaker was one female.

[0025] As an example, FIG. 14 is a graph of dvp, dc, and dvs against Dcegv from the analysis of "こ-/ba/-/ka/-めんかい". The horizontal axis is the Dcegv value [ms] obtained by varying the speech rate, and the vertical axis gives the corresponding dvp (○), dc (▲), and dvs (◇) values [ms]. FIG. 14 shows that dvp, dc, and dvs can each be approximated by the following linear expressions with Dcegv as the parameter.

【００２６】[0026]

【数１】 (Equation 1)

[0027] Here, Avp and Bvp are coefficients for the preceding vowel, Ac and Bc are coefficients for the consonant, and Avs and Bvs are coefficients for the succeeding vowel. Analysis of other vowel sequences likewise showed that the durations can be approximated by linear functions of Dcegv to within the discrimination threshold. The coefficient values obtained for the consonant /k/ for each combination of vowels are shown below. The expression can be rearranged as follows.

【００２８】[0028]

【数５】 (Equation 5)

【００２９】[0029]

【表１】 [Table 1]
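The linear approximation described around Equation 1 can be illustrated with an ordinary least-squares line fit in Python. Since the equation and Table 1 are not reproduced in this text, the sketch assumes that each A coefficient is a slope and each B coefficient an intercept, and every number in it is made up for illustration; none are values from the patent. The patent also does not say which fitting method was used.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(dcegv_ms, slope, intercept):
    """Evaluate the fitted line at a given Dcegv value (ms)."""
    return slope * dcegv_ms + intercept


# Hypothetical measurements: consonant durations dc (ms) observed at three
# Dcegv values (ms). These are NOT values from Table 1.
dcegv_values = [100.0, 160.0, 200.0]
dc_values = [40.0, 58.0, 70.0]

a_c, b_c = fit_line(dcegv_values, dc_values)  # assumed roles: A = slope, B = intercept
dc_at_130 = predict(130.0, a_c, b_c)
```

The same fit would be repeated for dvp and dvs, giving the three coefficient pairs for one vowel/consonant/vowel combination.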

[0030] For an arbitrary VCV (vowel/consonant/vowel), however, the combinations of preceding and succeeding vowels require 25 sets of coefficients, which not only increases the data to be stored but also complicates the processing. The following therefore describes grouping into two groups as one example of vowel grouping that simplifies the use of these coefficients.

[0031] Using these coefficients, dvp, dc, and dvs were obtained for the Dcegv values (100 ms, 160 ms, 200 ms) that correspond to speech rates of 11 moras/second, 7.7 moras/second, and 6 moras/second. The results for Dcegv values of 100 ms and 160 ms in particular are shown below.

【００３２】[0032]

【表２】 [Table 2]

[0033] In the table, "difference" is the difference between the maximum and minimum durations in each row, that is, for a common preceding vowel or succeeding vowel. "Median" is the value (row maximum + row minimum)/2 for the durations of the differing preceding or succeeding vowels in each row. When the preceding vowel is /a/, the dvp values show almost no difference (no more than 10 to 20 ms) even if the type of succeeding vowel differs. Likewise, when the succeeding vowel is /a/, the dvs values show almost no difference even if the type of preceding vowel differs. It follows, first, that the duration of a vowel part can be determined by its own type alone, regardless of the type of the adjacent vowel.
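The two row statistics defined above are simple to compute, as the Python sketch below shows. Note that the "median" defined in the text is the midpoint between the row maximum and minimum (the midrange), not the statistical median. The function name and the sample row are hypothetical; the row values are not taken from Table 2.

```python
def row_stats(durations_ms):
    """Return the 'difference' and 'median' of one table row as defined in the text."""
    highest = max(durations_ms)
    lowest = min(durations_ms)
    return {
        "difference": highest - lowest,     # row maximum minus row minimum
        "median": (highest + lowest) / 2,   # midrange, per the text's definition
    }


# Hypothetical row: dvp (ms) for one preceding vowel across five succeeding vowels.
stats = row_stats([52, 48, 50, 55, 47])
```

A row whose "difference" stays below the 10 to 20 ms discrimination threshold supports treating that vowel's duration as independent of its neighbor.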

[0034] Next, the rules for setting dvp, dc, and dvs within Dcegv are defined as follows. The duration differs among the five vowels /a/, /i/, /u/, /e/, and /o/, but taking the median of the five vowel durations (as defined above) as the boundary, the five vowels can be divided into two groups: those whose duration is always longer than the median, /a/ and /e/, and those whose duration is always shorter than the median, /i/, /u/, and /o/. It follows, second, that when the discrimination limit for duration (10 to 20 ms) is taken into account, there is no need to analyze every combination of the five vowels; it is sufficient to analyze /a/ and /i/ and use the results obtained. In other words, whereas Dcegv itself follows a rule determined by the type of consonant, the division inside Dcegv follows a rule determined by the two vowel groups (long duration and short duration). A similar tendency was also observed in the analysis results for other consonants.
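As a minimal sketch of the grouping in paragraph [0034], each vowel can be mapped to the representative of its group (/a/ for the long group, /i/ for the short group), which collapses the 25 preceding/succeeding vowel combinations into 4. The names `vowel_group`, `representative`, and `reduced_context` are hypothetical helpers introduced for illustration.

```python
LONG_GROUP = {"a", "e"}         # duration always longer than the midrange
SHORT_GROUP = {"i", "u", "o"}   # duration always shorter than the midrange
REPRESENTATIVE = {"long": "a", "short": "i"}  # vowels actually analyzed


def vowel_group(vowel):
    """Classify one of the five Japanese vowels as 'long' or 'short'."""
    if vowel in LONG_GROUP:
        return "long"
    if vowel in SHORT_GROUP:
        return "short"
    raise ValueError(f"unknown vowel: {vowel!r}")


def representative(vowel):
    """Return the analyzed vowel (/a/ or /i/) that stands in for `vowel`."""
    return REPRESENTATIVE[vowel_group(vowel)]


def reduced_context(preceding, succeeding):
    """Map a (preceding, succeeding) vowel pair to its representative pair."""
    return representative(preceding), representative(succeeding)


vowels = "aiueo"
all_contexts = {(p, s) for p in vowels for s in vowels}
reduced = {reduced_context(p, s) for p, s in all_contexts}
print(len(all_contexts), len(reduced))
```

Using more than two groups, as the text later allows, would only change the group sets and the representative table.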

[0035] From the above considerations, the division of Dcegv [ms] for an arbitrary VCV (vowel/consonant/vowel) sequence can be defined by the following equations. For example, for the phoneme sequence /x-k-x/, dvp, dc, and dvs are obtained from the following relations using the coefficients for /aka/, /ika/, and /aki/ in Table 1.

[0036]

(Equation 3)
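The coefficient selection described in paragraph [0035] might be sketched as follows. This sketch rests on several assumptions that the text here does not confirm: that each component is an affine function of Dcegv (A × Dcegv + B), which is suggested only by the paired constants Avp/Bvp, Ac/Bc, and Avs/Bvs named in the claims; that the /aka/ and /ika/ rows supply the preceding-vowel constants, the /aka/ and /aki/ rows supply the succeeding-vowel constants, and the /aka/ row supplies the consonant constants; and that the numeric coefficients, which are placeholders, stand in for the values of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Assumed form of one duration rule: duration = a * Dcegv + b."""
    a: float  # slope (dimensionless)
    b: float  # offset in ms

    def __call__(self, dcegv_ms):
        return self.a * dcegv_ms + self.b


# Placeholder coefficients, NOT the values of Table 1.
COEFFS = {
    "aka": {"vp": Affine(0.30, 5.0), "c": Affine(0.40, -10.0), "vs": Affine(0.30, 5.0)},
    "ika": {"vp": Affine(0.25, 5.0)},
    "aki": {"vs": Affine(0.25, 5.0)},
}

LONG_VOWELS = {"a", "e"}
SHORT_VOWELS = {"i", "u", "o"}


def split_dcegv(preceding, succeeding, dcegv_ms, coeffs=COEFFS):
    """Return (dvp, dc, dvs) in ms for the sequence /x-k-x/ (assumed row mapping)."""
    for vowel in (preceding, succeeding):
        if vowel not in LONG_VOWELS | SHORT_VOWELS:
            raise ValueError(f"unknown vowel: {vowel!r}")
    vp_row = "aka" if preceding in LONG_VOWELS else "ika"
    vs_row = "aka" if succeeding in LONG_VOWELS else "aki"
    dvp = coeffs[vp_row]["vp"](dcegv_ms)   # second half of the preceding vowel
    dc = coeffs["aka"]["c"](dcegv_ms)      # consonant portion
    dvs = coeffs[vs_row]["vs"](dcegv_ms)   # first half of the succeeding vowel
    return dvp, dc, dvs


print(split_dcegv("e", "u", 160.0))
```

With real coefficients, the same lookup would cover all 25 vowel contexts of a given consonant from only three analyzed sequences.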

[0037] When the durations given by this setting rule were compared with measured values, the differences fell within the discrimination limit, showing that the durations can be set with good accuracy. Grouping the vowels into two groups has been described above, but the vowel grouping is determined by the balance between the required accuracy and the degree of simplification; that is, the user can choose whatever number of groups is convenient for the accuracy required.
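The comparison in paragraph [0037] amounts to checking that each rule-based duration lies within the discrimination limit of its measured counterpart. The sketch below assumes the stricter end (10 ms) of the 10 to 20 ms range given earlier; the function name `within_limit` and the sample values are hypothetical, not measurements from the patent.

```python
DISCRIMINATION_LIMIT_MS = 10.0  # assumed: lower end of the 10-20 ms range


def within_limit(predicted_ms, measured_ms, limit_ms=DISCRIMINATION_LIMIT_MS):
    """True if every predicted duration is within limit_ms of the measured one."""
    if len(predicted_ms) != len(measured_ms):
        raise ValueError("predicted and measured lists must have equal length")
    return all(abs(p - m) <= limit_ms for p, m in zip(predicted_ms, measured_ms))


# Illustrative (dvp, dc, dvs) values in ms.
predicted = [53.0, 54.0, 45.0]
measured = [50.0, 60.0, 47.0]
print(within_limit(predicted, measured))
```

Choosing a coarser grouping of vowels would be acceptable as long as this check still passes at the accuracy the user requires.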

[0038]

[Effects of the Invention] As described above, according to the present invention, the mora interval is obtained from the time length between vowel energy centroid positions, that is, between the centroids of the time integral of the energy of two morae; the mora interval is determined using the consonant between the two morae and the speech rate as parameters; and the vowel length and consonant length that make up a mora are determined using the time length between vowel energy centroid positions and the consonant length as parameters. Because the phoneme durations of the sentence to be synthesized are thus adjusted in terms of the mora interval, natural Japanese synthesized speech can be obtained easily.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a Japanese speech synthesis system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the phoneme duration generation unit 2 of FIG. 1.

FIG. 3 is a diagram showing duration control that focuses on the vowel energy centroid time length.

FIG. 4 is a diagram showing the case where the preceding vowel differs.

FIG. 5 is a diagram showing the influence of the time length between vowel energy centroids in the case of FIG. 4.

FIG. 6 is a diagram showing the case where the succeeding vowel differs.

FIG. 7 is a diagram showing the influence of the time length between vowel energy centroids in the case of FIG. 6.

FIG. 8 is a diagram showing the case where the preceding consonant differs.

FIG. 9 is a diagram showing the influence of the time length between vowel energy centroids in the case of FIG. 8.

FIG. 10 is a diagram showing the case where the intervening consonant differs.

FIG. 11 is a diagram showing the influence of the time length between vowel energy centroids in the case of FIG. 10.

FIG. 12 is a diagram illustrating the method of deriving the time length between vowel energy centroids.

FIG. 13 is a diagram illustrating the method of setting the time lengths within Dcegv.

FIG. 14 is a graph showing, as an example, Dcegv and dvp, dc, and dvs obtained by analyzing "こ-/ba/-/ka/-めんかい" (ko-/ba/-/ka/-menkai).

FIG. 15 is a diagram illustrating conventional control of phoneme duration.

[Explanation of symbols]

1 ... Text analysis unit
2 ... Phoneme duration generation unit
3 ... Sound source amplitude pattern generation unit
4 ... Pitch pattern generation unit
5 ... Sound source generation unit
6 ... Spectrum pattern generation unit
7 ... Speech synthesizer
21 ... VCV segment division unit
22 ... Unit for setting the time length between vowel energy centroids
23 ... Unit for setting the vowel length and consonant length within Dcegv

Continuation of the front page

(56) References
JP-A-6-222793 (JP, A)
JP-A-6-266391 (JP, A)
Masayo Kato, Shinichiro Hashimoto, "On Japanese rhythm rules focusing on the vowel energy centroid," IEICE Technical Report, Japan, IEICE, Vol. 92, No. 35, SP92 7-12, 33-40
Masayo Kato, Shinichiro Hashimoto, "On Japanese rhythm rules focusing on the vowel energy centroid," Proceedings of the Meeting of the Acoustical Society of Japan, Japan, Acoustical Society of Japan, Vol. 1991 Autumn, 249-250
Masayo Kato, Shinichiro Hashimoto, "Extension of Japanese rhythm rules focusing on the vowel energy centroid," Proceedings of the Meeting of the Acoustical Society of Japan, Japan, Acoustical Society of Japan, Vol. 1992 Spring, 239-240

(58) Fields searched (Int. Cl.⁷, DB name)
G10L 13/08
G10L 13/06

Claims

(57) [Claims]

1. A Japanese speech synthesis system for synthesizing morae, a mora being a unit of prosody corresponding to a single syllable, wherein, with the second-half duration of the preceding vowel denoted dvp, the duration of the consonant portion between the two morae denoted dc, the first-half duration of the succeeding vowel denoted dvs, the constants determined by the preceding vowel denoted Avp and Bvp, the constants determined by the consonant denoted Ac and Bc, and the constants determined by the succeeding vowel denoted Avs and Bvs, the time interval Dcegv between the energy centroids of the vowel portions of two adjacent morae is expressed as follows, the system comprising means for determining, by applying the following equation, the second-half duration dvp of the preceding vowel portion, the duration dc of the consonant portion, and the first-half duration dvs of the succeeding vowel portion.

2. With the second-half duration of the preceding vowel denoted dvp, the duration of the consonant portion between the two morae denoted dc, the first-half duration of the succeeding vowel denoted dvs, the constants determined by the preceding vowel denoted Avp and Bvp, and the constants determined by the succeeding vowel denoted Avs and Bvs, the time interval Dcegv between the energy centroids of the vowel portions of two adjacent morae is expressed as follows (Equation 2), and means is provided for determining, by applying the following equation, the second-half duration dvp of the preceding vowel portion, the duration dc of the consonant portion, and the first-half duration dvs of the succeeding vowel portion.

3. The Japanese speech synthesis system according to claim 1, wherein, for the constant Avp determined by the preceding vowel and the constant Bvp determined by the succeeding vowel, the plurality of vowel types are divided into groups and a constant value is used within each group.