JPH09179577A

JPH09179577A - Phoneme energy control method for speech synthesis

Info

Publication number: JPH09179577A
Application number: JP7334420A
Authority: JP
Inventors: Shigeru Kashiwagi; 繁柏木
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1995-12-22
Filing date: 1995-12-22
Publication date: 1997-07-11

Abstract

PROBLEM TO BE SOLVED: To prevent deterioration of voice quality. SOLUTION: A determination unit in step S4 judges whether the phoneme whose fundamental frequency has been determined is a consonant. If it is a consonant, it is supplied to a consonant energy control unit in step S5, which controls the energy of the consonant using data in a consonant energy table TB. If it is a vowel, it is supplied to a vowel energy control unit in step S7, which controls the energy of the vowel by the quantification type I method. The consonant and vowel are then input to a phoneme duration control unit in step S6, where the phoneme duration (length of the sound) is controlled using a quantification type I coefficient database DB3 for phoneme duration. They then pass through a waveform editing unit in step S8, and the speech is synthesized and written to a synthesized speech file in step S9.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a phoneme energy control method for rule-based speech synthesis of arbitrary Japanese text.

[0002]

[Related Art] FIG. 4 is a block diagram showing the schematic configuration of a rule-based speech synthesis system. In FIG. 4, a sentence in mixed kanji and kana entered in a text input unit 1 is passed to a Japanese processing unit 2. The Japanese processing unit 2 converts the sentence into a phoneme symbol string by referring to a built-in Japanese dictionary. The phoneme symbol string is input to a phoneme control unit 3, which determines the phoneme parameters of each phoneme (fundamental frequency → pitch, amplitude → loudness, phoneme duration → length of the sound), and prosodic parameters are generated there from the phoneme symbol string. For each phoneme, target values of the prosodic parameters are determined from data stored in databases (not shown) for fundamental frequency, amplitude, and phoneme duration, and are then passed to a speech synthesis unit 4. The speech synthesis unit 4 refers to a database (not shown) and produces speech that realizes the desired prosodic parameters.

[0003] In the rule-based speech synthesis system configured as above, when the phoneme control unit 3 determines the parameters of a phoneme, it must take into account factors such as the preceding phoneme, the following phoneme, the accent environment, and the phoneme type. Conversely, the prosodic parameters can be explained by all of these factors.

[0004] The following describes how, in phoneme energy control, the energy of a given phoneme is determined from the input phoneme symbol string to be uttered. A known approach uses quantification type I, a multivariate analysis technique. This technique computes the target quantitative variable (energy) from qualitative variables (the phonological environment) that serve as explanatory factors, and is formulated by equations (1) and (2) below. In these equations, j denotes a factor item of the qualitative variables of the i-th datum, k denotes the category to which it belongs (the classification within each item), and a_jk denotes the category score (the numerical coefficient assigned to the category). With this notation:

[0005]

[Equation 1]

[0006] Here, δ is a variable defined as follows.

[0007]

[Equation 2]

[0008] To predict the values {y_i} of the quantitative variable by the least squares method, the category scores {a_jk} are chosen so as to satisfy equation (2) below.

[0009]

[Equation 3]
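The equation images are not reproduced in this text. Assuming the standard formulation of quantification type I, and using only the symbols defined in the surrounding paragraphs (item j, category k, category coefficient a_jk, indicator δ, observed value y_i), equations (1) and (2) and the definition of δ would read roughly as follows. The hat notation for the predicted value is an assumption introduced here.

```latex
\hat{y}_i = \sum_{j} \sum_{k} a_{jk}\, \delta_i(jk) \tag{1}

\delta_i(jk) =
\begin{cases}
1 & \text{if the $i$-th datum falls in category $k$ of item $j$} \\
0 & \text{otherwise}
\end{cases}

\sum_{i} \left( y_i - \hat{y}_i \right)^2 \rightarrow \min \tag{2}
```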

[0010] To obtain the category scores with this technique, the qualitative variables (b) to (f) of the energy database described below are input, with the fundamental frequency value (a) used as the teacher signal.

[0011] Energy database: each entry in the database consists of the energy of one phoneme in a phoneme symbol string together with the environment data that explains it, as listed in (a) to (f) below.

[0012]
(a) Fundamental frequency value
(b) Position in phrase
(c) Position in sentence
(d) Preceding phoneme type
(e) Current phoneme type
(f) Following phoneme type

At control time, the qualitative variables (b) to (f) above are derived from the input phoneme symbol string and fed into the linear expression built from the category scores, which yields an estimate of the energy of each phoneme. Because this technique applies whenever the objective variable is quantitative and the explanatory variables are qualitative, it is not limited to energy and can also be applied to fundamental frequency and phoneme duration.
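The control-time step described here amounts to adding up one coefficient per explanatory variable. The Python sketch below illustrates that under stated assumptions: the item names, category labels, and coefficient values are invented for illustration and do not come from the patent.

```python
# Hypothetical category coefficients a_jk, keyed by (item, category).
# All names and numbers here are illustrative, not taken from the patent.
CATEGORY_SCORES = {
    ("position_in_phrase", "start"): 1.5,
    ("position_in_phrase", "end"): -1.5,
    ("current_phoneme", "A"): 60.0,
    ("current_phoneme", "I"): 55.0,
    ("following_type", "voiceless_plosive"): -0.5,
    ("following_type", "vowel"): 0.5,
}


def estimate_energy(context, scores=CATEGORY_SCORES):
    """Evaluate the linear form: add the coefficient of the one category
    that is active for each item (the indicator is 1 there, 0 elsewhere)."""
    return sum(scores[(item, category)] for item, category in context.items())


context = {
    "position_in_phrase": "start",
    "current_phoneme": "A",
    "following_type": "voiceless_plosive",
}
energy_db = estimate_energy(context)  # 1.5 + 60.0 - 0.5
```

Because each item contributes exactly one coefficient, the same function could in principle serve for fundamental frequency or duration by swapping in a different coefficient set.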

[0013]

[Problem to be Solved by the Invention] One measure of the quality of synthesized speech produced by the above method is how closely it resembles a natural human voice, and this depends on the prosodic parameters. To achieve this naturalness, the prosodic parameters of the synthesized speech should be as close as possible to the prosodic patterns of natural human speech. The prosodic parameters learned by the quantification type I model are therefore obtained by analyzing natural speech uttered by a human. However, because the analysis techniques are not yet well established, the analysis fails for many parameters. The failures are especially pronounced for the prosodic parameters of consonants, so the training data may include values that do not reflect the prosodic patterns of natural speech.

[0014] The quantification type I model assumes that the categories are independent, and the number of coefficients in the linear expression equals the total number of categories over all explanatory variables, which is very small. The estimation accuracy of the model is therefore determined by the quality of its training data. Quality here means how accurately the data in each category are distributed, on the assumption that the partial correlation coefficients of the explanatory variables with respect to the objective variable are reasonably high and the explanatory variables sufficiently explain the objective variable.

[0015] An example sentence and the variables that explain the energy of the phoneme in question are shown below.

[0016] "Hello, this is the Interpreting Telephone International Conference Secretariat." "/MOSHIMOSHI/KOCHIRA/TU-YAKUDEXWA/KOKUSAIKAIGIJIMUKYOKUDESU/" The underlined A, the A in "TU-YAKU", is the phoneme in question.

[0017]
(a) Fundamental frequency → the value estimated by the fundamental frequency control unit
(b) Position in phrase → near the beginning of the phrase
(c) Position in sentence → near the beginning of the sentence
(d) Preceding phoneme type → semivowel or nasal
(e) Current phoneme type → A
(f) Following phoneme type → voiceless plosive consonant
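As a rough illustration of how the preceding, current, and following phoneme variables for the example could be derived from a phoneme string, consider the Python sketch below. The type classes in `PHONEME_TYPE`, the one-symbol-per-phoneme tokenization, and the boundary marker are assumptions made for this sketch; the patent does not list its category definitions.

```python
# Hypothetical phoneme-type classes; the patent does not list its categories.
PHONEME_TYPE = {
    "Y": "semivowel_or_nasal",
    "N": "semivowel_or_nasal",
    "K": "voiceless_plosive",
    "T": "voiceless_plosive",
}


def phoneme_context(phonemes, index, boundary="#"):
    """Return the preceding/current/following context of phonemes[index].

    Neighbours are mapped to a type class when one is known; otherwise the
    symbol itself is kept. `boundary` marks the start or end of the string.
    """
    prev = phonemes[index - 1] if index > 0 else boundary
    nxt = phonemes[index + 1] if index + 1 < len(phonemes) else boundary
    return {
        "preceding_type": PHONEME_TYPE.get(prev, prev),
        "current": phonemes[index],
        "following_type": PHONEME_TYPE.get(nxt, nxt),
    }


# The "YAKU" portion of the example string, split one symbol per phoneme.
ctx = phoneme_context(["Y", "A", "K", "U"], 1)
```

For the A in "YAKU" this gives a semivowel-or-nasal predecessor and a voiceless plosive successor, matching the variable values listed for the example.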

[0018] Here, vowel and consonant energy estimation by the quantification type I method are compared. The speech data underlying the vowel energy database and the consonant energy database were the "ATR115" sentence utterance data. The vowel energy estimation experiment covered 6 vowels and the consonant energy estimation experiment covered 12 consonants. The experimental conditions of the two differ only in the number of categories for the current phoneme type and, as a consequence of the different phonemes, in the number of categories for the preceding and following phoneme types. Tables 1 and 2 show the results of the vowel and consonant energy estimation experiments, respectively.

[0019]

[Table 1]

[0020]

[Table 2]

[0021] Comparing the two results, the mean squared error for vowel energy estimation is 2.142 dB, which is about the human discrimination threshold, whereas for consonant energy estimation it is very large at 4.373 dB. This experiment shows that the quantification type I method is not suitable for consonant energy estimation.
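The source reports the error in dB without giving the formula. Assuming it is a root-mean-square error between estimated and measured phoneme energies, it could be computed as in the Python sketch below. The function name and the sample values are hypothetical and are not the patent's data.

```python
import math


def rms_error_db(estimated, measured):
    """Root-mean-square difference between two equal-length lists of
    energies in dB (an assumed reading of the error figure in the text)."""
    if not estimated or len(estimated) != len(measured):
        raise ValueError("need two non-empty lists of equal length")
    squared = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only: errors of -3 dB and +4 dB.
error = rms_error_db([60.0, 62.0], [63.0, 58.0])
```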

[0022] The present invention has been made in view of the above circumstances. Its object is to provide a phoneme energy control method for speech synthesis that, by adopting a table method for the consonant energy estimation part, prevents extremely large energy estimates and thereby prevents deterioration of sound quality.

[0023]

[Means for Solving the Problem] To achieve the above object, the first invention is characterized as follows. A sentence from a text input unit is converted into a phoneme symbol string by a Japanese processing unit, the phoneme symbol string is processed by a phoneme symbol string processing unit, and the result is input to a fundamental frequency control unit to obtain the pitch. It is then determined whether the phoneme in question is a consonant or a vowel. A consonant is input to a consonant energy control unit, which controls its loudness using data from a consonant energy table. A vowel is input to a vowel energy control unit, which controls its loudness by quantification type I processing based on, for each phoneme, the fundamental frequency, position in phrase, position in sentence, preceding phoneme, current phoneme, and following phoneme. The consonants and vowels are then input to a phoneme duration control unit to obtain the length of each sound, after which waveform editing is performed to obtain synthesized speech.

[0024] The second invention is characterized in that the consonant energy table is classified by preceding phoneme, current phoneme, and following phoneme.

[0025]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a flowchart of a consonant energy control method for energy control according to an embodiment of the invention. In FIG. 1, step S1 is a phoneme symbol string file. A phoneme symbol string read from this file is processed by the phoneme symbol string processing unit in step S2 and then input to the fundamental frequency control unit in step S3, which determines the fundamental frequency of each phoneme. To determine the fundamental frequency, the fundamental frequency control unit uses data from the FNN coefficient database DB1 for fundamental frequency.

[0026] The determination unit in step S4 judges whether the phoneme whose fundamental frequency has been determined is a consonant. If it is, the phoneme is supplied to the consonant energy control unit in step S5, which controls the energy (loudness) of the consonant using data from the consonant energy table TB. The table TB is organized by preceding phoneme, current phoneme, and following phoneme, and its entries have been tuned by hand for each phoneme.
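A table organized by preceding, current, and following phoneme can be pictured as a lookup keyed on that triple. The Python sketch below is one possible shape, not the patent's implementation: the entries, the energy values, and the per-consonant fallback for triples missing from the table are all assumptions made for illustration.

```python
# Hypothetical hand-tuned entries: (preceding, current, following) -> energy.
# Values are placeholders, not figures from the patent.
CONSONANT_ENERGY_TB = {
    ("X", "B", "A"): 48.0,
    ("A", "K", "A"): 42.0,
}

# Assumed fallback per consonant for contexts not present in the table.
DEFAULT_CONSONANT_ENERGY = {"B": 50.0, "K": 44.0}


def consonant_energy(preceding, current, following,
                     table=CONSONANT_ENERGY_TB,
                     defaults=DEFAULT_CONSONANT_ENERGY):
    """Look up the energy for a consonant in its phoneme context.

    Falls back to a per-consonant default when the exact triple is missing,
    and raises KeyError when the consonant itself is unknown.
    """
    key = (preceding, current, following)
    if key in table:
        return table[key]
    return defaults[current]


tuned = consonant_energy("X", "B", "A")     # exact table entry
fallback = consonant_energy("I", "B", "A")  # triple not in table
```

Because every value is entered by hand, the output is bounded by whatever the table contains, which is how a table avoids the extreme estimates a fitted model can produce.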

[0027] If the phoneme is judged to be a vowel in step S4, it is supplied to the vowel energy control unit in step S7, where the vowel is controlled by the quantification type I method. The method works as follows. Natural human speech is analyzed offline in advance to obtain, for each phoneme, the fundamental frequency, position in phrase, position in sentence, preceding phoneme, current phoneme, and following phoneme (the explanatory variables of the quantification type I model), together with the energy; these form the training data. The training data are input to the quantification type I model, and the resulting category scores are stored as the quantification type I coefficient database DB2 for energy.
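The offline training step can be sketched as a least-squares fit of one coefficient per category. The Python example below is a minimal illustration under stated assumptions: it solves the least-squares problem by block coordinate descent (backfitting), which is a numerical method chosen here for brevity and not one the patent names, and the item names and training values are invented.

```python
from collections import defaultdict


def fit_category_scores(samples, targets, n_sweeps=20):
    """Fit coefficients scores[(item, category)] by least squares.

    samples: list of dicts mapping item name -> category label.
    targets: list of floats (for example, measured energy in dB).
    Each sweep re-estimates one item's coefficients as the mean residual
    left after subtracting the contributions of all other items.
    """
    items = sorted(samples[0])
    scores = defaultdict(float)  # categories start at 0.0

    for _ in range(n_sweeps):
        for j in items:
            sums = defaultdict(float)
            counts = defaultdict(int)
            for sample, y in zip(samples, targets):
                others = sum(scores[(i, sample[i])] for i in items if i != j)
                key = (j, sample[j])
                sums[key] += y - others
                counts[key] += 1
            for key, total in sums.items():
                scores[key] = total / counts[key]
    return dict(scores)


def estimate(scores, sample):
    """Evaluate the fitted linear form for one sample."""
    return sum(scores[(item, category)] for item, category in sample.items())


# Invented training data that is exactly additive in the two items.
samples = [
    {"position": "start", "current": "A"},
    {"position": "start", "current": "I"},
    {"position": "end", "current": "A"},
    {"position": "end", "current": "I"},
]
targets = [63.0, 58.0, 61.0, 56.0]
scores = fit_category_scores(samples, targets)
```

The individual coefficients are not unique, since a constant can be moved from one item to another without changing any prediction; only the summed estimate is determined by the data. The returned dictionary plays the role of the stored coefficient set.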

[0028] While the speech synthesizer is running, the values of the explanatory variables for estimation are computed from the phoneme symbol string (by the phoneme symbol string processing unit in step S2) and input to the vowel energy control unit in step S7 to obtain the desired energy of the vowel.

[0029] If the phoneme is a consonant, on the other hand, the estimation accuracy of the quantification type I method is poor, so the phoneme is controlled by the consonant energy control unit in step S5 as described above.
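The branch at step S4 can be summarized as a dispatch between the two estimators. The Python sketch below shows one way to express it. The vowel set, the context keys, and the stand-in table and estimator are assumptions for illustration; the patent does not list which phonemes it treats as vowels.

```python
# Assumed vowel inventory; the patent does not enumerate its vowel set.
VOWELS = frozenset("AIUEO")


def control_energy(context, consonant_table, vowel_estimator):
    """Route a phoneme to the vowel model or the consonant table (step S4).

    context: dict with "preceding", "current", and "following" phonemes.
    consonant_table: dict keyed by (preceding, current, following).
    vowel_estimator: callable taking the context and returning an energy.
    """
    if context["current"] in VOWELS:
        return vowel_estimator(context)          # step S7
    key = (context["preceding"], context["current"], context["following"])
    return consonant_table[key]                  # step S5


# Stand-ins for the table TB and the fitted vowel model.
table = {("X", "B", "A"): 48.0}
vowel_model = lambda ctx: 60.0

consonant_result = control_energy(
    {"preceding": "X", "current": "B", "following": "A"}, table, vowel_model)
vowel_result = control_energy(
    {"preceding": "B", "current": "A", "following": "K"}, table, vowel_model)
```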

[0030] The consonants and vowels obtained in steps S5 and S7 are input to the phoneme duration control unit in step S6, where the phoneme duration (length of the sound) is controlled using the quantification type I coefficient database DB3 for phoneme duration. They then pass through the waveform editing unit in step S8, and the speech is synthesized and written to the synthesized speech file in step S9. The waveform editing unit in step S8 edits the waveform using segment data that the segment data selection unit in step S10 selects from the speech segment database DB4 on the basis of the phoneme symbol string from the phoneme symbol string processing unit in step S2.

[0031] Next, FIGS. 2 and 3 compare two cases using the example phrase "for about a week" (ISHUKAXBAKARI): estimating the energy of all phonemes by the quantification type I method, and estimating consonant energy by the table method and vowel energy by the quantification type I method. In both figures, the horizontal axis represents time, that is, the phoneme symbol string, and the vertical axis represents the energy value.

[0032] In particular, in FIG. 2, where the energy of all phonemes is estimated by the quantification type I method, the energy of phoneme "B" (circled in FIG. 2) is extremely large relative to the energy of the preceding phoneme "X". Because "B" is a voiced plosive consonant, the synthesized speech sounds very unnatural at this point. In FIG. 3, by contrast, where consonant energy is controlled by the table method and vowel energy by the quantification type I method, the energy moves relatively smoothly from the preceding phoneme "X" to phoneme "B" (circled in FIG. 3), so the synthesized speech sounds smooth and natural. Estimating consonant energy by the table method and vowel energy by the quantification type I method thus prevents deterioration of sound quality.

[0033]

[Effects of the Invention] As described above, according to the present invention, adopting the table method for the consonant energy estimation part prevents extremely large energy estimates and thus prevents deterioration of sound quality. In addition, because the consonant energy table is classified by only three parameters, the preceding phoneme, the current phoneme, and the following phoneme, it is easy to create. Furthermore, the system can be built with a simple scheme: the table method for consonants and the quantification type I method for vowels. Finally, because the quantification type I method is used for vowels, holding several sets of quantification type I coefficient data for energy (obtained from the training data of several speakers) makes energy control corresponding to several different human voices possible.

[Brief description of the drawings]

FIG. 1 is a flowchart showing an embodiment of the present invention.

FIG. 2 is a characteristic diagram for the case where the energy of all phonemes is estimated by the quantification type I method.

FIG. 3 is a characteristic diagram for the case where consonant energy is estimated by the table method and vowel energy by the quantification type I method.

FIG. 4 is a block diagram of a speech synthesizer.

[Description of Reference Symbols]
S1: phoneme symbol string file
S2: phoneme symbol string processing unit
S3: fundamental frequency control unit
S4: consonant/vowel determination unit
S5: consonant energy control unit
S6: phoneme duration control unit
S7: vowel energy control unit
S8: waveform editing unit
S9: synthesized speech file
S10: segment data selection unit
TB: consonant energy table
DB1: FNN coefficient database for fundamental frequency
DB2: quantification type I coefficient database for energy
DB3: quantification type I coefficient database for phoneme duration
DB4: speech segment database

Claims

[Claims]

1. A phoneme energy control method in speech synthesis, characterized in that: a sentence from a text input unit is converted into a phoneme symbol string by a Japanese processing unit; the phoneme symbol string is processed by a phoneme symbol string processing unit and then input to a fundamental frequency control unit to obtain the pitch; it is then determined whether the phoneme in question is a consonant or a vowel; a consonant is input to a consonant energy control unit, which controls the loudness of the consonant using data from a consonant energy table; a vowel is input to a vowel energy control unit, which controls the loudness of the vowel by quantification type I processing based on, for each phoneme, the fundamental frequency, position in phrase, position in sentence, preceding phoneme, current phoneme, and following phoneme; and the consonants and vowels are then input to a phoneme duration control unit to obtain the length of each sound, after which waveform editing is performed to obtain synthesized speech.

2. The phoneme energy control method in speech synthesis according to claim 1, wherein the consonant energy table is classified by preceding phoneme, current phoneme, and following phoneme.