JPH04191898A

JPH04191898A - Text speech synthesizer

Info

Publication number: JPH04191898A
Application number: JP2323764A
Authority: JP
Inventors: Takashi Ishitani; 高志石谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1990-11-27
Filing date: 1990-11-27
Publication date: 1992-07-10
Anticipated expiration: 2013-05-20
Also published as: JP2755478B2

Abstract

PURPOSE: To determine the optimum feature parameters according to the phonological environment and speech rate without creating complicated rules, by using a fuzzy operation means that operates on the fuzzy sets calculated by first and second fuzzy set calculation means. CONSTITUTION: The first fuzzy set calculation means 17 calculates a first fuzzy set representing the feature parameter as influenced by the preceding phoneme and a second fuzzy set representing the feature parameter as influenced by the succeeding phoneme. The second fuzzy set calculation means 18 calculates a fuzzy set representing the rate of change, from its target value, of the feature parameter as influenced by the phoneme duration. The fuzzy operation means 19 performs a fuzzy operation on the fuzzy sets calculated by the first and second fuzzy set calculation means 17 and 18 and determines the feature parameter of the phoneme. The optimum feature parameters are thus obtained according to the phonological environment and speech rate without creating complicated rules.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a text-to-speech synthesizer that synthesizes speech from arbitrary character strings.

[Prior Art] Conventional text-to-speech synthesizers that use vocal tract parameters such as formant frequencies and bandwidths as phoneme feature parameters synthesize speech either by using formant parameters extracted in advance from natural speech as they are, or by modifying those parameters with predetermined rules according to the influence of the preceding and following phonemes, the speech rate, and similar factors.

FIG. 9 shows formant extraction in the conventional text-to-speech synthesizer described above. From the natural speech shown in FIG. 9(A), formants are extracted frame by frame as shown in FIG. 9(B).

Next, for each phoneme, the frame that most prominently represents the characteristics of that phoneme is selected, and the formants of the selected frame are used as the representative formant for synthesizing that phoneme.

The representative formant is a very important value, especially for vowels: the speech synthesizer synthesizes a vowel by repeating the same value (that of the representative formant) throughout the vowel's duration.

Even among consonants, some, such as nasals, may be synthesized by repeating the representative formant for the length of the duration.

When a phoneme cannot be synthesized from a representative formant, the formant values extracted for each frame are used as they are.

At the boundary between two phonemes, a connection rule is used to calculate the formant values that lead from the formant of the preceding phoneme to the formant of the following phoneme.

As a connection rule, for example, linear interpolation is performed on the assumption that the formant moves in a straight line between the representative formant of the preceding phoneme and that of the following phoneme. Alternatively, the representative formant is repeated within the duration of the preceding phoneme, and linear interpolation is performed over the duration of the following phoneme, treating it as a straight-line transition from the representative formant parameter of the preceding phoneme.
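The two connection rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the frame counts, and the formant values in Hz are hypothetical and do not come from the patent.

```python
def interpolate_formants(f_prev, f_next, n_frames):
    """Rule 1: straight-line transition from f_prev to f_next over n_frames frames."""
    if n_frames < 2:
        return [f_next]
    step = (f_next - f_prev) / (n_frames - 1)
    return [f_prev + step * k for k in range(n_frames)]


def hold_then_glide(f_prev, f_next, n_prev, n_next):
    """Rule 2: repeat f_prev during the preceding phoneme, then glide
    linearly to f_next across the following phoneme."""
    return [f_prev] * n_prev + interpolate_formants(f_prev, f_next, n_next)


# Hypothetical first-formant targets (Hz) for two adjacent phonemes.
track = interpolate_formants(700.0, 300.0, 5)
held = hold_then_glide(700.0, 300.0, 2, 3)
```

Both functions return one formant value per frame, which is the form a frame-based formant synthesizer would consume.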

FIG. 10 shows an example of analysis of speech synthesized in this way. FIG. 10(A) shows the waveform of the synthesized speech, and FIG. 10(B) shows its formants.

[Problem to Be Solved by the Invention] In the conventional speech synthesizers described above, improving the quality of the synthesized speech requires selecting, for each phoneme, the optimal feature parameters on which the phonological environment (preceding and following phonemes) and the speech rate (phoneme duration) have the least influence. To do so, an enormous number of rules must be created to modify the preselected feature parameters, which is a problem.

In view of the above problem of conventional speech synthesizers, an object of the present invention is to provide a speech synthesizer that obtains optimal feature parameters according to the phonological environment and speech rate without creating complicated rules.

[Means for Solving the Problem] The above object is achieved by a text-to-speech synthesizer comprising: first fuzzy set calculation means for calculating a first fuzzy set representing a feature parameter influenced by the preceding phoneme and a second fuzzy set representing the feature parameter influenced by the following phoneme; second fuzzy set calculation means for calculating a fuzzy set representing the rate of change, from its target value, of the feature parameter influenced by the phoneme duration; and fuzzy operation means for calculating the feature parameter of a phoneme, the fuzzy operation means being configured to perform a fuzzy operation on the fuzzy sets calculated by the first and second fuzzy set calculation means.

[Operation] The first fuzzy set calculation means calculates a first fuzzy set representing the feature parameter influenced by the preceding phoneme and a second fuzzy set representing the feature parameter influenced by the following phoneme. The second fuzzy set calculation means calculates a fuzzy set representing the rate of change, from its target value, of the feature parameter influenced by the phoneme duration. The fuzzy operation means performs a fuzzy operation on the fuzzy sets calculated by the first and second fuzzy set calculation means to obtain the feature parameter of the phoneme.

[Embodiment] The text-to-speech synthesizer of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the text-to-speech synthesizer of the present invention.

The text-to-speech synthesizer of FIG. 1 comprises a character string analysis unit 11, a word dictionary 12, a rule control unit 13, a rule file 14, a speech synthesizer 15, and a feature parameter file 16.

Next, the operation of the text-to-speech synthesizer of FIG. 1 is described, component by component.

When an arbitrary character string is input, the character string analysis unit 11 parses it and determines the intonation pattern of the whole string.

The character string analysis unit 11 further refers to the word dictionary 12 to look up the words contained in the input string, determines the accent and phoneme sequence of each word, and from these determines the phoneme sequence and accent pattern of the string.

The character string analysis unit 11 outputs the intonation of the whole string, the phoneme sequence of the string, and the accent pattern determined in this way to the rule control unit 13.

The feature parameter file 16 outputs parameters representing the features of phonemes to the rule control unit 13.

The rule file 14 outputs to the rule control unit 13 the phoneme control rules for connecting the feature parameters output from the feature parameter file 16 and the prosody control rules for controlling each prosodic feature. The prosody control rules include the phoneme duration L described later.

The rule control unit 13 refers to the feature parameters from the feature parameter file 16 and to the phoneme control rules and prosody control rules from the rule file 14. Based on the intonation pattern of the whole string, the phoneme sequence and accent pattern of the string input from the character string analysis unit 11, and the feature parameters modified by the procedure described later, it generates the parameters required for speech synthesis and outputs them to the speech synthesizer 15.

The speech synthesizer 15 generates synthesized speech based on the parameters input from the rule control unit 13 and outputs rule-synthesized speech corresponding to the input character string.

To obtain the modified feature parameters mentioned above, the rule control unit 13 includes the components shown in FIG. 2.

The configuration of the rule control unit 13 is described below with reference to FIG. 2.

The rule control unit 13 includes: a first fuzzy set calculation unit 17 that calculates a first fuzzy set representing the feature parameter influenced by the preceding phoneme and a second fuzzy set representing the feature parameter influenced by the following phoneme; a second fuzzy set calculation unit 18 that calculates a fuzzy set representing the rate of change, from its target value, of the feature parameter influenced by the phoneme duration; and a fuzzy operation unit 19 that performs a fuzzy operation on the fuzzy sets calculated by the first fuzzy set calculation unit 17 and the second fuzzy set calculation unit 18 to obtain the feature parameter.

The first fuzzy set calculation unit 17 includes a membership function calculation unit 20, which calculates the membership function representing the first fuzzy set from the occurrence frequency of the feature parameter for each preceding phoneme, and the membership function representing the second fuzzy set from the occurrence frequency of the feature parameter for each following phoneme.

The second fuzzy set calculation unit 18 includes a fuzzy inference unit 21. The phoneme duration is represented by fuzzy sets, and for each fuzzy variable of the duration the rate of change of the feature parameter from its target value is also represented by a fuzzy set; the fuzzy inference unit 21 performs fuzzy inference on the given duration to calculate the fuzzy set of the rate of change.

The fuzzy inference unit 21 includes a target value calculation unit 22, which calculates the target value of a feature parameter from the occurrence probability of the feature parameter in fuzzy sets of feature parameters obtained from natural speech.

Next, the procedure by which the rule control unit 13 obtains the modified feature parameters is described.

Here the representative formant is used as an example of a feature parameter. The representative formant means the formant frequencies and bandwidths that best represent the characteristics of a phoneme as influenced by the preceding phoneme, the following phoneme, and the phoneme duration.

The membership function of the representative formant is obtained by the following procedure. The description below covers one of the formant parameters, but all of the parameters can be obtained by the same procedure.

Step 1: The target value calculation unit 22 determines the target value of each phoneme.

Speech data uttered under various conditions are processed statistically for each phoneme, and the fuzzy value with the highest occurrence probability, in the sense of fuzzy probability, is taken as the target.

The procedure for calculating the target is as follows.

In the frequency domain, let Ω be the whole domain, and let the variable ω be a frequency, such as the formant frequency or bandwidth, of each data item.

Next, for the phoneme Xi, the probability P(F) that the parameter occurs in a fuzzy variable F representing a region of the frequency domain is determined. As shown in FIG. 3, if the membership function of the fuzzy variable F is μF(ω), then

$$P(F) = \int_{\Omega} \mu_F(\omega)\,dP = \int_{\Omega} \mu_F(\omega)\,p(\omega)\,d\omega, \qquad p(\omega) = \frac{n(\omega)}{N},$$

$$F \subset \Omega, \quad \omega \in \Omega, \quad \mu_F : \Omega \to [0, 1],$$

from which the occurrence probability P(F) of the fuzzy variable F is obtained.

Here, N is the total number of data items, and n(ω) is the number of data items at frequency ω.

From this, the target value Ti of the phoneme Xi is obtained as the fuzzy variable F that maximizes P(F):

$$P(T_i) = \max_{F \subset \Omega} P(F), \qquad T_i = F.$$

The membership function of the target value Ti is denoted μTi(ω).
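Step 1 can be sketched in discrete form, where the integral for P(F) becomes a sum over histogram bins. The triangular membership shape, the candidate fuzzy variables, and the histogram counts below are assumptions made for illustration; the patent does not specify them.

```python
def triangular(center, half_width):
    """Return a triangular membership function (the shape is an assumption)."""
    def mu(w):
        return max(0.0, 1.0 - abs(w - center) / half_width)
    return mu


def fuzzy_event_probability(mu, histogram):
    """Discrete P(F): sum over w of mu_F(w) * n(w) / N."""
    total = sum(histogram.values())
    return sum(mu(w) * n for w, n in histogram.items()) / total


def pick_target(fuzzy_vars, histogram):
    """Return the name of the fuzzy variable F with the highest P(F)."""
    return max(
        fuzzy_vars,
        key=lambda name: fuzzy_event_probability(fuzzy_vars[name], histogram),
    )


# Hypothetical counts n(w) of a formant frequency (Hz) for one phoneme.
histogram = {600: 2, 700: 10, 800: 6, 900: 2}
# Hypothetical candidate fuzzy variables covering the frequency domain.
fuzzy_vars = {
    "around_700": triangular(700, 200),
    "around_900": triangular(900, 200),
}
p_700 = fuzzy_event_probability(fuzzy_vars["around_700"], histogram)
target = pick_target(fuzzy_vars, histogram)
```

With these numbers the fuzzy variable centered on 700 Hz collects most of the probability mass and is selected as the target.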
+=FFCΩ The membership function of the target value Ti is μTi(ω).

Step 2: The second fuzzy set calculation unit 18 determines, according to the phoneme duration, the fuzzy set representing the displacement of the representative formant from the target.

The displacement of the representative formant from the target T obtained in Step 1 is determined statistically according to the duration.

Because the target T is represented by a fuzzy variable, the resulting displacement is a fuzzy number.

The displacement Di(l) from the target Ti when the duration of the phoneme Xi is l (with membership function μDi(l)) is calculated by

$$\mu_{D_i(l)} = \mu_L(l) \cdot \mu_{T_i}(l, d\omega).$$

Here, μL(l) is the grade to which the duration l belongs to the duration fuzzy variable L; μTi(l, dω) is the membership function obtained by normalizing the frequency axis of μTi(ω) at duration l by |Ti − fij| / |Ti − Tj|; Tj is the target of the preceding (following) phoneme; and fij is the observed value of the representative formant when the preceding (following) phoneme is Xj.

Further, the displacement Di(L) of the representative formant of the phoneme Xi from the target, for durations belonging to the duration fuzzy variable L, is calculated from the following equation.

$$D_i(L) = \bigvee_{l} D_i(l)$$

FIGS. 4(A) to 4(C) show the membership function of the calculated displacement Di of the representative formant from the target, together with the calculation process.

Step 3: The membership function calculation unit 20 obtains, as fuzzy sets, the representative formants classified by preceding phoneme.

From the occurrence probability of the representative formant of the phoneme Xi as influenced by the preceding phoneme Xj, the membership function μfi of the representative formant (0 < i < p, where p is the formant order) is obtained.

Step 4: The membership function calculation unit 20 obtains, as fuzzy sets, the representative formants classified by following phoneme.

From the occurrence probability of the representative formant of the phoneme Xi as influenced by the following phoneme Xj, the membership function μbi of the representative formant (0 < i < p, where p is the formant order) is obtained.
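Steps 3 and 4 derive a membership function from how often each formant value occurs in a given context. The patent does not state the exact mapping from occurrence probability to membership grade, so the sketch below assumes one simple choice: scale the counts so that the most frequent value has grade 1. The counts and names are hypothetical.

```python
def membership_from_counts(counts):
    """Scale occurrence counts so the most frequent value gets grade 1.

    This peak normalization is an assumption, not the patent's stated method.
    """
    peak = max(counts.values())
    return {w: n / peak for w, n in counts.items()}


# Hypothetical counts of a representative formant value (Hz) of phoneme Xi,
# collected only from utterances in which the preceding phoneme is Xj.
counts_after_xj = {650: 1, 700: 4, 750: 2}
mu_f = membership_from_counts(counts_after_xj)
```

The same function applied to counts grouped by following phoneme would give the second membership function.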

FIG. 5 shows examples of the membership functions μfi and μbi obtained in Steps 3 and 4.

Using the fuzzy sets of representative formants obtained by the above procedure (Steps 1 to 4), the representative formant of each phoneme is determined and speech is synthesized by the procedure below.

Step 5: The first fuzzy set calculation unit 17 retrieves the fuzzy sets Ff and Fb of the representative formant parameter, using the preceding phoneme and the following phoneme as keys.

Step 6: The fuzzy inference unit 21 performs fuzzy inference on the duration to obtain the displacement function, from the target, of the formant parameter as influenced by the duration.

In Step 2 above, the membership function μDi representing the displacement from the target was calculated for each fuzzy variable of the duration.

Here, the grade of the actually given duration in each duration fuzzy variable is determined, and from it the membership function μDOi corresponding to the given duration is obtained as the displacement function from the target.

The calculation method is described with reference to FIG. 6.

The duration is expressed by three fuzzy variables: Long, Middle, and Short. When the given duration is L, the following three fuzzy rules are applied.

(A) If L is Long, then DO is DL.

(B) If L is Middle, then DO is DM.

(C) If L is Short, then DO is DS.

Here, DL, DM, and DS are the membership functions representing the displacement from the target for the durations Long, Middle, and Short, respectively.

FIGS. 6(A), 6(B), and 6(C) show the three fuzzy rules above expressed as membership functions.

When a duration is given, its grade in each fuzzy variable is obtained.

When the duration is given as a fuzzy value, α is the grade obtained by a max-min operation between it and the fuzzy variable above; when the duration is given as a crisp value, α is its grade in the fuzzy variable. The rules above are then applied. FIG. 6 shows the case where the duration is given as a fuzzy value.

The elements of the displacement functions DL, DM, and DS express distance along the parameter axis, with the target of the phoneme of interest mapped to 0 and the target of the preceding or following phoneme mapped to 1.

For a formant frequency, this is a continuous value with the two target frequencies mapped to 0 and 1, respectively.

In the fuzzy inference, the grade α is determined from the given duration and the antecedent of each rule, the displacement function obtained from each rule is calculated by clipping that rule's displacement function at α, and the displacement functions obtained from the rules are combined by OR to give the desired displacement function DO. FIG. 6(D) shows the displacement function DO obtained in this way.
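The clip-and-OR inference described above can be sketched for a crisp duration as follows. All membership shapes and break points (the millisecond ranges for Long, Middle, and Short, and the shapes of DL, DM, and DS) are hypothetical; only the clipping at α and the OR (max) combination follow the text.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Normalized displacement axis: 0 = own target, 1 = neighboring phoneme's target.
AXIS = [k / 10 for k in range(11)]

# Hypothetical antecedents: grade of a duration (ms) in Long / Middle / Short.
ANTECEDENT = {
    "Long": lambda t: tri(t, 100, 160, 220),
    "Middle": lambda t: tri(t, 40, 100, 160),
    "Short": lambda t: tri(t, 0, 40, 100),
}

# Hypothetical consequents DL, DM, DS on the displacement axis.
CONSEQUENT = {
    "Long": lambda d: tri(d, -0.3, 0.0, 0.3),
    "Middle": lambda d: tri(d, 0.1, 0.4, 0.7),
    "Short": lambda d: tri(d, 0.4, 0.7, 1.0),
}


def infer_displacement(duration_ms):
    """Clip each consequent at its rule grade (min), then OR the results (max)."""
    grades = {name: mu(duration_ms) for name, mu in ANTECEDENT.items()}
    return [
        max(min(grades[name], CONSEQUENT[name](d)) for name in ANTECEDENT)
        for d in AXIS
    ]


do = infer_displacement(70)  # 70 ms is half Middle and half Short here
```

For a duration given as a fuzzy value, the grade of each rule would instead come from a max-min operation between the input fuzzy set and the antecedent, as the text describes.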

In the fuzzy inference described above, the desired displacement function DO was obtained by clipping each rule's displacement function at the grade α and OR-combining the results. However, the method of obtaining DO is not limited to this; for example, the method shown in FIG. 7 can also be used.

That is, as shown in FIG. 7, the displacement function obtained from each rule is calculated by multiplying that rule's displacement function by the grade α determined from the given duration and the antecedent of the rule, (A), (B), or (C).

The displacement functions obtained from the rules are then combined using the bounded sum instead of the OR combination described above.

Here, the bounded sum is the operation expressed by the following equation.

$$\mu_C = (\mu_A + \mu_B) \wedge 1$$

Besides the OR combination and the bounded sum described above, various other methods can be used to combine membership functions. FIG. 7(D) shows the displacement function DO obtained in this way.
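The alternative inference of FIG. 7 (scale each consequent by its grade, then combine with the bounded sum) can be sketched as below. The rule grades and the sampled consequents are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def bounded_sum(a, b):
    """Bounded sum of two grades: (a + b) capped at 1."""
    return min(1.0, a + b)


def infer_scaled(grades, consequents):
    """Multiply each consequent by its rule grade, then combine the
    scaled functions pointwise with the bounded sum."""
    n = len(next(iter(consequents.values())))
    out = [0.0] * n
    for name, mu in consequents.items():
        for k in range(n):
            out[k] = bounded_sum(out[k], grades[name] * mu[k])
    return out


# Hypothetical rule grades for one given duration.
grades = {"Long": 0.0, "Middle": 0.5, "Short": 0.5}
# Hypothetical DL, DM, DS sampled at five points of the displacement axis.
consequents = {
    "Long": [1.0, 0.5, 0.0, 0.0, 0.0],
    "Middle": [0.0, 0.5, 1.0, 0.5, 0.0],
    "Short": [0.0, 0.0, 0.0, 0.5, 1.0],
}
do = infer_scaled(grades, consequents)
```

Unlike clipping with OR, this variant lets overlapping rules reinforce each other, up to the cap of 1.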

Step 7: The fuzzy operation unit 19 combines the membership functions obtained in Steps 5 and 6.

The fuzzy sets Ff and Fb of the representative formant as influenced by the preceding and following phonemes, and the fuzzy set DO representing the displacement from the target due to the duration, are AND-combined as shown in FIG. 8 to obtain the representative formant Fm.

To combine DO with Ff and Fb, the frequency axis must be converted. The membership function μDO of DO is therefore stretched or compressed linearly so that 0 corresponds to the target value of the phoneme of interest and 1 corresponds to the target value of the preceding or following phoneme.
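The axis conversion and the AND combination can be sketched as follows, taking AND as the pointwise minimum. The target frequencies and membership grades are hypothetical, and all sets are assumed to be sampled at the same frequencies.

```python
def to_frequency(d, target, neighbor_target):
    """Map normalized displacement d (0 = own target, 1 = neighbor's target) to Hz."""
    return target + d * (neighbor_target - target)


def and_combine(*mus):
    """Pointwise AND (min) of membership functions over their shared sample points."""
    shared = set(mus[0]).intersection(*mus[1:])
    return {w: min(mu[w] for mu in mus) for w in sorted(shared)}


# Hypothetical displacement function DO on the normalized axis.
do = {0.0: 0.2, 0.25: 1.0, 0.5: 0.4}
target, neighbor = 700.0, 900.0
do_hz = {to_frequency(d, target, neighbor): g for d, g in do.items()}

# Hypothetical context fuzzy sets Ff and Fb on the frequency axis (Hz).
ff = {700.0: 0.5, 750.0: 0.8, 800.0: 1.0}
fb = {700.0: 0.9, 750.0: 0.6, 800.0: 0.3}
fm = and_combine(do_hz, ff, fb)
```

In practice the sets would rarely share sample points exactly, so some resampling onto a common frequency grid would be needed before the minimum is taken.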

Step 8: The fuzzy set of the representative formant is defuzzified.

In the example shown in FIG. 8, the representative formant was obtained by the centroid method, the most widely used defuzzification method. However, defuzzification is not limited to this method.
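Centroid defuzzification of a sampled fuzzy set is a grade-weighted average of the sample points. A minimal sketch, with a hypothetical fuzzy set Fm over frequency in Hz:

```python
def centroid(mu):
    """Centroid (center of gravity) of a sampled fuzzy set {value: grade}."""
    total = sum(mu.values())
    if total == 0:
        raise ValueError("fuzzy set has no support; centroid is undefined")
    return sum(w * g for w, g in mu.items()) / total


# Hypothetical combined fuzzy set Fm of the representative formant (Hz -> grade).
fm = {700.0: 0.25, 750.0: 0.5, 800.0: 0.25}
representative_formant = centroid(fm)
```

The explicit check for an all-zero set matters because an AND combination of poorly overlapping sets can leave no support at all.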

The above procedure yields a representative formant that fits the phonological environment and also takes the phoneme duration into account.

For some phonemes, the parameters vary over time, and it is difficult to represent the phoneme with a single representative formant.

In such cases, the present invention can be applied by using several representative formants along the time course of the phoneme.

Synthesized speech can be produced by connecting the representative formants obtained by the above procedure. Smoothing is applied at the connections as necessary.

In the embodiment above, the representative formant was used as an example of a phoneme feature parameter, but this is only an example; any feature parameter may be used, such as parameters obtained by linear predictive analysis.

[Effects of the Invention] The invention comprises first fuzzy set calculation means for calculating a first fuzzy set representing a feature parameter influenced by the preceding phoneme and a second fuzzy set representing the feature parameter influenced by the following phoneme, second fuzzy set calculation means for calculating a fuzzy set representing the rate of change, from its target value, of the feature parameter influenced by the phoneme duration, and fuzzy operation means for calculating the feature parameter of a phoneme, the fuzzy operation means being configured to perform a fuzzy operation on the fuzzy sets calculated by the first and second fuzzy set calculation means. Optimal feature parameters can therefore be obtained according to the phonological environment and speech rate without creating complicated rules.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the text-to-speech synthesizer according to the present invention. FIG. 2 is a block diagram showing the rule control unit 13 of FIG. 1 in detail. FIG. 3 is a graph showing the membership function of a fuzzy variable in the frequency domain. FIGS. 4(A) to 4(C) are graphs of membership functions representing the displacement of the representative formant according to phoneme duration. FIG. 5 is a graph showing membership functions of feature parameters influenced by the environment. FIGS. 6(A) to 6(D) are graphs showing an example of fuzzy inference on duration. FIGS. 7(A) to 7(D) are graphs showing another example of fuzzy inference on duration. FIGS. 8(A) to 8(C) are graphs showing an example of combining membership functions. FIGS. 9(A) and 9(B) are graphs showing an example of formant extraction from natural speech. FIGS. 10(A) and 10(B) are graphs showing an example of analysis of speech synthesized by a conventional speech synthesizer.

11: character string analysis unit; 12: word dictionary; 13: rule control unit; 14: rule file; 15: speech synthesizer; 16: feature parameter file; 17: first fuzzy set calculation unit; 18: second fuzzy set calculation unit; 19: fuzzy operation unit; 20: membership function calculation unit; 21: fuzzy inference unit; 22: target value calculation unit.

Claims

[Claims]

A text-to-speech synthesizer comprising: first fuzzy set calculation means for calculating a first fuzzy set representing a feature parameter influenced by the preceding phoneme and a second fuzzy set representing the feature parameter influenced by the following phoneme; second fuzzy set calculation means for calculating a fuzzy set representing the rate of change, from its target value, of the feature parameter influenced by the duration of a phoneme; and fuzzy operation means for calculating the feature parameter of a phoneme, wherein the fuzzy operation means is configured to perform a fuzzy operation on the fuzzy sets calculated by the first fuzzy set calculation means and the second fuzzy set calculation means.