JPH01200290A

JPH01200290A - Voice synthesizer

Info

Publication number: JPH01200290A
Application number: JP63025941A
Authority: JP
Inventors: Osamu Kimura; 治木村; Nobuyoshi Amaki; 延佳海木; Jiyungo Kitou; 鬼頭　淳悟
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1988-02-03
Filing date: 1988-02-03
Publication date: 1989-08-11
Anticipated expiration: 2014-01-20
Also published as: JP2848604B2

Abstract

PURPOSE: To generate natural synthesized speech close to real speech by deciding, for each vowel of an input character string, whether it is voiced or devoiced on the basis of a devoicing occurrence degree per phoneme sequence, a devoicing occurrence degree per accent pattern, and a devoicing determination threshold. CONSTITUTION: When a character string is input to a character string analysis unit 1, the analysis unit 1 determines the intonation pattern of the whole character string as well as its phoneme sequence and accent pattern, and outputs them to a rule control unit 3. Meanwhile, a target feature parameter file 6 within a feature parameter file 8 outputs target feature parameters, which express vowel features, to the control unit 3, and a time-series feature parameter file 7 outputs time-series feature parameters, which express consonant features, to the control unit 3. The control unit 3 refers to both sets of parameters and to the phoneme control rules and prosody control rules from a rule file 4, generates the parameters needed for speech synthesis, and outputs them to a speech synthesizer 5. The synthesizer 5 performs speech synthesis on the basis of the input parameters and outputs rule-synthesized speech corresponding to the input character string.

Description

DETAILED DESCRIPTION OF THE INVENTION <Industrial Field of Application> The present invention relates to a speech synthesizer that generates rule-synthesized speech.

<Prior Art> To generate natural synthesized speech, it is important to devoice certain vowels.

Conventionally, rules for vowel devoicing have been presented as general principles in Seishin Sakurai, "Things to Note in the Pronunciation of the Common Language," Japanese Pronunciation and Accent Dictionary (Revised New Edition), NHK, Commentary/Appendix, p. 128, 1985.

This reference describes in detail the typical phonological environments in which vowel devoicing occurs.

Devoicing rules actually used in speech synthesizers include, for example, those in Yamato Sawara and Kazuo Hakoda, "Speech Synthesis by Rule," Research and Practical Application Report, Vol. 27, No. 12, p. 2562 (62), edited by the Nippon Telegraph and Telephone Public Corporation, 1978.

FIG. 3 is a flowchart of a devoicing determination routine that uses the conventional vowel devoicing rules described above. This conventional routine is described below with reference to FIG. 3.

In step S31, it is determined whether the target vowel is a high vowel (/i/ or /u/). If it is, the process proceeds to step S32; otherwise, the vowel is judged to be voiced and the process proceeds to step S37.

In step S32, it is determined whether the target vowel lies between voiceless consonants. If it does, the process proceeds to step S33; otherwise, the vowel is judged to be voiced and the process proceeds to step S37.

In step S33, it is determined whether the target vowel carries the accent nucleus (the position at which the pitch changes from relatively high to relatively low). If it does, the process proceeds to step S39; otherwise, it proceeds to step S34.

In step S34, it is determined whether the target vowel is in the first mora. If it is, the process proceeds to step S31+; otherwise, it proceeds to step S36.

In step S35, it is determined whether the preceding vowel has already been devoiced. If it has, the target vowel is treated as semi-devoiced and the process proceeds to step S311; otherwise, the process proceeds to step S38.

In step S36, it is determined whether the target vowel lies between voiceless fricatives of the same type. If it does, the target vowel is treated as semi-devoiced and the process proceeds to step S311; otherwise, the process proceeds to step S3a.

In step S37, the target vowel is determined to be voiced.

In step S311, the target vowel is determined to be devoiced.

In step S39, semi-devoicing processing, such as shortening the duration of the target vowel, is executed.

<Problems to Be Solved by the Invention> However, the conventional devoicing rules described above give no clear devoicing criteria relating vowel devoicing to phoneme sequences or to accent patterns, so the criteria for devoicing are ambiguous. Furthermore, the proportion of devoiced vowels generally changes when the speaking rate changes, but the relationship between the speaking rate and the devoicing criteria has not been described quantitatively. As a result, the devoicing rules do not always match real speech, and synthesized speech generated with the conventional rules sounds unnatural.

Accordingly, an object of the present invention is to provide a speech synthesizer that can generate natural synthesized speech close to real speech by deciding whether a target vowel is voiced or devoiced on the basis of a devoicing occurrence degree per phoneme sequence, a devoicing occurrence degree per accent pattern, and a devoicing threshold set according to the speaking rate.

<Means for Solving the Problems> To achieve the above object, the present invention provides a speech synthesizer in which synthesis parameter generation means generates speech synthesis parameters, according to rules stored in a rule file, from the output of a character string analysis unit that receives a character string, and speech synthesis means performs speech synthesis on the basis of the speech synthesis parameters.

The speech synthesizer comprises: first devoicing occurrence degree setting means for setting a phoneme-sequence devoicing occurrence degree for each vowel of the input character string on the basis of the phoneme sequence of the character string input from the character string analysis unit and the per-phoneme-sequence devoicing occurrence degrees stored in the rule file; second devoicing occurrence degree setting means for setting an accent-pattern devoicing occurrence degree for each vowel of the input character string on the basis of the accent pattern of the character string input from the character string analysis unit and the per-accent-pattern devoicing occurrence degrees stored in the rule file; devoicing occurrence degree calculation means for calculating the devoicing occurrence degree of each vowel from the phoneme-sequence devoicing occurrence degree set by the first setting means and the accent-pattern devoicing occurrence degree set by the second setting means; and devoicing determination threshold setting means for setting a devoicing determination threshold according to a specified speaking rate on the basis of per-speaking-rate devoicing determination thresholds stored in the rule file. The speech synthesizer is characterized in that voicing or devoicing is determined for each vowel of the input character string on the basis of the devoicing occurrence degree calculated by the calculation means and the devoicing determination threshold set by the threshold setting means.

<Operation> When an arbitrary character string is input to the character string analysis unit and the output of the analysis unit is input to the synthesis parameter generation means, speech synthesis parameters are generated according to the rules stored in the rule file.

At that time, the first devoicing occurrence degree setting means sets a phoneme-sequence devoicing occurrence degree for each vowel of the input character string on the basis of the string's phoneme sequence and the per-phoneme-sequence devoicing occurrence degrees stored in the rule file. Likewise, the second devoicing occurrence degree setting means sets an accent-pattern devoicing occurrence degree for each vowel on the basis of the string's accent pattern and the per-accent-pattern devoicing occurrence degrees stored in the rule file. The devoicing occurrence degree calculation means then calculates the devoicing occurrence degree of each vowel from the phoneme-sequence degree set by the first setting means and the accent-pattern degree set by the second setting means.

Further, the devoicing determination threshold setting means sets the devoicing determination threshold for the specified speaking rate on the basis of the per-speaking-rate devoicing determination thresholds stored in the rule file.

The voicing/devoicing determination means then determines, for each vowel of the input character string, whether it is voiced or devoiced on the basis of the devoicing occurrence degree calculated by the calculation means and the devoicing determination threshold set by the threshold setting means.

Thereafter, the speech synthesis means performs speech synthesis on the basis of the speech synthesis parameters generated by the synthesis parameter generation means and the vowel voicing/devoicing decisions made by the voicing/devoicing determination means.

Voicing or devoicing can therefore be decided for each vowel while taking into account the phoneme sequence, the accent pattern, and the speaking rate, all of which strongly influence vowel devoicing, so natural synthesized speech can be obtained.

<Embodiment> The configuration and operation of the speech synthesizer of the present invention are outlined below with reference to the block diagram of FIG. 1.

When an arbitrary character string is input to the character string analysis unit 1, the analysis unit 1 parses the character string and determines the intonation pattern of the whole string. It also looks up the words contained in the string in a word dictionary 2 and determines the accent and phoneme sequence of each word, thereby determining the phoneme sequence and accent pattern of the string. The intonation pattern of the whole string and the string's phoneme sequence and accent pattern, determined in this way by the analysis unit 1, are output to the rule control unit 3.

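The feature parameter file 8 consists of a target feature parameter file 6 and a time-series feature parameter file 7. The target feature parameter file 6 outputs target feature parameters representing vowel features to the rule control unit 3, and the time-series feature parameter file 7 outputs time-series feature parameters representing consonant features to the rule control unit 3. The rule file 4, in turn, outputs to the rule control unit 3 the phoneme control rules for connecting the target feature parameters and time-series feature parameters output from the feature parameter file, and the prosody control rules for controlling each prosodic feature. The prosody control rules include the devoicing phoneme sequences, the vowel devoicing occurrence degree for each phoneme sequence, and the vowel devoicing occurrence degree for each accent pattern, all of which are described later.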
Target feature parameters representing vowel features are output to the rule control section 3, and the time series feature parameter file 7 outputs time series feature parameters representing consonant features to the rule control section 3. On the other hand, the rule file 4 stores phoneme control rules for connecting the target feature parameters and time-series feature parameters output from the feature parameter file, and prosody control rules for controlling each prosody, respectively, in the rule control section. Output to 3. This prosodic control rule includes a devoiced phoneme series, a vowel devoicing occurrence degree for each phoneme series, and a vowel devoicing occurrence degree for each accent pattern, which will be described later.

The rule control unit 3 refers to the target feature parameters and time-series feature parameters input from the feature parameter file 8, and to the phoneme control rules for joining phonemes and the prosody control rules for controlling prosody input from the rule file 4. On that basis, it generates the parameters needed for speech synthesis from the intonation pattern of the whole string, the string's phoneme sequence and accent pattern input from the character string analysis unit 1, and the vowel voicing/devoicing decisions described later, and outputs the generated parameters to the speech synthesizer 5.

The speech synthesizer 5 performs speech synthesis on the basis of the input parameters and outputs rule-synthesized speech corresponding to the input character string.

FIG. 2 is a flowchart of the voicing/devoicing determination routine executed by the rule control unit 3. The routine is described below with reference to FIG. 2.

In step S1, a phoneme-sequence devoicing occurrence coefficient is first obtained for each vowel from the phoneme sequence input from the character string analysis unit 1 and the devoicing occurrence coefficients, stored in the rule file 4, that represent the devoicing occurrence degree for each phoneme sequence.

However, when either the preceding or the following consonant is a geminate (sokuon), the phoneme-sequence devoicing occurrence coefficient is multiplied by α (a constant with 0 < α < 1) to lower the devoicing occurrence degree.

This phoneme-sequence devoicing occurrence coefficient has, for example, the following characteristics.

1. When the vowel is a high vowel (/i/, /u/), the preceding consonant is a voiceless fricative (/s/, /sh/, /h/) or a voiceless affricate (/ch/, /ts/), and the following consonant is a voiceless plosive (/p/, /t/, /k/, /py/, /ky/) or a voiceless affricate (/ch/, /ts/), the degree of devoicing is higher than when both the preceding and following consonants are voiceless plosives (/p/, /t/, /k/; the following consonant may also be /py/ or /ky/).

2. When the vowel is a high vowel (/i/, /u/) and both the preceding and following consonants are voiceless plosives (/p/, /t/, /k/; the following consonant may also be /py/ or /ky/), the degree of devoicing is higher than when the preceding and following consonants are voiceless fricatives of different types (/s/, /sh/, /h/; the following consonant may also be /hy/).

3. When the vowel is a high vowel (/i/, /u/) and the preceding and following consonants are voiceless fricatives of different types (/s/, /sh/, /h/; the following consonant may also be /hy/), the degree of devoicing is higher than when they are the same voiceless fricative (/s/, /sh/, /h/).

4. When the vowel is a high vowel (/i/, /u/) and the preceding and following consonants are the same voiceless fricative (/s/, /sh/, /h/), the degree of devoicing is higher than when either the preceding or the following consonant is not voiceless.

5. When the vowel is not a high vowel (/i/, /u/), it is not devoiced.
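Characteristics 1 to 5 and the α rule for geminates can be expressed as a small lookup. The sketch below is a hedged illustration in Python, not the patent's rule file: Table 1's actual coefficients are not reproduced in this text, so the values 5, 4, 3, 2, 1, 0 and α = 0.5 are hypothetical, chosen only to respect the orderings above and the condition 0 < α < 1. The consonant class sets, the function name `phoneme_coefficient`, and the `geminate_adjacent` flag are likewise illustrative assumptions.

```python
# Hypothetical sketch of the phoneme-sequence devoicing coefficient (step S1).
# All numeric values are illustrative; Table 1's real coefficients are not given here.
HIGH_VOWELS = {"i", "u"}
FRICATIVES = {"s", "sh", "h", "hy"}        # voiceless fricatives (/hy/ as a following consonant)
AFFRICATES = {"ch", "ts"}                  # voiceless affricates
PLOSIVES = {"p", "t", "k", "py", "ky"}     # voiceless plosives (/py/, /ky/ as following consonants)
VOICELESS = FRICATIVES | AFFRICATES | PLOSIVES
ALPHA = 0.5                                # assumed value; the text only requires 0 < alpha < 1


def phoneme_coefficient(vowel: str, prev_c: str, next_c: str,
                        geminate_adjacent: bool = False) -> float:
    """Return a hypothetical devoicing coefficient for one vowel."""
    if vowel not in HIGH_VOWELS:
        return 0.0                                # item 5: non-high vowels are not devoiced
    if prev_c not in VOICELESS or next_c not in VOICELESS:
        coeff = 0.0                               # below item 4's identical-fricative case (assumed 0)
    elif prev_c in FRICATIVES | AFFRICATES and next_c in PLOSIVES | AFFRICATES:
        coeff = 5.0                               # item 1: fricative/affricate + plosive/affricate
    elif prev_c in PLOSIVES and next_c in PLOSIVES:
        coeff = 4.0                               # item 2: plosive + plosive
    elif prev_c in FRICATIVES and next_c in FRICATIVES:
        coeff = 2.0 if prev_c == next_c else 3.0  # items 3 and 4: identical fricatives rank lower
    else:
        coeff = 1.0                               # other voiceless pairs (assumed value)
    if geminate_adjacent:
        coeff *= ALPHA                            # lower the degree next to a geminate
    return coeff
```

For example, the /u/ of "suki" (between /s/ and /k/) falls under item 1, while the /i/ of "kita" (between /k/ and /t/) falls under item 2 and therefore receives a smaller value.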

Table 1 shows an example of phoneme-sequence devoicing occurrence coefficients, classified by the consonants preceding and following a high vowel. Rows give the preceding consonant and columns the following consonant; a larger coefficient indicates a higher degree of devoicing. Numbers in parentheses apply when the preceding and following consonants are the same.

Table 1

Next, an accent-pattern devoicing occurrence coefficient is obtained for each vowel from the accent pattern input from the character string analysis unit 1 and the per-accent-pattern devoicing occurrence coefficients stored in the rule file 4.

The devoicing occurrence degree for each accent pattern has the following characteristics.

a. A vowel at the accent nucleus is hardly ever devoiced.

b. In type-0 utterances, vowels are devoiced to a higher degree than in utterances of type 1 or higher.

c. Word-initial vowels are devoiced to a high degree, except in type-1 utterances.

d. A vowel immediately before the accent nucleus is devoiced to a higher degree than a vowel at the accent nucleus, but to a lower degree than a vowel after the accent nucleus.
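Characteristics a to d can likewise be sketched as a function of accent type and mora position. This is a hypothetical Python illustration only: Table 2's values are not reproduced in this text, so every number below is invented to satisfy the orderings a to d. It also assumes, following the accent-type definition given with Table 2, that a word of type k (k ≥ 1) has its accent nucleus on mora k, and it treats syllables and morae as interchangeable.

```python
# Hypothetical sketch of the accent-pattern devoicing coefficient.
# accent_type 0 = flat (type 0); accent_type k >= 1 is assumed to put the nucleus on mora k.
# All numeric values are illustrative; Table 2's real values are not given here.


def accent_coefficient(accent_type: int, mora: int) -> float:
    """Return a hypothetical coefficient for the vowel at 1-based position `mora`."""
    if accent_type < 0 or mora < 1:
        raise ValueError("accent_type must be >= 0 and mora must be >= 1")
    if accent_type == 0:
        return 1.6 if mora == 1 else 1.4   # b, c: type 0 devoices most; word-initial is higher
    if mora == accent_type:
        return 0.1                         # a: the nucleus vowel is rarely devoiced
    if mora == 1:
        return 1.2                         # c: word-initial is high (type 1 returned above)
    if mora == accent_type - 1:
        return 0.5                         # d: just before the nucleus, above the nucleus itself
    if mora > accent_type:
        return 1.0                         # d: after the nucleus, above the pre-nucleus mora
    return 0.8                             # other pre-nucleus positions (assumed value)
```

The order of the checks matters: the nucleus test runs before the word-initial test so that a type-1 word's first mora, which is its nucleus, gets the low value required by a and by the exception in c.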

Table 2 shows an example of devoicing occurrence degrees for each accent pattern. Rows give the accent type and columns the mora position of the target vowel; a larger coefficient indicates a higher degree of devoicing. Here, the accent type classifies a word by the position of the syllables pronounced high among the syllables that form it. For example, "type n" means that the second through n-th syllables are pronounced high, while the first syllable and the (n+1)-th and all subsequent syllables are pronounced low (Shin Meikai Japanese Dictionary, 3rd edition, Appendix: Accent List).

Table 2

Further, from the phoneme-sequence devoicing occurrence coefficient and the accent-pattern devoicing occurrence coefficient, the devoicing occurrence degree ρ(n) of each vowel (n = 1, 2, ..., WM + 1, where WM is the number of morae in the input phoneme sequence and mora WM + 1 represents the word-final silence) is obtained by the following equation.

ρ(n) = (phoneme-sequence devoicing occurrence coefficient) × (accent-pattern devoicing occurrence coefficient)

In step S2, a threshold θ that serves as the devoicing criterion is obtained as follows from the devoicing determination thresholds defined for each speaking rate and stored in the rule file 4, and from the speaking rate input from the character string analysis unit 1. A threshold is set for the normal speaking rate; the threshold is lowered when the speaking rate is faster than normal and raised when it is slower.

As an example, Table 3 shows the threshold for each speaking rate when the threshold for the normal speaking rate is 6.

Table 3

In addition, the devoicing determination threshold θ2, applied when vowels that are prone to devoicing occur in succession, is obtained by the following equation.

θ2 = θ × β (β is a real number with β > 1)

Devoicing is then determined for each vowel.
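The threshold setting of step S2 and the derivation of θ2 can be sketched as follows. Only the normal-rate threshold of 6 comes from the text; Table 3 is not reproduced here, so the rate labels and the other threshold values are hypothetical placeholders that merely follow the stated rule (faster speech, lower threshold; slower speech, higher threshold). The function name `devoicing_thresholds` is also an assumption.

```python
# Hypothetical sketch of step S2: choose theta from the speaking rate, then theta2 = theta * beta.
# Only the "normal" value 6.0 comes from the text; the other entries are invented
# placeholders that follow "faster -> lower threshold, slower -> higher threshold".
THRESHOLD_BY_RATE = {
    "very_slow": 8.0,
    "slow": 7.0,
    "normal": 6.0,
    "fast": 5.0,
    "very_fast": 4.0,
}


def devoicing_thresholds(rate: str, beta: float) -> tuple[float, float]:
    """Return (theta, theta2) for a speaking-rate label, with theta2 = theta * beta."""
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    theta = THRESHOLD_BY_RATE[rate]   # KeyError for an unknown rate label
    return theta, theta * beta
```

With β > 1, θ2 is always stricter than θ, so a second devoicing-prone vowel in a row must clear a higher bar than an isolated one.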

In step S3, n is set to 1 so that the determination starts from the first mora.

In step S4, the devoicing occurrence degree ρ(n) of the current mora (mora n) is compared with the devoicing determination threshold θ. If ρ(n) ≥ θ, the vowel of the current mora may be devoiced, and the process proceeds to step S5. If ρ(n) < θ, the vowel is determined not to be devoiced, and the process proceeds to step S11.

In step S5, the devoicing occurrence degree ρ(n+1) of the next mora (mora n+1) is compared with the devoicing occurrence degree ρ(n) of the current mora. If ρ(n) ≥ ρ(n+1), that is, if the current mora has the higher degree of devoicing, the vowel of the current mora is devoiced but the vowel of the next mora may not be, and the process proceeds to step S8. If ρ(n) < ρ(n+1), that is, if the current mora has the lower degree of devoicing, the vowel of the current mora may not be devoiced, and the process proceeds to step S6.

This is because, when vowels to be devoiced occur in succession, the vowel with the lower degree of devoicing is left voiced so that the pronunciation does not become unclear.

In step S6, ρ(n+1) is increased so that the vowel of the next mora is always devoiced, and the process proceeds to step S7.

In step S7, the devoicing occurrence degree ρ(n) of the current mora is compared with the threshold θ2 for successive devoicing-prone vowels.

If ρ(n) ≥ θ2, the vowel of the current mora is strongly devoiced, and the process proceeds to step S10.

If ρ(n) < θ2, the vowel of the current mora is not devoiced, and the process proceeds to step S11.

In step S8, the devoicing occurrence degree ρ(n+1) of the next mora is compared with the threshold θ2 for successive devoicing-prone vowels.

If ρ(n+1) < θ2, the vowel of the next mora is determined not to be devoiced, and the process proceeds to step S9.

If ρ(n+1) ≥ θ2, the vowel of the next mora is strongly devoiced, and the process proceeds directly to step S10.

In step S9, ρ(n+1) is set to 0 so that the vowel of the next mora is not devoiced, and the process proceeds to step S10.

In step S10, the vowel of the n-th mora is determined to be devoiced, and the process proceeds to step S12.

In step S11, the vowel of the n-th mora is determined to be voiced, and the process proceeds to step S12.

In step S12, n is incremented by one to move on to the vowel of the next mora.

In step S13, it is determined whether n is less than or equal to the number of morae WM in the input phoneme sequence. If n ≤ WM, steps S4 through S12 are repeated; if n > WM, the devoicing determination routine ends.

As described above, the present invention decides whether a target vowel is voiced or devoiced on the basis of the devoicing occurrence degree for each phoneme sequence, the devoicing occurrence degree for each accent pattern, the devoicing determination threshold set according to the speaking rate, and the devoicing determination threshold for successive devoicing-prone vowels. Vowels in an input character string can therefore be devoiced in accordance with the string's phoneme sequence, accent pattern, and speaking rate, and more natural synthesized speech can be generated.

<Effects of the Invention> As is clear from the above, the speech synthesizer of the present invention calculates the devoicing occurrence degree of each vowel of the input character string on the basis of the per-phoneme-sequence and per-accent-pattern devoicing occurrence degrees in the rule file, sets the devoicing determination threshold for the specified speaking rate on the basis of the per-speaking-rate devoicing determination thresholds, and decides voicing or devoicing for each vowel of the input string from that degree and that threshold. Voicing or devoicing of each target vowel can thus be decided while taking into account the phoneme sequence, the accent pattern, and the speaking rate, which strongly influence vowel devoicing, and natural synthesized speech close to real speech can be generated.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the speech synthesizer of the present invention, FIG. 2 is a flowchart of the vowel voicing/devoicing determination routine in that embodiment, and FIG. 3 is a flowchart of a conventional voicing/devoicing determination routine.

1: character string analysis unit; 2: word dictionary; 3: rule control unit; 4: rule file; 5: speech synthesizer; 6: target feature parameter file; 7: time-series feature parameter file; 8: feature parameter file.

Claims

[Claims]

(1) A speech synthesizer in which synthesis parameter generation means generates speech synthesis parameters, according to rules stored in a rule file, from the output of a character string analysis unit into which a character string is input, and speech synthesis means performs speech synthesis on the basis of the speech synthesis parameters, the speech synthesizer comprising: first devoicing occurrence degree setting means for setting a phoneme-sequence devoicing occurrence degree for each vowel of the input character string on the basis of the phoneme sequence of the character string input from the character string analysis unit and the per-phoneme-sequence devoicing occurrence degrees stored in the rule file; second devoicing occurrence degree setting means for setting an accent-pattern devoicing occurrence degree for each vowel of the input character string on the basis of the accent pattern of the character string input from the character string analysis unit and the per-accent-pattern devoicing occurrence degrees stored in the rule file; devoicing occurrence degree calculation means for calculating the devoicing occurrence degree of each vowel from the phoneme-sequence devoicing occurrence degree set by the first devoicing occurrence degree setting means and the accent-pattern devoicing occurrence degree set by the second devoicing occurrence degree setting means; and devoicing determination threshold setting means for setting a devoicing determination threshold according to a specified speaking rate on the basis of per-speaking-rate devoicing determination thresholds stored in the rule file; the speech synthesizer being characterized in that voicing or devoicing is determined for each vowel of the input character string on the basis of the devoicing occurrence degree calculated by the devoicing occurrence degree calculation means and the devoicing determination threshold set by the devoicing determination threshold setting means.