JP3186263B2

JP3186263B2 - Accent processing method of speech synthesizer

Info

Publication number: JP3186263B2
Application number: JP30911792A
Authority: JP
Inventors: 和也長谷川
Original assignee: Meidensha Corp
Current assignee: Meidensha Corp
Priority date: 1992-11-19
Filing date: 1992-11-19
Publication date: 2001-07-11
Anticipated expiration: 2016-07-11
Also published as: JPH06161492A

Abstract

PURPOSE: To improve the naturalness of synthesized speech by generating a pitch pattern that corresponds to a multi-stage accent pattern while using the basic accent table built for two-stage accents. CONSTITUTION: Phoneme string data and an N-stage accent pattern corresponding to an input sentence from a host computer 1 are input for the current phoneme and the phonemes before and after it. The multi-stage accents are quantized (15) into two stages, and basic accent data are set (16) from the two-stage basic accent table 14. The accent component is then processed by accent component change amount control 19, movement control 21, and mixing control 23 according to tables 20, 22, and 24, taking the preceding and following phonemes into account, and the pitch pattern is synthesized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an accent processing method that forms part of the prosody processing in a rule-based speech synthesizer, in which character codes input from a host computer or the like are converted into prosodic parameters.

[0002]

[Prior Art] A rule-based speech synthesizer divides an input character string into words and phrases by syntactic and morphological analysis and determines intonation and accent. It then decomposes the words and phrases into syllables or phonemes, obtains the parameters of a sound source wave and an articulation filter for each syllable or phoneme, and produces synthesized speech as the response output of the articulation filter to the sound source wave.

[0003] Prosody processing includes pitch pattern parameter generation, which adjusts the pitch frequency (fundamental frequency) of each syllable or phoneme according to the intonation, the accent, and the pitch pattern to be generated; energy pattern parameter generation, which adjusts the intensity of each sound; and duration parameter generation, which adjusts the length of each sound.

[0004] FIG. 3 shows prosody processing in a conventional rule-based speech synthesis system. For a kanji-kana mixed sentence supplied from a host computer 1 or the like, a Japanese processing unit 2 performs syntactic and morphological analysis to determine the accent, including intonation, and to generate phoneme string data.

[0005] The prosody processing unit 3 comprises a pitch pattern generation unit 3A that determines the pitch frequency of each phoneme in the phoneme string data from the accent data, an energy pattern generation unit 3B that determines the basic energy data of each phoneme from the pitch pattern parameters, and a duration calculation unit 3C that determines the duration of each phoneme.

[0006] The synthesis processing unit 4 obtains a synthesized speech signal from the parameters produced by the prosody processing, by way of an articulation filter and the like, and the synthesized speech is output from a speech output device 5.

[0007]

[Problems to Be Solved by the Invention] In the conventional system, the accent change pattern for an input sentence is given in two stages, high and low.

[0008] The prosodic pattern of natural human speech, however, changes continuously in its accent, so a two-stage accent change is too coarse to allow any improvement in speech quality.

[0009] With current Japanese-language processing technology, a kanji-kana mixed sentence cannot be converted into accents of three or more stages. To assign accents of three or more stages to a target sentence, a person must therefore supply them deliberately.

[0010] Meanwhile, the pitch pattern that governs the pitch of the synthesized voice is generated by looking up the basic accent table with the input accent pattern and related data and reading the corresponding table values. A basic accent table obtained from a two-stage accent environment, however, cannot be used as it is for N-stage input.

[0011] An object of the present invention is to provide an accent processing method that uses the basic accent table for two-stage accents to generate a pitch pattern corresponding to a multi-stage accent pattern, thereby enhancing the naturalness of the synthesized speech.

[0012]

[Means for Solving the Problems] To solve the above problems, the present invention provides, in a rule-based speech synthesizer: means for obtaining, from an input sentence, phoneme string data and a multi-stage accent amount for each phoneme; means for obtaining, from the current phoneme and the phonemes before and after it in the phoneme string, two sets of basic accent data for that phoneme, each expressed in two stages; determination means for determining whether to mix the two sets of basic accent data at a predetermined ratio, depending on whether any of the accent amounts of the current phoneme and the phonemes before and after it differ; accent component change amount control means for changing the accent component of the basic accent data according to values in an accent component change amount table, selected by the accent pattern of the current phoneme and the phonemes before and after it; movement control means for adding, to the basic accent data whose accent component has been changed, a value from a movement amount table selected by the accent pattern; and mixing control means for mixing the movement-controlled basic accent data at the predetermined ratio in accordance with the determination means. The pitch pattern is synthesized by applying the accent component change, the movement, and the mixing to the basic accent data of each phoneme of the input sentence.

[0013]

[Operation] For each phoneme, the multi-stage accent amounts of that phoneme and its neighbors are quantized into two stages, and two sets of basic accent data expressed in two stages are extracted. The pitch pattern of each phoneme is then synthesized by mixing the two sets of basic accent data, changing the accent component, and moving the result, all according to the multi-stage accent amounts of the phoneme and its neighbors.

[0014]

[Embodiment] FIG. 1 shows the pitch pattern generation procedure of one embodiment of the present invention. The host computer 1 performs Japanese-language processing on the input sentence and supplies, for pitch pattern generation, the phoneme symbol and the N-stage accent pattern of each phoneme.

[0015] The three-phoneme windowing 11 extracts, from the supplied phoneme string data, the phoneme being processed together with the phoneme immediately before it and the phoneme immediately after it. The five-mora windowing 12 and the three-mora windowing 13 extract, as the accent environment of the current mora, its N-stage accent amount together with the N-stage accent amounts of either the two moras on each side (second-preceding, preceding, following, and second-following moras) or the one mora on each side (preceding and following moras).
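As a minimal sketch of the windowing described in paragraph [0015], the following Python function extracts a symmetric window around a position. The function name `window`, the use of `None` for positions that fall outside the sequence (for example at the start or end of a phrase), and the sample data are assumptions made for illustration; the patent does not describe the implementation.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def window(items: Sequence[T], center: int, radius: int) -> List[Optional[T]]:
    """Return the 2*radius+1 items centered on `center`.

    Positions outside the sequence are returned as None (an assumed
    convention for "no preceding/following element").
    """
    return [
        items[i] if 0 <= i < len(items) else None
        for i in range(center - radius, center + radius + 1)
    ]


# Hypothetical input: one accent amount (1..5) per mora.
accents = [1, 3, 4, 2, 1]

three_mora = window(accents, 2, 1)  # preceding, current, following
five_mora = window(accents, 2, 2)   # two moras on each side
```

The same helper with `radius=1` applied to the phoneme string would correspond to the three-phoneme window.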

[0016] The basic accent table 14 is built from pitch pattern data grouped by the type of the phoneme being processed (the ten types in Table 1 below), the type of the preceding phoneme (long vowel or other), and the type of the following phoneme (geminate consonant, voiced consonant, voiceless consonant, vowel, or end of sentence or phrase). A typical accent pattern is selected for each group, and the table stores normalized data consisting of the seven values (three points in the consonant part and four points in the vowel part) that represent the basic component of that accent pattern.

[0017]

[Table 1]

[0018] The two-stage quantization 15 converts the five-mora N-stage accent pattern extracted by the five-mora windowing 12 into a two-stage accent pattern.

[0019] The basic accent data setting 16 takes the phoneme data from the three-phoneme windowing 11, uses the two-stage accents from the two-stage quantization 15 as the accent environment, and extracts two patterns of basic accent data for the current phoneme from the basic accent table 14.

[0020] The mixing control determination unit 17 determines whether to perform mixing control from the N-stage accents of the preceding mora, the current mora, and the following mora extracted by the three-mora windowing 13. That is, it decides whether the two patterns of accent data extracted by the basic accent data setting 16 are to be mixed at a predetermined ratio (the mixing coefficients).

[0021] The mixing control flag used for this decision is obtained from the mixing control flag table 18; the mixing itself is carried out later by the mixing control 23 using the mixing coefficient table 24.

[0022] Table 2 below shows examples of the mixing control flag table 18 and the mixing coefficient table 24.

[0023]

[Table 2]

[0024] This table covers the case in which the accent amount of each of the three moras has five stages (N = 5). It lists the accent amounts (1 to 5) of the preceding and following moras, including the cases where the preceding mora is the start of a phrase and where the following mora is the end of a phrase, and for each combination gives the mixing control flag (1 or 0) and the mixing coefficients α and β for each of the five accent amounts of the current mora. The flag is 1 when mixing control is performed and 0 when it is not, and the mixing coefficients α and β are 0.5 and 0.5 only when the flag is 1.

[0025] For example, when the accent amount of the preceding mora is 2, that of the following mora is 3, and that of the current mora is 4, the mixing control flag is 1 and mixing control is performed; the mixing coefficients α and β are 0.5 and 0.5, giving a one-to-one mix.

[0026] Whether to perform the mixing control described above is decided from the accent amounts a, b, and c of the preceding mora, the current mora, and the following mora, as follows:

flag = 1 when a ≠ b and b ≠ c and c ≠ a
flag = 0 when a = b or b = c or c = a
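The decision rule of paragraph [0026] can be sketched in a few lines of Python. The function names are hypothetical, and the handling of phrase-initial and phrase-final moras (which Table 2 covers) is left out because the rule as stated only addresses numeric accent amounts; the coefficients 0.5 and 0.5 are those given in paragraph [0024].

```python
from typing import Optional, Tuple


def mixing_flag(a: int, b: int, c: int) -> int:
    """Return 1 when all three accent amounts differ, otherwise 0.

    a, b, c are the accent amounts of the preceding, current, and
    following mora.
    """
    return 1 if (a != b and b != c and c != a) else 0


def mixing_coefficients(flag: int) -> Optional[Tuple[float, float]]:
    """Return (alpha, beta) when mixing is enabled, otherwise None."""
    return (0.5, 0.5) if flag == 1 else None


# Example from paragraph [0025]: preceding 2, current 4, following 3.
flag = mixing_flag(2, 4, 3)
coefficients = mixing_coefficients(flag)
```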

[0027] Table 3 shows examples of the mixed accent calculation in the mixing control 23.

[0028]

[Table 3]

[0029] In this example, the result of quantizing the accent environment into three stages is written with the symbols L, M, and H. With a, b, and c denoting the accent amounts of the three moras:

a > b > c: abc => HML
a = b > c: abc => HHL
a > b = c: abc => HLL
a > c > b: abc => HLM
a = c > b: abc => HLH
b > a > c: abc => MHL
b > a = c: abc => LHL
b > c > a: abc => LHM
b = c > a: abc => LHH
c > a > b: abc => MLH
c > a = b: abc => LLH
a = b = c: abc => HHH or LLL

Here HHH applies when the preceding quantized accent is H, when the preceding accents change from L to M, or when there is no preceding mora and the following accent is L. LLL applies when the preceding accent is L, when the preceding accents change from M to L, or when there is no preceding mora and the following accent is H.
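The rules of paragraph [0029] amount to ranking the three accent amounts, which the following sketch illustrates. The function name `quantize3` and the `all_equal` parameter are hypothetical: the choice between HHH and LLL depends on surrounding context that this sketch does not model, so the caller supplies it. The case c > b > a is not listed in paragraph [0029]; mapping it to LMH is an assumption, based on the fact that LMH appears among the environments in paragraph [0030].

```python
def quantize3(a: int, b: int, c: int, all_equal: str = "HHH") -> str:
    """Quantize three accent amounts into a string over L/M/H by rank.

    Three distinct values map to L < M < H; two distinct values map to
    L < H; three equal values return `all_equal`, which the caller must
    choose from context ("HHH" or "LLL").
    """
    distinct = sorted({a, b, c})
    if len(distinct) == 1:
        return all_equal
    if len(distinct) == 2:
        symbols = {distinct[0]: "L", distinct[1]: "H"}
    else:
        symbols = {distinct[0]: "L", distinct[1]: "M", distinct[2]: "H"}
    return "".join(symbols[v] for v in (a, b, c))


environment = quantize3(2, 4, 3)  # b > c > a
```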

[0030] In an accent environment quantized into three stages in this way, mixing control is performed when the environment contains M, and the two environments to be mixed are determined by the following rules:

HML → HLL, HHL
HLM → HLL, HLH
MHL → LHL, HHL
MLH → LLH, HLH
LHM → LHL, LHH
LMH → LHH, LLH

[0031] Taking the first entry of Table 3 as an example, for the accent environment 1, 2, 3 the pitch pattern data are obtained by adding the accent pattern data Pitch₁₁₃ multiplied by α (= 0.5) to the accent pattern data Pitch₁₃₃ multiplied by β (= 0.5).
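The mixing of paragraphs [0030] and [0031] can be illustrated as follows. In each rule of paragraph [0030] the two target environments are the original with M replaced by L and with M replaced by H, which is what `mixing_targets` does. The function names and the sample pitch values are hypothetical; real values would come from the basic accent table, which stores seven points per pattern.

```python
from typing import List, Optional, Tuple


def mixing_targets(env: str) -> Optional[Tuple[str, str]]:
    """Return the two two-stage environments to mix, or None if no M."""
    if "M" not in env:
        return None
    return env.replace("M", "L"), env.replace("M", "H")


def mix(p1: List[float], p2: List[float],
        alpha: float = 0.5, beta: float = 0.5) -> List[float]:
    """Pointwise weighted sum alpha*p1 + beta*p2 of two pitch patterns."""
    return [alpha * x + beta * y for x, y in zip(p1, p2)]


# Hypothetical seven-point patterns for the environments LLH and LHH.
pitch_llh = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
pitch_lhh = [1.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0]

targets = mixing_targets("LMH")
mixed = mix(pitch_llh, pitch_lhh)
```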

[0032] Next, the accent component change amount control 19 retrieves data from the accent component change amount table 20 according to the three-mora N-stage accent pattern and uses this change amount to correct the accent amount extracted by the basic accent data setting 16.

[0033] The accent component change amounts are calculated in advance by analyzing speech uttered by a person and are stored in table 20. As shown in FIG. 2, speech uttered with a large accent change, with a normal accent change, and with a small accent change is each subjected to pitch analysis to generate a pitch pattern. The patterns then undergo time-axis normalization, which aligns their time axes, three-mora windowing, and averaging, after which the accent component change amounts are calculated.
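The following is a rough sketch of the normalization and averaging steps named in paragraph [0033], under stated assumptions. The patent does not say how the time axes are aligned; linear interpolation to a fixed number of points is assumed here purely for illustration, and the function names and sample contours are hypothetical.

```python
from typing import List, Sequence


def resample(contour: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate a pitch contour to n >= 2 evenly spaced points."""
    if len(contour) == 1:
        return [float(contour[0])] * n
    out = []
    last = len(contour) - 1
    for k in range(n):
        pos = k * last / (n - 1)        # fractional index into contour
        i = int(pos)
        frac = pos - i
        nxt = contour[min(i + 1, last)]
        out.append(contour[i] * (1 - frac) + nxt * frac)
    return out


def average(contours: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise mean of contours that share the same length."""
    return [sum(col) / len(contours) for col in zip(*contours)]


# Two hypothetical utterances of the same accent environment (Hz).
u1 = resample([100.0, 120.0, 100.0], 5)
u2 = resample([110.0, 130.0], 5)
mean_pattern = average([u1, u2])
```

A change amount could then be taken as the pointwise difference between such an averaged pattern and a reference pattern.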

[0034] Table 4 shows an example of this change amount calculation for a five-stage accent pattern, and Table 5 shows the accent component change amount table.

[0035]

[Table 4]

[0036]

[Table 5]

[0037] In Table 4, HHL(L) denotes a character string with accent environment HHL and a small accent amount, and Pitch₄₄₁(j) denotes the j-th extracted accent data value in accent environment 441.

[0038] The general formula for the accent change amount is as follows. The change amount P_abc relative to the basic accent data in the accent environment abc (preceding, current, following) is given by the following equation.

[0039] P_abc = Pitch_abc − Pitch_ABC ... (1)

where:

・a = b or b = c or c = a (a, b, c = 1 to N).

・Pitch_ABC is the pitch pattern data (averaged) for a medium accent amount, satisfying max(A, B, C) = N/2 + 1 (fraction truncated) and A = B or B = C or C = A. Here max(A, B, C) is the largest of A, B, and C.

・Pitch_A′B′C′ is the pitch pattern data (averaged) for the maximum accent amount, satisfying max(A′, B′, C′) = max(A, B, C) * 2

[0040] and A′ = B′ or B′ = C′ or C′ = A′.

・Pitch_A″B″C″ is the pitch pattern data (averaged) for the minimum accent amount, satisfying max(A″, B″, C″) = 2

[0041] and A″ = B″ or B″ = C″ or C″ = A″.

・When max(a, b, c) − min(a, b, c) < N/2 + 1 and max(a, b, c) > 2:

Pitch_a′b′c′ = (Pitch_ABC + Pitch_A″B″C″) * X / Z
X = max(a, b, c) − 2
Z = N/2 − 1 (fraction truncated)

・When max(a, b, c) − min(a, b, c) > N/2 + 1 and max(a, b, c) < N:

Pitch_a′b′c′ = (Pitch_ABC + Pitch_A′B′C′) * Y / Z
Y = max(a, b, c) − (N/2 + 1)
Z = N/2 − 1 (fraction truncated)

The above expressions do not apply when max(a, b, c) = N and N is odd. In that case only, the change amount is calculated by

P_abc = 2 * Pitch_a′b′c′ ... (2)

[0042] Returning to FIG. 1, the accent component change amount control is followed by the movement control 21, which uses the movement amount table 22. Table 6 shows an example of this movement amount table for a five-stage accent pattern.

[0043]

[Table 6]

[0044] When an N-stage accent pattern a, b, c is input and satisfies

min(a, b, c) = 1 and a = b or b = c or c = a ... (3)

no movement is needed. When an accent pattern d, e, f is input and satisfies

min(d, e, f) > 1 and d = e or e = f or f = d ... (4)

the movement amount S_def is added uniformly to the pitch pattern values of the basic table. The movement amount S_def is determined as follows.

[0045] S_def = I(H, a, b, c) ... (5)

I(H, a, b, c): the value determined for pattern abc by the accent difference H

H = max(d, e, f) − max(a, b, c) ... (6)

where d − H = a, e − H = b, and f − H = c.

The generation of a pitch pattern for N-stage accent pattern input using the five tables above is carried out for each phoneme of the input phoneme string. Unless pitch pattern generation has finished for the final phoneme (25), the window is shifted to the next phoneme (26) and to the corresponding N-stage accent pattern (27). After pitch pattern generation has finished for all phonemes, the pitch patterns of all moras are combined (28) to obtain the pitch pattern parameters.
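One way to read equations (3) to (6) is sketched below. The base pattern abc is taken to be the input pattern def shifted down so that its minimum is 1, which gives H = min(d, e, f) − 1 and is consistent with equation (6) because a uniform shift changes the maximum by the same amount. This reading, the function names, and the table contents are assumptions for illustration; the actual values of I(H, a, b, c) are those of Table 6, which is not reproduced in the text.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[int, int, int]


def shift_decomposition(d: int, e: int, f: int) -> Tuple[Pattern, int]:
    """Split (d, e, f) into a base pattern with minimum 1 and the offset H."""
    h = min(d, e, f) - 1
    return (d - h, e - h, f - h), h


def apply_movement(pitch: List[float], pattern: Pattern,
                   table: Dict[Tuple[int, Pattern], float]) -> List[float]:
    """Add the movement amount S_def uniformly to every pitch value.

    `table` maps (H, base pattern) to a movement amount; it stands in
    for the movement amount table 22 and its contents are hypothetical.
    """
    base, h = shift_decomposition(*pattern)
    if h == 0:  # condition (3): no movement needed
        return list(pitch)
    s_def = table[(h, base)]
    return [p + s_def for p in pitch]


# Hypothetical table entry and pitch values.
movement_table = {(2, (1, 3, 1)): 0.5}
moved = apply_movement([1.0, 2.0, 1.0], (3, 5, 3), movement_table)
```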

[0046]

[Effects of the Invention] As described above, according to the present invention, accent component change amount control, movement control, and mixing control are applied to the basic accent data on the basis of the phoneme string data and multi-stage accent amounts for the input sentence, taking the current phoneme and the phonemes before and after it into account. This yields the following effects.

[0047] (1) Whereas a conventional rule-based speech synthesizer controlled voice pitch with two-stage accents, high or low, the pitch can now be controlled with multi-stage accents.

[0048] (2) The fundamental frequency pattern representing voice pitch can be generated continuously.

[0049] (3) Accents of three or more stages, which could not be obtained from conventional Japanese-language processing results, can be assigned to the target sentence.

[0050] (4) The pitch pattern data of a conventional basic accent table obtained from a two-stage accent environment can be used for multi-stage input.

[0051] (5) Because the method is based on pitch patterns of speech uttered by humans, it can generate pitch patterns closer to human ones.

[0052] (6) Natural synthesized speech close to human speech is obtained.

[Brief description of the drawings]

[FIG. 1] Pitch pattern generation procedure according to an embodiment of the present invention.

[FIG. 2] Calculation of the accent component change amount in the embodiment.

[FIG. 3] Conventional rule-based speech synthesis system.

[Explanation of symbols]

1 ... Host computer
11 ... Three-phoneme windowing
12 ... Five-mora windowing
13 ... Three-mora windowing
14 ... Basic accent table
15 ... Two-stage quantization
16 ... Basic accent data setting
17 ... Mixing control determination
18 ... Mixing control flag table
19 ... Accent component change amount control
20 ... Accent component change amount table
21 ... Movement control
22 ... Movement amount table
23 ... Mixing control
24 ... Mixing coefficient table

Continuation of front page

(56) References: JP-A-4-46396 (JP, A); JP-A-64-61795 (JP, A); JP-A-64-61796 (JP, A); JP-A-2-48700 (JP, A); JP-A-61-57997 (JP, A); JP-A-63-85797 (JP, A); JP-A-4-134499 (JP, A); JP-A-2-197897 (JP, A); JP-A-4-83298 (JP, A)

(58) Fields searched (Int. Cl.7, DB name): G10L 13/06, G10L 13/08

Claims

(57) [Claims]

1. An accent processing method for a rule-based speech synthesizer, comprising: means for obtaining, from an input sentence, phoneme string data and a multi-stage accent amount for each phoneme; means for obtaining, from the current phoneme and the phonemes before and after it in the phoneme string, two sets of basic accent data for that phoneme, each expressed in two stages; determination means for determining whether to mix the two sets of basic accent data at a predetermined ratio, depending on whether any of the accent amounts of the current phoneme and the phonemes before and after it differ; accent component change amount control means for changing the accent component of the basic accent data according to values in an accent component change amount table, selected by the accent pattern of the current phoneme and the phonemes before and after it; movement control means for adding, to the basic accent data whose accent component has been changed, a value from a movement amount table selected by the accent pattern; and mixing control means for mixing the movement-controlled basic accent data at the predetermined ratio in accordance with the determination means; wherein a pitch pattern is synthesized by applying the accent component change, the movement, and the mixing to the basic accent data of each phoneme of the input sentence.

2. The accent processing method for a speech synthesizer according to claim 1, wherein said accent component change amount table is obtained by analyzing pitch patterns of three-stage voices uttered by a human.