JPH032800A - Intonation control system for voice synthesizer - Google Patents

Intonation control system for voice synthesizer

Info

Publication number
JPH032800A
Authority
JP
Japan
Prior art keywords
pitch
accent
intonation
accent type
target value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP13636489A
Other languages
Japanese (ja)
Inventor
Kazuya Hasegawa
和也 長谷川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP13636489A priority Critical patent/JPH032800A/en
Publication of JPH032800A publication Critical patent/JPH032800A/en
Pending legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To obtain synthesized speech that approximates the pitch pattern of real speech by controlling the fall-part component of the accent-type components according to the accent type and the number of moras. CONSTITUTION: Intonation is determined by a sentence processing unit 2, and the accent type by an accent processing unit 3. A fall-part coefficient K is determined from table data according to the accent type and the number of moras; because this table data is obtained by a statistical method with the accent type and the number of moras as parameters, it is predetermined from real-speech values. For the fall-part value F, the pitch frequency control component S at the accent peak is multiplied by the fall-part coefficient K, and the pitch target value of the fall part is corrected by addition when the interpolation processing unit 5 refers to the phoneme parameters to determine the pitch target value. The pitch-pattern change at the fall part thereby approximates real speech, yielding highly natural synthesized speech.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Field of Industrial Application: The present invention relates to a speech synthesis device using rule-based synthesis, and more particularly to an intonation control method based on accent type.

B. Summary of the Invention: In a speech synthesis device that performs intonation control by adjusting the pitch frequency of each phoneme or syllable according to intonation and accent type, the present invention adjusts the control component of the fall part of the accent-type component according to the accent type and the number of moras, thereby obtaining synthesized speech whose pitch pattern approximates that of real speech.

C. Prior Art: A speech synthesis device using rule-based synthesis divides an input character string into words and phrases by syntactic analysis, determines intonation and accent for each, decomposes the words and phrases into syllables and further into phonemes, obtains the source wave and the articulation-filter parameters for each syllable or phoneme, and produces synthesized speech as the response of the articulation filter to the source wave.

This type of speech synthesis device has, for example, the configuration shown in Fig. 3. The Japanese language processing unit 1 segments input Japanese text into phrases and performs kana-reading conversion and the like by referring to a dictionary.

The sentence processing unit 2 assigns intonation to the sentence, and the accent processing unit 3 assigns accents to the syllables that constitute sentences and phrases.

For example, as shown in Fig. 4, for the input sentence "The cherry blossoms at the school bloomed beautifully," the sentence intonation falls from its onset point along a logarithmic or similar curve according to the number of syllables, while the phrase accent type is determined by each word or phrase. These intonation and accent-type components are combined, and breath-group intonation, smoothing by filtering, pauses, and so on are added to obtain the composite intonation.

The phoneme processing unit 4 decomposes each input syllable datum, such as "SA," into phonemes — the units of vowels and consonants — by referring to data in the syllable parameter storage 4-1 that defines the syllable-to-phoneme correspondence; for example, the syllable "SA" is decomposed into the phonemes "S" and "A."

For each phoneme in the phoneme string from the phoneme processing unit 4, the interpolation processing unit 5 extracts the phoneme parameters from the phoneme parameter storage 5-1 and the source-wave patterns from the source parameter storage 5-2, and obtains the source waveform and articulation data from these by interpolation.

As shown in Fig. 5, the phoneme parameters for a consonant divide each phoneme into three utterance time segments O1 to O3; each segment has a duration t1 to t3, a pitch P1 to P3 (the repetition frequency of the source wave), a source-wave energy E1 to E3, a source-wave pattern G1 to G3, and pitch and energy time constants DP1 to DP3 and DE1 to DE3, giving discrete source-wave data. A vowel has a single segment OA, with a pitch time constant DPA, an energy EA, an energy time constant DEA, and a source-wave pattern GA as its discrete source-wave data. The source-wave patterns G1 to G3 and GA correspond, for example, to the waveforms shown in Fig. 6; for each pattern, several dozen sample-data sequences are held in the source parameter storage 5-2, from which the source-wave samples are read out. The energies E1 to E3 and EA specify the level of the source wave, i.e., the loudness, and the pitches P1 to P3 and PA specify the frequency, i.e., the pitch of the sound. Each of these source-wave values is a single value for its time segment O1 to O3 or OA; for the transitions within each time segment and between phonemes, the time constants DP1 to DP3, DPA, DE1 to DE3, and DEA are given, and a continuous source-wave data sequence is obtained by the interpolation processing of the interpolation processing unit 5.
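The per-segment parameter layout described above can be sketched as a simple record. This is an illustrative reading of Fig. 5, not the patent's actual data format; all field names and the example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ConsonantParams:
    """Consonant phoneme parameters: three time segments O1-O3,
    each with its own duration, pitch, energy, source-wave pattern,
    and pitch/energy time constants (after Fig. 5 of the patent)."""
    durations: tuple   # t1..t3, e.g. in milliseconds
    pitches: tuple     # P1..P3, source-wave repetition frequency
    energies: tuple    # E1..E3, source-wave level (loudness)
    patterns: tuple    # G1..G3, indices into stored sample tables
    pitch_tc: tuple    # DP1..DP3, pitch time constants
    energy_tc: tuple   # DE1..DE3, energy time constants

# Hypothetical entry for the consonant "S" of syllable "SA"
sa_s = ConsonantParams(
    durations=(30, 40, 30),
    pitches=(120, 125, 122),
    energies=(0.2, 0.8, 0.5),
    patterns=(3, 3, 3),
    pitch_tc=(0.6, 0.6, 0.6),
    energy_tc=(0.5, 0.5, 0.5),
)
print(sa_s.pitches[0])  # 120
```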

For example, the consonant pitches P1 to P3 are given as target values for the respective segments O1 to O3, as shown in Fig. 7, and the pitch P within each segment is interpolated in n steps, changing as shown by the solid or broken lines according to the magnitude of the time constants DP1 to DP3. This interpolation computation uses the recurrence

Pnk = DP (Pn-1 − P0) + P0

where Pnk is the k-th pitch control value, DP the pitch time constant, P0 the current pitch target value, and Pn-1 the previous pitch value; the computation is performed n times to obtain the successive pitch values Pnk, Pnk+1, and so on.
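One plausible reading of this recurrence is an exponential decay toward each segment's target value, with the time constant DP controlling how quickly the pitch settles. The sketch below assumes that reading; the target values, DP, and step count are invented examples, not the patent's tables.

```python
def interpolate_pitch(targets, dp, steps_per_segment):
    """Step the pitch toward each segment's target value P0 using
    the recurrence P_k = DP * (P_{k-1} - P0) + P0 (0 < DP < 1),
    so each segment decays exponentially toward its target."""
    pitch = targets[0]
    curve = []
    for p0 in targets:                 # current segment's target P0
        for _ in range(steps_per_segment):
            pitch = dp * (pitch - p0) + p0  # move part-way toward P0
            curve.append(pitch)
    return curve

# Three segment targets, e.g. P1-P3 of a consonant
curve = interpolate_pitch([100.0, 140.0, 110.0], dp=0.6, steps_per_segment=5)
print(round(curve[-1], 2))  # 112.09 -- settled near the final target
```

A smaller DP makes each segment snap to its target faster (the solid line of Fig. 7); a DP close to 1 gives the slower, smoother approach of the broken line.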

Next, as shown in Fig. 5, the phoneme parameter storage 5-1 also stores the cross-sectional-area parameters of an acoustic tube model together with the time constants DA1 to DA3 and DAA. These parameters specify the vocal-tract articulation equivalent filter: the human vocal tract (about 17 cm for a male) is modeled as 17 concatenated acoustic tubes of 1 cm length, and for each time segment the cross-sectional area of each tube is given as A1-1 to A17-1, A1-2 to A17-2, and A1-3 to A17-3. These parameters are passed, together with the acoustic-tube time constants, to the articulation calculation unit 6, which performs the articulation calculation on the source wave.

The articulation calculation unit 6 computes the radiated speech waveform sequence obtained when the source wave is applied to the acoustic tubes with the given cross-sectional-area parameters; this waveform data is converted to an analog signal by the D/A converter 7, and synthesized speech is output from the speech output device 8.

Here, the intonation of the synthesized speech is obtained by adding or multiplying the sentence intonation and accent-type components to the phoneme pitches P1 to P3 and PA (Fig. 5); the result of this calculation determines the pitch target values, which the interpolation processing unit 5 then interpolates as described above to compute the pitch frequency.
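As a minimal sketch of this combination step: the patent says the components are added or multiplied onto the base phoneme pitch, so additive combination is assumed here for illustration; the component values are invented.

```python
def pitch_targets(intonation, accent, base_pitch):
    """Combine per-mora sentence-intonation and accent-type
    components with a base phoneme pitch to form the pitch
    target values (additive combination assumed)."""
    return [base_pitch + i + a for i, a in zip(intonation, accent)]

# Falling sentence intonation plus a two-level accent component
targets = pitch_targets(
    intonation=[10, 8, 6, 4, 2],
    accent=[0, 5, 5, 0, 0],
    base_pitch=100,
)
print(targets)  # [110, 113, 111, 104, 102]
```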

D. Problems to be Solved by the Invention: Conventional intonation control adjusts the pitch frequency uniformly according to intonation and accent type, so for some words or phrases — particular combinations of phonemes or syllables — the control deviates from the intonation of natural human speech.

For example, as shown in Fig. 8, when the accent-type component A is superimposed on the intonation component I for the phrase "konosouchi," the pitch target value of each mora is indicated by a circle, and the pitch frequency interpolated between the target values follows the solid-line pitch pattern P. In this pitch pattern P, the adjustment of the pitch target value by the accent type A takes only two levels, high and low: the syllable "no" at the accent peak is given the height of level S, and the syllable "u" returns to level 0. The accent-component adjustment for the mora "u" below the accent fall is thus level 0, whereas in the pitch pattern of real speech the accent component at the accent fall is not zero but retains some value, so the intonation at that part produces unnatural synthesized speech.

The object of the present invention is to provide an intonation control method that obtains highly natural synthesized speech by bringing the pitch pattern of the accent-type fall part closer to the pitch pattern of real speech.

E. Means and Operation for Solving the Problems: To achieve the above object, in a speech synthesis device that obtains the intonation of synthesized speech by adjusting the pitch target value of each phoneme or syllable of an input sentence with intonation and accent-type components, the present invention comprises first means for obtaining a coefficient K according to the accent type and the number of moras of a word or phrase, and second means for multiplying the pitch frequency adjustment S at the accent-type peak by the coefficient K to obtain the pitch frequency adjustment F for the fall part of the accent type. The pitch target value of the accent-type fall part is corrected by adding this pitch frequency adjustment, so that the pitch target value at the accent fall of each phrase takes a value according to the accent type and the number of moras, bringing the pitch-pattern change at the fall part closer to that of real speech.

F. Embodiment: Fig. 1 is a flowchart showing an embodiment of the present invention. The intonation determination in step S1 is performed by the sentence processing unit 2 as in the prior art, and the accent-type determination in step S2 by the accent processing unit 3.

The fall-part coefficient K in step S3 is determined by referring to table data according to the accent type and the number of moras; because this table data is obtained by a statistical method with the accent type and the number of moras as parameters, it is predetermined from real-speech values. The coefficient K is a positive real number in the range 0 < K < 1.
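The table lookup of step S3 might be sketched as follows. The coefficient values below are invented placeholders; the patent's actual table is derived statistically from recorded speech and is not disclosed here.

```python
# Hypothetical fall-part coefficient table K, indexed by
# (accent type, number of moras). Real entries would come from
# statistical analysis of real speech, as the patent describes.
FALL_COEFF = {
    (1, 3): 0.30, (1, 4): 0.25,
    (2, 3): 0.40, (2, 4): 0.35,
}

def fall_coefficient(accent_type, n_moras, default=0.30):
    """Look up the fall-part coefficient K (0 < K < 1) for the
    given accent type and mora count; fall back to a default
    when the pair is not tabulated."""
    return FALL_COEFF.get((accent_type, n_moras), default)

print(fall_coefficient(2, 4))  # 0.35
```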

Next, in step S4, when the interpolation processing unit 5 refers to the phoneme parameters to determine the pitch target value, the fall-part value F is obtained by multiplying the pitch frequency adjustment S at the accent peak by the fall-part coefficient K, and the pitch target value of the fall part is corrected by adding F. The source-wave establishment in step S5 performs source-wave synthesis in the interpolation processing unit 5 as before, except that the pitch frequency target value is obtained by interpolating the corrected pitch target value described above; the source wave is then established, and the articulation calculation unit 6 performs the articulation calculation and so on.

As described above, the correction of the pitch target value in steps S3 and S4 brings the pitch pattern at the accent fall closer to that of real speech. For example, as shown in Fig. 2, for each mora of the phrase "konosouchi," the fall-part value

F = S × K,

the accent-peak adjustment S multiplied by the coefficient K determined by the accent type and the number of moras, is used as the adjustment of the pitch target value of the syllable "u." An accent component is thereby left in the pitch target value, yielding a pitch pattern P′ close to the pitch pattern of real speech.
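The correction of Fig. 2 can be sketched as follows: the first mora after the accent peak, which the conventional two-level scheme would drop to level 0, instead receives the residual value F = S × K. The mora values are illustrative, not taken from the patent.

```python
def correct_fall_part(accent, s, k):
    """Give the first zero-level mora after the accent peak the
    residual adjustment F = S * K instead of dropping it fully
    to zero, as in Fig. 2 of the patent."""
    f = s * k
    out = list(accent)
    for i in range(1, len(out)):
        if out[i - 1] > 0 and out[i] == 0:
            out[i] = f       # fall-part mora keeps a residual accent
            break
    return out

# "ko no so u chi": peak level S=5 on "no"/"so", fall on "u"
print(correct_fall_part([0, 5, 5, 0, 0], s=5, k=0.4))
# [0, 5, 5, 2.0, 0]
```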

G. Effects of the Invention: As described above, according to the present invention, the pitch target value due to the accent-type component is corrected by addition at the fall part of the accent type according to the accent type and the number of moras, so the pitch-pattern change at the fall part can be brought closer to that of real speech, with the effect that highly natural synthesized speech is obtained.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a flowchart showing an embodiment of the present invention; Fig. 2 is an intonation waveform diagram of the embodiment; Fig. 3 is a configuration diagram of the speech synthesis device; Fig. 4 is an intonation waveform diagram; Fig. 5 is a data diagram of the phoneme parameters; Fig. 6 is a waveform diagram of the source-wave patterns; Fig. 7 is a pitch characteristic diagram of the interpolation processing; and Fig. 8 is a conventional intonation waveform diagram.

1: Japanese language processing unit; 2: sentence processing unit; 3: accent processing unit; 4: phoneme processing unit; 4-1: syllable parameter storage; 5: interpolation processing unit; 5-1: phoneme parameter storage; 5-2: source parameter storage; 6: articulation calculation unit; 7: D/A converter; 8: speech output device.

[Amendment of January 12, 1990, to Japanese Patent Application No. 1-136364; title of the invention: Intonation control system for voice synthesizer.]

Claims (1)

[Claims] (1) In a speech synthesis device that obtains the intonation of synthesized speech by adjusting the pitch target value of each phoneme or syllable of an input sentence with intonation and accent-type components, an intonation control method for a speech synthesis device characterized by comprising: first means for obtaining a coefficient K according to the accent type and the number of moras of a word or phrase; and second means for multiplying the pitch frequency adjustment S due to the peak of the accent type by the coefficient K to obtain the pitch frequency adjustment F of the fall part of the accent type; wherein the pitch target value of the fall part of the accent type is corrected by adding said pitch frequency adjustment.
JP13636489A 1989-05-30 1989-05-30 Intonation control system for voice synthesizer Pending JPH032800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP13636489A JPH032800A (en) 1989-05-30 1989-05-30 Intonation control system for voice synthesizer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP13636489A JPH032800A (en) 1989-05-30 1989-05-30 Intonation control system for voice synthesizer

Publications (1)

Publication Number Publication Date
JPH032800A true JPH032800A (en) 1991-01-09

Family

ID=15173443

Family Applications (1)

Application Number Title Priority Date Filing Date
JP13636489A Pending JPH032800A (en) 1989-05-30 1989-05-30 Intonation control system for voice synthesizer

Country Status (1)

Country Link
JP (1) JPH032800A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004226505A (en) * 2003-01-20 2004-08-12 Toshiba Corp Pitch pattern generating method, and method, system, and program for speech synthesis


Similar Documents

Publication Publication Date Title
US6625575B2 (en) Intonation control method for text-to-speech conversion
JPH04331997A (en) Accent component control system of speech synthesis device
JP2000305582A (en) Speech synthesizing device
JPH031200A (en) Regulation type voice synthesizing device
JP2904279B2 (en) Voice synthesis method and apparatus
JPH032800A (en) Intonation control system for voice synthesizer
JPH0580791A (en) Device and method for speech rule synthesis
JP2725381B2 (en) Inflection control method for speech synthesizer
JPH032797A (en) Intonation control system for voice synthesizer
JP2848604B2 (en) Speech synthesizer
JPH032796A (en) Intonation control system for voice synthesizer
JP3303428B2 (en) Method of creating accent component basic table of speech synthesizer
JP3078073B2 (en) Basic frequency pattern generation method
JP2961819B2 (en) Inflection control method for speech synthesizer
JPH04170600A (en) Vocalizing speed control method in regular voice synthesizer
JP2755478B2 (en) Text-to-speech synthesizer
JP3267659B2 (en) Japanese speech synthesis method
JP3286353B2 (en) Voice synthesis method
JP3218639B2 (en) Energy control method in rule speech synthesizer
JP3186263B2 (en) Accent processing method of speech synthesizer
JPH032799A (en) Pitch pattern coupling system for voice synthesizer
JPH056191A (en) Voice synthesizing device
JPH06332490A (en) Generating method of accent component basic table for voice synthesizer
JP3078074B2 (en) Basic frequency pattern generation method
JPH02113299A (en) Basic frequency pattern generating device