JP3077981B2 - Fundamental frequency pattern generator

Info

Publication number: JP3077981B2
Application number: JP63266969A
Authority: JP
Inventors: 博也藤崎; 幹雄山口; 啓吉広瀬; 恒河井
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 1988-10-22
Filing date: 1988-10-22
Publication date: 2000-08-21
Anticipated expiration: 2015-08-21
Also published as: JPH02113299A

Description

[Field of industrial application]

The present invention relates to speech synthesizers, and in particular to a fundamental frequency pattern generator used in rule-based speech synthesizers and text-to-speech synthesizers that generate the fundamental frequency pattern by rule.

[Conventional technology]

FIG. 2 shows the processing block diagram of a conventional speech synthesizer that takes characters and symbols as input (see, for example, Proceedings of the 1986 General National Convention of the Institute of Electronics and Communication Engineers of Japan, S26-5, March 1986). Table 1 shows an example of input to this speech synthesizer.

The input consists of accent symbols, phrase symbols, pause symbols, and syllable symbols.

The accent symbols are A1 (0.40), A2 (0.26), and A0. A1 and A2 mark the syllable boundary at which an accent rises and indicate which of two accent magnitudes applies; A0 marks the syllable boundary at which the accent falls. The values in parentheses are the accent magnitudes actually assigned.

The phrase symbols are P1 (0.43), P2 (0.26), P3 (0.12), and P0. P1, P2, and P3 mark the onset of a phrase and indicate its magnitude class; P0 indicates that the phrase component produced by the preceding P1, P2, or P3 is to be reduced to zero. The values in parentheses are the phrase magnitudes assigned.

The pause symbols are "。" (0.7 s), "、" (0.3 s), and "・" (0.08 s). Each indicates that a pause, that is, a gap, is placed at that syllable boundary. The values in parentheses are the pause lengths.
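The symbol values listed above can be held in simple lookup tables. The following is only an illustrative sketch: the dictionary and function names are hypothetical, and only the numeric values come from the text.

```python
from typing import Optional

# Values quoted in the text; the dictionary and function names are illustrative only.
ACCENT_MAGNITUDE = {"A1": 0.40, "A2": 0.26}               # A0 only marks where the accent falls
PHRASE_MAGNITUDE = {"P1": 0.43, "P2": 0.26, "P3": 0.12}   # P0 resets the phrase component
PAUSE_SECONDS = {"。": 0.7, "、": 0.3, "・": 0.08}


def symbol_value(symbol: str) -> Optional[float]:
    """Return the magnitude or pause length assigned to a control symbol, or None."""
    for table in (ACCENT_MAGNITUDE, PHRASE_MAGNITUDE, PAUSE_SECONDS):
        if symbol in table:
            return table[symbol]
    return None
```

Symbols such as A0 and P0, which carry no magnitude of their own in this listing, simply return `None` here.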

The syllable symbols are written in katakana, such as 「ス」 (su), 「ズ」 (zu), 「メ」 (me), 「ワ」 (wa), and so on, and indicate the type of sound.

In the input, a symbol marked with ○, for example 「シ○」, denotes a devoiced 「シ」 (shi).

The input specifies the speech to be synthesized, and synthesis proceeds as follows.

(1) Generation of phonemic parameters

The speech unit specified by each syllable symbol is selected from the stored patterns. The time point of each syllable is then determined from the intrinsic duration of that syllable recorded in its stored pattern and from the pause durations specified by the pause symbols.

Next, the phonemic parameters intrinsic to each syllable, for example the time-varying patterns of formant frequencies and bandwidths, are read from the stored pattern and concatenated, being stretched or compressed as needed so that the syllable time points determined above are satisfied. For example, suppose a syllable C₁V₁ starts at t = 0 and the next syllable C₂V₂ starts at t = 140 msec, that the phonemic parameters of C₁V₁ are described only up to t = 100 msec, and that those of C₂V₂ are described in the stored pattern only from t = 140 msec onward. The interval from t = 100 msec to t = 140 msec is then filled by stretching the V₁ portion.
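The text does not say how the V₁ portion is stretched. As one possible reading, the sketch below resamples the vowel part of a parameter track by linear interpolation. The 10 msec frame period, the known vowel start index, and the function name are all assumptions made for illustration.

```python
def stretch_vowel(track, vowel_start, target_len):
    """Lengthen a parameter track to target_len frames by resampling the vowel part.

    track       -- parameter values, one per frame (assumed 10 msec per frame)
    vowel_start -- index of the first vowel frame (assumed known)
    target_len  -- number of frames required after stretching
    """
    head, vowel = track[:vowel_start], track[vowel_start:]
    new_vowel_len = target_len - len(head)
    if new_vowel_len < 2 or len(vowel) < 2:
        raise ValueError("need at least two vowel frames before and after stretching")
    out = list(head)
    for k in range(new_vowel_len):
        # Position of output frame k measured in original vowel frames.
        pos = k * (len(vowel) - 1) / (new_vowel_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(vowel) - 1)
        frac = pos - lo
        out.append(vowel[lo] * (1 - frac) + vowel[hi] * frac)
    return out
```

In the example from the text, a track of 10 frames (100 msec) would be stretched to 14 frames (140 msec), leaving the consonant frames untouched.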

Through the above processing, the phonemic parameters of the sentence to be synthesized are obtained; they are sent to the speech synthesizer (for example, a formant synthesizer) and used to generate the speech signal.

(2) Generation of the source intensity pattern

The source intensity takes a value determined for each type of syllable to be synthesized, and must also be decreased before a pause and increased after it. The value intrinsic to each syllable type is likewise recorded in the stored pattern; stretching, compressing, and concatenating these values in the same way as the phonemic parameters yields the basic source intensity pattern of the target sentence. In addition, before and after a pause, in particular the pause "。" that separates sentences, the intensity is decreased and increased by a fixed amount according to the source intensity rules. This gives the final source intensity pattern, which is sent to the speech synthesizer and used to generate the speech signal.

(3) Generation of the fundamental frequency pattern (voice pitch, denoted F₀)

The input indicates at which syllable boundary each phrase and accent is located, and the syllable time points have already been determined as described above, so the time points of the phrases and accents can be determined with the syllable time points as reference. The value actually used is fixed by the type of phrase symbol or accent symbol in the input (for example, 0.40 for A1), so the magnitudes and time points of the phrase commands and accent commands can be determined. From these time points and magnitudes, the F₀ pattern is generated with the equations of the F₀ pattern generation model.

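FIG. 3 shows the F₀ pattern generation model. The time course of F₀ is denoted F₀(t) and is calculated by the following equations:

$$\ln F_0(t) = \ln F_{\min} + \sum_{i=1}^{I} A_{pi}\, G_p(t - T_{0i}) + \sum_{j=1}^{J} A_{aj}\,\bigl\{ G_a(t - T_{1j}) - G_a(t - T_{2j}) \bigr\}$$

where

$$G_p(t) = \begin{cases} \alpha^2 t \exp(-\alpha t), & t \ge 0 \\ 0, & t < 0 \end{cases}$$

$$G_a(t) = \begin{cases} \min\bigl[\,1 - (1 + \beta t)\exp(-\beta t),\ \theta\,\bigr], & t \ge 0 \\ 0, & t < 0 \end{cases}$$

Gp(t) and Ga(t) are, respectively, the impulse response and the step response of a critically damped second-order linear system.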

α and β are constants that determine the speed of the responses; values of about α = 3.0 and β = 20.0 are used.

1 − (1 + βt) exp(−βt) approaches the target value 1.0 asymptotically as t increases. To make Ga(t) reach its target value within a finite time, processing is performed with θ = 0.9; when θ ≤ 1, the target value of Ga(t) is θ.
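A minimal sketch of the two response functions, assuming the forms Gp(t) = α²t exp(−αt) (stated later in the description) and Ga(t) = min[1 − (1 + βt) exp(−βt), θ] (the expression and ceiling discussed above). The function names and the treatment of t < 0 as zero response are assumptions.

```python
import math

ALPHA = 3.0   # phrase response rate quoted in the text
BETA = 20.0   # accent response rate quoted in the text
THETA = 0.9   # ceiling applied to the accent response


def gp(t, alpha=ALPHA):
    """Impulse response of the critically damped second-order system (phrase control)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def ga(t, beta=BETA, theta=THETA):
    """Step response of the critically damped second-order system, clipped at theta."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)
```

With these definitions `ga` reaches θ after a finite time and stays there, which is the purpose of the ceiling described above.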

I is the number of phrases in the sentence, and Ap_i is the magnitude of the i-th phrase command. For example, if the phrase is marked with the symbol P1 (0.43), then Ap_i = 0.43. T0_i is the time point of that phrase command.

J is the number of accents in the sentence, and Aa_j is the magnitude of the j-th accent command. For example, if the first accent is marked with the symbol A1 (0.40), then Aa_1 = 0.40. T1_j and T2_j are the onset and offset time points of the j-th accent command.

ln Fmin is a constant term corresponding to the lowest frequency at which the vocal folds can vibrate. For example, Fmin is set to about 75 Hz when synthesizing a male voice and to about 115 Hz when synthesizing a female voice.

To calculate F₀(t), the phrase command magnitudes and time points Ap_i, T0_i (1 ≤ i ≤ I) and the accent command magnitudes and time points Aa_j, T1_j, T2_j (1 ≤ j ≤ J), all determined by the processing described above, are substituted into the equation above to evaluate its right-hand side. F₀(t) is then obtained by applying the inverse of the logarithm, that is, the exponential function, to the result.
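The evaluation just described can be sketched as follows, assuming the model has the form ln F₀(t) = ln Fmin + Σ Ap_i Gp(t − T0_i) + Σ Aa_j [Ga(t − T1_j) − Ga(t − T2_j)]. The function names and the tuple layout of the command lists are hypothetical.

```python
import math


def gp(t, alpha=3.0):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def ga(t, beta=20.0, theta=0.9):
    """Accent control response, clipped at theta."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)


def f0(t, fmin, phrase_cmds, accent_cmds):
    """Evaluate F0(t) in Hz.

    phrase_cmds -- list of (T0, Ap) pairs
    accent_cmds -- list of (T1, T2, Aa) triples
    """
    log_f0 = math.log(fmin)
    log_f0 += sum(ap * gp(t - t0) for t0, ap in phrase_cmds)
    log_f0 += sum(aa * (ga(t - t1) - ga(t - t2)) for t1, t2, aa in accent_cmds)
    return math.exp(log_f0)   # inverse of the logarithm
```

For example, `f0(0.5, 75.0, [(0.0, 0.43)], [(0.2, 0.6, 0.40)])` evaluates a male-voice contour with one P1 phrase command and one A1 accent command; the command times in this call are made up for illustration.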

F₀(t) obtained in this way, that is, the fundamental frequency pattern, is sent to the speech synthesizer and used to generate the speech signal.

As for the hardware, the speech synthesizer (for example, a formant synthesizer) is implemented on a signal processor, while the processing from the input up to the creation of the synthesizer's input is carried out by a microprocessor. The stored patterns are held in ROM accessed by the microprocessor.

Calculations such as the evaluation of the equation for F₀(t) are implemented as a microprocessor program.

The above describes synthesis from input consisting of syllable symbols, pause symbols, accent symbols, and phrase symbols, but speech synthesizers that accept ordinary Japanese text in mixed kanji and kana are also known. In that case the text must first be converted into the syllable symbols, pause symbols, accent symbols, and phrase symbols described above. This conversion divides the input text into words, determines the reading by looking each word up in a word dictionary, determines the syllable boundaries at which the accent rises and falls from the accent type also recorded in the dictionary, and assigns the accent magnitude.

Next, the conventional way of assigning pause symbols and phrase symbols to the sentence to be synthesized is described (see the paper cited above).

The assignment is based on the syntactic structure of the sentence, as follows. Various modified and improved versions of these rules for deriving pause and phrase symbols from the syntactic structure also exist.

(1) Place "。" and P0 at the sentence-final period, and P1 at the beginning of the sentence.

(2) Place "、", P0, and P1 at each comma in the sentence.

(3) Place "・" and P2 at word boundaries that are syntactically larger than, or one level smaller than, the comma (or the period, if there is no comma).

(4) Place P3 at word boundaries two levels smaller than the comma.

(5) However, at a boundary between words in a modifier relationship, place no pause or phrase symbol, regardless of (3) and (4).

(6) If the interval between the pauses and phrases set as above exceeds a certain length (about 13 morae at a normal speaking rate), add P3 at word boundaries, starting with the largest.

(7) Where a syntactic boundary needs to be marked explicitly, place P3 even at a small word boundary.

(8) For every P2, check the interval to the immediately preceding P1 or P2; if the interval is small (about 4 morae at a normal speaking rate), change the P2 to P3.

[Problems to be solved by the invention]

When the phrase component is calculated with the equation above, phrase commands that follow one another at short intervals cause the next command to add its phrase component while a large part of the component from the previous command still remains, so the total phrase component becomes quite large. In natural speech, however, voice pitch does not rise without limit, because of physiological constraints. For this reason, when phrase commands follow one another at short intervals, the magnitude of the phrase command has had to be reduced by rule (8) above.
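The build-up described here is easy to reproduce numerically. The sketch below assumes Gp(t) = α²t exp(−αt) with α = 3.0 and compares the peak phrase component of a single P1 command with that of two P1 commands 0.3 s apart; the spacing, the sampling step, and the function names are illustrative choices only.

```python
import math


def gp(t, alpha=3.0):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def phrase_component(t, cmds):
    """Sum of the responses to all phrase commands, given as (T0, Ap) pairs."""
    return sum(ap * gp(t - t0) for t0, ap in cmds)


def peak(cmds, t_end=3.0, step=0.001):
    """Largest phrase component found on a regular time grid."""
    n = int(t_end / step)
    return max(phrase_component(k * step, cmds) for k in range(n + 1))


single = peak([(0.0, 0.43)])
double = peak([(0.0, 0.43), (0.3, 0.43)])
```

With these settings `double` comes out well above `single`, which is the uncontrolled growth that rule (8) was meant to suppress.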

In other words, although the phrase command itself is thought to be determined essentially by the syntactic structure of the sentence, the magnitude of the command actually issued has to be corrected according to the preceding phrase commands. As a result, the correspondence between syntactic structure and phrase command magnitude becomes unclear, and the rules for assigning phrase commands inevitably become hard to follow.

The object of the present invention is to build the phrase command magnitude correction that corresponds to this physiological constraint into the phrase component generation process itself, and thereby to permit prosody generation rules that are easy to follow.

[Means for solving the problem]

The present invention is characterized in that a target value of the phrase component is given as the phrase command, and the magnitude of the phrase command is obtained from the amount of phrase component needed to reach that target value.

As shown in FIG. 3, the device comprises a control unit 1, a phrase control mechanism 2, an accent control mechanism 3, and a glottal vibration mechanism 4.

The control unit 1 performs the following first and second processes. The first process determines the phrase commands and accent commands from the input characters and symbols. A phrase command has a time point T0_i and a magnitude Ap_i; an accent command has time points T1_i to T2_i and a magnitude Aa_i. In the second process, as shown in FIG. 1, the maximum value of the phrase component to be produced by the i-th phrase command (time point T0_i) is set as the target value Tp_i, and the target unreached portion calculation unit 1a computes the difference between Tp_i and the phrase component due to the commands up to the (i−1)-th. The magnitude Ap_i of the i-th phrase command is determined from the unreached portion obtained in this way.

As shown in FIG. 1, the phrase control mechanism 2 obtains the phrase component from the time point T0_i of the phrase command determined in the first process and the magnitude Ap_i of the phrase command determined in the second process.

The accent control mechanism 3 obtains the accent component from the accent command determined in the first process.

The glottal vibration mechanism 4 combines the phrase component and the accent component to obtain and output the fundamental frequency pattern.

As another example of the second process, as shown in the first embodiment described later, the maximum value of the phrase component produced by the i-th phrase command may be set as a first target value Tp_i, the maximum value of the phrase component produced by the (i−1)-th phrase command may be set as a second target value Tp_(i−1), and the magnitude of the i-th phrase command may be determined from the first and second target values and the time interval between them.

[First embodiment]

An embodiment with two positive target values for the phrase component is described.

Gp(t) in the equation takes its maximum value Gp(1/α) = α/e at t = 1/α, where e is the base of the natural logarithm (Table 2). The magnitude 0.43 that the prior art assigned to the phrase command for P1 is therefore multiplied by α/e, giving 0.43 × α/e as the first target value of the phrase component. With α = 3.0 and e = 2.71828, this target value is 0.47. The second target value of the phrase component is set to 80% of the first, that is, 0.47 × 0.8 = 0.376.
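The location and height of the maximum follow from the form Gp(t) = α²t exp(−αt) given in the second embodiment. A short derivation, with the numeric values rounded as in the text (the superscripts (1) and (2) are labels introduced here for the two target values):

```latex
\frac{dG_p}{dt} = \alpha^2 e^{-\alpha t}\,(1 - \alpha t) = 0
\;\Longrightarrow\; t = \frac{1}{\alpha},
\qquad
G_p\!\left(\tfrac{1}{\alpha}\right) = \alpha^2 \cdot \frac{1}{\alpha} \cdot e^{-1} = \frac{\alpha}{e}

T_p^{(1)} = 0.43 \times \frac{3.0}{2.71828} \approx 0.4746 \approx 0.47,
\qquad
T_p^{(2)} = 0.8 \times 0.47 = 0.376
```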

Next, an example of calculating the unreached portion of the phrase component target is described with reference to FIG. 4(b). When the i-th phrase command occurs at time t = 0 and adds its phrase component to the component due to the commands up to the (i−1)-th, the time t_max at which the total phrase component peaks differs from 1/α. However, t_max cannot be found by a simple calculation, and perceptually the magnitude of the phrase command does not need to be controlled exactly. The unreached portion of the phrase component at t = 1/α is therefore used to calculate the magnitude of the i-th phrase command. In FIG. 4(b), Tp_i − c is this unreached portion, where c is the phrase component due to the commands up to the (i−1)-th at that time.

Finally, an example of calculating the magnitude of the phrase command is described. From FIG. 4(a), the magnitude of the phrase command should be e/α times the unreached portion. Hence, if the unreached portion is Tp_i − c, the magnitude of the phrase command to be issued is (Tp_i − c) × e/α. If this magnitude comes out negative, it is set to 0 (the i-th phrase command is not generated).
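A minimal sketch of this calculation, assuming Gp(t) = α²t exp(−αt) with α = 3.0. The function name `next_command_magnitude` and the list-of-pairs representation of earlier commands are hypothetical.

```python
import math

ALPHA = 3.0


def gp(t, alpha=ALPHA):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def next_command_magnitude(target, t_cmd, previous_cmds, alpha=ALPHA):
    """Magnitude of a phrase command issued at t_cmd for a given target value.

    previous_cmds -- list of (T0, Ap) pairs for the commands already issued
    """
    t_eval = t_cmd + 1.0 / alpha                 # evaluate 1/alpha after the new command
    c = sum(ap * gp(t_eval - t0, alpha) for t0, ap in previous_cmds)
    unreached = target - c                       # Tp_i - c
    return max(unreached * math.e / alpha, 0.0)  # negative means no command
```

With no earlier commands, a target of 0.43 × α/e returns the prior-art magnitude 0.43; a command issued shortly after a large one returns 0.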

The negative phrase symbol P0 serves to bring the phrase component down. It may therefore be assigned the fixed value −0.5 as in the prior art, or it may be assigned a phrase component target value of 0, with the command magnitude then obtained in the same way as for a positive phrase command.

[Second embodiment]

Noting that perceptually the magnitude of the phrase command does not need to be controlled exactly, an embodiment whose calculation is simpler than that of the first embodiment is described next.

Obtaining the phrase component due to the commands up to the (i−1)-th requires evaluating Gp(t) = α²t exp(−αt), and evaluating this function takes time. Storing it in a table and looking it up shortens the computation. Moreover, since some inexactness in the phrase command magnitude causes no perceptual problem, the table can be stored at relatively coarse time intervals (for example, 0.1 s) to reduce memory (Table 2).
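A table of this kind might be built and searched as sketched below. The contents of Table 2 are not reproduced in this text, so the 3-second extent, the nearest-entry lookup, and the names are assumptions; only the 0.1 s spacing and the formula come from the description.

```python
import math

ALPHA = 3.0
STEP = 0.1    # table spacing in seconds, as suggested in the text
T_MAX = 3.0   # assumed extent of the table; beyond it Gp is treated as 0

# Precomputed values of Gp(t) = alpha^2 * t * exp(-alpha * t) at t = 0.0, 0.1, ...
GP_TABLE = [
    ALPHA ** 2 * (k * STEP) * math.exp(-ALPHA * k * STEP)
    for k in range(int(round(T_MAX / STEP)) + 1)
]


def gp_lookup(t):
    """Return the table entry nearest to time t (0 outside the table)."""
    if t < 0:
        return 0.0
    k = int(round(t / STEP))
    return GP_TABLE[k] if k < len(GP_TABLE) else 0.0
```

Near the peak the lookup differs from the exact value only in the third decimal place, which is in line with the remark that exactness is not needed perceptually.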

Furthermore, since Gp(t) gradually approaches 0 as t increases, the phrase component due to the commands up to the (i−1)-th consists mainly of the component due to the (i−1)-th command; the contribution of the commands up to the (i−2)-th is small (FIG. 5).

From this standpoint, an embodiment that obtains the magnitude of the i-th phrase command from the magnitude of the (i−1)-th phrase command is shown next.

First, as in the first embodiment, the magnitude of the phrase component at the time 1/α after the time point of the i-th phrase command is obtained. It is found by looking up Gp(t) in Table 2 with the time elapsed since the (i−1)-th phrase command and multiplying by the magnitude of the (i−1)-th phrase command; this value is written c below.

Next, the unreached portion of the phrase component is obtained as Tp_i − c. Finally, as in the first embodiment, the magnitude of the i-th phrase command is set to (Tp_i − c) × e/α.
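The second embodiment can be sketched as follows. Only the immediately preceding command is considered, and Gp is read from a coarse table. The table extent, nearest-entry lookup, clamping of negative results to 0 (carried over from the first embodiment), and all names are assumptions.

```python
import math

ALPHA = 3.0
STEP = 0.1
# Coarse table of Gp(t) at 0.1 s spacing over an assumed 3 s extent.
GP_TABLE = [ALPHA ** 2 * (k * STEP) * math.exp(-ALPHA * k * STEP) for k in range(31)]


def gp_lookup(t):
    """Nearest-entry lookup of Gp(t); 0 outside the table."""
    if t < 0:
        return 0.0
    k = int(round(t / STEP))
    return GP_TABLE[k] if k < len(GP_TABLE) else 0.0


def second_embodiment_magnitude(target, t_cmd, prev_time, prev_magnitude):
    """Command magnitude using only the (i-1)-th command and the lookup table."""
    elapsed = (t_cmd + 1.0 / ALPHA) - prev_time   # time since the previous command
    c = prev_magnitude * gp_lookup(elapsed)
    return max((target - c) * math.e / ALPHA, 0.0)
```

Compared with the first embodiment, no exponential is evaluated at run time and no sum over earlier commands is needed.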

[Third embodiment]

In the second embodiment the unreached portion of the phrase component is obtained from the magnitude of the immediately preceding phrase command, but it can also be approximated from the target value of the immediately preceding phrase command.

In addition, the time interval from the immediately preceding phrase command can be counted in morae (beats).

An embodiment that takes these points into account is shown next.

If the speaking rate is m morae per second, an interval of I morae corresponds to a time of I/m. Assume that, for the target value Tp_b of the immediately preceding phrase command, a phrase command of magnitude Tp_b × e/α was issued. The unreached portion d with respect to the target value Tp_n of the next phrase command, I morae later, is then d = Tp_n − Tp_b × e/α × Gp(I/m + 1/α). The magnitude of the next phrase command is therefore Ap_n = d × e/α.

Table 3 lists the values of Ap_n for various I when the phrase command can take the two values 0.47 and 0.376. When the phrase command is actually issued, Ap_n is set to 0 if it is negative; that is, no phrase command occurs. Because the phrase component due to the immediately preceding command decays gradually, I > 20 may be treated the same as I = 20.
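The formulas of the third embodiment translate directly into code. In the sketch below the speaking rate is a required argument because the text gives no value for m; the function name and argument names are hypothetical.

```python
import math

ALPHA = 3.0


def gp(t, alpha=ALPHA):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def third_embodiment_magnitude(tp_n, tp_b, morae, rate, alpha=ALPHA, max_morae=20):
    """Ap_n from the previous target tp_b, the new target tp_n and the gap in morae.

    rate -- speaking rate m in morae per second (no value is given in the text)
    """
    morae = min(morae, max_morae)              # I > 20 is treated as I = 20
    prev_cmd = tp_b * math.e / alpha           # magnitude assumed for the previous command
    d = tp_n - prev_cmd * gp(morae / rate + 1.0 / alpha, alpha)
    return max(d * math.e / alpha, 0.0)        # negative means no command
```

Evaluating this function over I = 1 to 20 for each pair of target values is presumably how a table such as Table 3 would be filled in, although the table itself is not reproduced here.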

[Fourth embodiment]

The third embodiment uses the value (Tp_i − c) × e/α directly as the magnitude of the phrase command, but the magnitude can also be quantized into a few levels. That is, three magnitudes, 0.43, 0.26, and 0.12 (labelled P1, P2, and P3 respectively), are prepared, and the one closest to Ap_n is adopted.
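The quantization step amounts to a nearest-level search. In this sketch a non-positive Ap_n is taken to mean that no command is issued, which is an assumption carried over from the third embodiment; the names are hypothetical.

```python
LEVELS = {"P1": 0.43, "P2": 0.26, "P3": 0.12}


def quantize(ap_n):
    """Return the symbol whose magnitude is closest to ap_n, or None for no command."""
    if ap_n <= 0:
        return None
    return min(LEVELS, key=lambda sym: abs(LEVELS[sym] - ap_n))
```

For example, a computed magnitude of 0.30 maps to P2 and one of 0.15 maps to P3.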

Table 4 shows Table 3 rewritten from this standpoint. With this table, the magnitude of the phrase command can be determined very simply.

[Fifth embodiment]

The first embodiment requires the magnitude c of the phrase component due to the commands up to the (i−1)-th at the time 1/α after the time point of the i-th phrase command. In terms of processing simplicity, it is easier to work with values already known at the present time than with a predicted value for a future time. An embodiment that uses the phrase component magnitude c′ available at the time point of the i-th phrase command in place of c is therefore described next.

First, the phrase component value c′ at the time point T0_i of the i-th phrase command is obtained. Since the fundamental frequency pattern is being computed anyway in order to generate the speech waveform (see FIG. 2), it suffices to read off the phrase component value c′ at time T0_i (FIG. 6).

Next, the unreached portion of the phrase component is obtained as Tp_i − c′. Finally, as in the first embodiment, the magnitude of the i-th phrase command is set to (Tp_i − c′) × e/α.
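A sketch of the fifth embodiment follows. It differs from the first only in evaluating the existing phrase component at the command time itself rather than 1/α later. Clamping negative results to 0 is assumed to carry over from the first embodiment, and the names are hypothetical.

```python
import math

ALPHA = 3.0


def gp(t, alpha=ALPHA):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def fifth_embodiment_magnitude(target, t_cmd, previous_cmds, alpha=ALPHA):
    """Command magnitude using the phrase component c' already present at t_cmd.

    previous_cmds -- list of (T0, Ap) pairs for the commands already issued
    """
    c_prime = sum(ap * gp(t_cmd - t0, alpha) for t0, ap in previous_cmds)
    return max((target - c_prime) * math.e / alpha, 0.0)
```

In a running synthesizer `c_prime` would simply be read from the phrase component that is already being generated, so no look-ahead is needed.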

[Effect of the invention]

According to the present invention, the setting of the phrase command magnitude that keeps the phrase component from becoming too large can be built into the phrase component generation process. This removes the need for the separate correction of phrase command magnitudes that was previously required, so the rules for generating phrase commands become simpler and easier to follow.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of the phrase command magnitude calculation in the present method.
FIG. 2 is a processing block diagram of the speech synthesizer.
FIG. 3 shows the F₀ pattern (fundamental frequency pattern) generation model.
FIGS. 4(a) and 4(b) are diagrams explaining the added phrase component.
FIG. 5 is a diagram explaining the calculation of the phrase component magnitude.
FIG. 6 is a diagram explaining the simplified calculation of the unreached portion of the phrase component.

1: control unit; 1a: target unreached portion calculation unit; 2: phrase control mechanism; 3: accent control mechanism; 4: glottal vibration mechanism

Continuation of the front page

(72) Inventor: 河井恒, 5-50-8 Maeharahigashi, Funabashi, Chiba Prefecture
(56) References: JP-A-2-48700 (JP, A); JP-A-64-35599 (JP, A); JP-A-64-28695 (JP, A); JP-A-63-259692 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 13/08; G10L 13/04; G10L 13/06

Claims

(57) [Claims]

1. A fundamental frequency pattern generator comprising a control unit, a phrase control mechanism, an accent control mechanism, and a glottal vibration mechanism, wherein: the control unit performs a first process and a second process; the first process determines, from input characters and symbols, a phrase command and an accent command each having a time point and a magnitude; the second process sets the maximum value of the phrase component produced by the i-th phrase command as a target value Tp_i, obtains the difference between the target value Tp_i and the phrase component due to the commands up to the (i−1)-th as an unreached portion, and determines the magnitude of the i-th phrase command from the unreached portion; the phrase control mechanism obtains a phrase component from the time point of the phrase command determined in the first process and the magnitude of the phrase command determined in the second process; the accent control mechanism obtains an accent component from the accent command determined in the first process; and the glottal vibration mechanism combines the phrase component and the accent component to obtain and output a fundamental frequency pattern.

2. A fundamental frequency pattern generator comprising a control unit, a phrase control mechanism, an accent control mechanism, and a glottal vibration mechanism, wherein: the control unit performs a first process and a second process; the first process determines, from input characters and symbols, a phrase command and an accent command each having a time point and a magnitude; the second process sets the maximum value of the phrase component produced by the i-th phrase command as a first target value Tp_i, sets the maximum value of the phrase component produced by the (i−1)-th phrase command as a second target value Tp_(i−1), and determines the magnitude of the i-th phrase command from the first and second target values and the time interval between them; the phrase control mechanism obtains a phrase component from the time point of the phrase command determined in the first process and the magnitude of the phrase command determined in the second process; the accent control mechanism obtains an accent component from the accent command determined in the first process; and the glottal vibration mechanism combines the phrase component and the accent component to obtain and output a fundamental frequency pattern.

3. The fundamental frequency pattern generator according to claim 1, wherein the phrase control mechanism obtains the phrase component as the response to the phrase command given by the impulse response of a critically damped second-order linear system.

4. The fundamental frequency pattern generator according to claim 3, wherein the impulse response of the critically damped second-order linear system is Gp(t) = α²t exp(−αt), and the control unit obtains, as the unreached portion, the difference between the target value of the i-th phrase command and the phrase component due to the commands up to the (i−1)-th at the time 1/α after the time point of the i-th phrase command.

5. The fundamental frequency pattern generator according to claim 1, wherein the control unit sets the magnitude of the phrase command to 0 when the unreached portion is negative.

6. The fundamental frequency pattern generator according to claim 1, wherein the control unit obtains the unreached portion from the magnitude of the (i−1)-th phrase command and the time interval between the i-th and (i−1)-th phrase commands.

7. The fundamental frequency pattern generator according to claim 1, wherein the control unit obtains, as the unreached portion, the difference between the target value of the i-th phrase command and the phrase component due to the commands up to the (i−1)-th at the time point of the i-th phrase command.