JPS5981697A

JPS5981697A - Voice synthesization by rule

Info

Publication number: JPS5981697A
Application number: JP19086182A
Authority: JP
Inventors: 武田　昌一; 市川　「あきら」
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1982-11-01
Filing date: 1982-11-01
Publication date: 1984-05-11

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a rule-based speech synthesis method, and in particular to a method for synthesizing arbitrary word speech with high-quality accent by rule from a syllable code string and an accent type.

[Prior art]

Conventionally, when speech is synthesized by rule, accent has been imparted by expressing the time variation of the pitch frequency with one of various piecewise-linear approximation patterns, such as the staircase pattern shown in Fig. 1, in which the pitch is constant within each syllable. However, such linear approximation patterns are only rough approximations of the natural pitch pattern extracted by analyzing real human speech, and the resulting accent inevitably sounds somewhat unnatural.

[Purpose of the invention]

The object of the present invention is to remedy this drawback of the prior art by providing a rule-based speech synthesis method that artificially generates a time-variation pattern of pitch frequency agreeing very closely with the pitch pattern extracted by analysis of real human speech, so that a pitch pattern corresponding to the accent of any word can be obtained.

[Summary of the invention]

The rule-based speech synthesis system of the present invention is a speech synthesizer that determines phoneme durations from an input accent type number and syllable code numbers corresponding to syllables or phonemes, generates a time-variation pattern of pitch frequency from the determined phoneme durations and the accent type number, and thereby synthesizes a speech waveform. It is characterized in that the time-variation pattern of pitch frequency is generated as the output of a critically damped second-order linear system, and in that the parameters of this critically damped second-order linear system are obtained by looking up a table in which they are registered by accent type and number of morae.

[Embodiments of the invention]

The principle and embodiments of the present invention are described below with reference to the drawings.

In the present invention, quality is improved by using a method based on a pitch control mechanism model to generate the pitch pattern. This model realizes the time-variation pattern of pitch frequency as the step response of a critically damped second-order linear system. It rests on the idea that the time-variation pattern of pitch frequency is obtained when step-like commands generated in the brain are smoothed by the mechanical characteristics of the vocal-cord vibration control mechanism. It has been confirmed experimentally that values predicted by the model agree with values measured by analyzing real speech to within a relative error of ±4%. It has also been confirmed perceptually that speech synthesized with pitch patterns from the model is so natural that its accent is almost indistinguishable from that of analysis-synthesis speech.

Fig. 2 shows an example of a time-variation pattern of pitch frequency generated by the pitch control mechanism model. The pitch frequency is formulated as the superposition of a voicing component, which represents slow, global variation (dotted line in Fig. 2), and an accent component, which represents local, fine variation (the sum is shown by the solid line in Fig. 2), as follows:

$$\ln\frac{f(t)}{f_{min}} = A_v\left[G_v(t-\tau_{v1}) - G_v(t-\tau_{v2})\right] + A_a\left[G_a(t-\tau_{a1}) - G_a(t-\tau_{a2})\right] \qquad (1)$$

$$G_v(t) = \alpha t\, e^{-\alpha t}\, u(t) \qquad (2)$$

$$G_a(t) = \left(1 - (1+\beta t)\, e^{-\beta t}\right) u(t) \qquad (3)$$

Here Av is the amplitude of the voicing component, Aa is the amplitude of the accent component, α is the natural angular frequency of the voicing component, and β is the natural angular frequency of the accent component. τv1, τv2, τa1 and τa2 are parameters representing the times of voicing start, voicing end, accent start and accent end, respectively. fmin is the minimum frequency, and u(t) is the unit step function.

By suitably controlling the nine parameters Av, Aa, α, β, τv1, τv2, τa1, τa2 and fmin in equations (1) to (3), a wide variety of time-variation patterns of pitch frequency can be generated.
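The model of equations (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout and every numeric value in `EXAMPLE` are hypothetical and are not taken from the source; only the formulas themselves follow equations (1) to (3).

```python
import math

def g_v(t, alpha):
    """Voicing response of equation (2); the unit step makes it 0 for t < 0."""
    if t < 0.0:
        return 0.0
    return alpha * t * math.exp(-alpha * t)

def g_a(t, beta):
    """Accent response of equation (3); the unit step makes it 0 for t < 0."""
    if t < 0.0:
        return 0.0
    return 1.0 - (1.0 + beta * t) * math.exp(-beta * t)

def pitch_frequency(t, p):
    """Pitch frequency f(t), i.e. equation (1) solved for f(t)."""
    voicing = g_v(t - p["tau_v1"], p["alpha"]) - g_v(t - p["tau_v2"], p["alpha"])
    accent = g_a(t - p["tau_a1"], p["beta"]) - g_a(t - p["tau_a2"], p["beta"])
    return p["fmin"] * math.exp(p["Av"] * voicing + p["Aa"] * accent)

# Illustrative values only (assumed, not from the source); times in seconds.
EXAMPLE = {
    "Av": 0.5, "Aa": 0.4,          # component amplitudes
    "alpha": 3.0, "beta": 20.0,    # natural angular frequencies (1/s)
    "tau_v1": 0.0, "tau_v2": 0.6,  # voicing start / end
    "tau_a1": 0.15, "tau_a2": 0.40,  # accent start / end
    "fmin": 80.0,                  # minimum frequency (Hz)
}

# One sample every 10 ms, as a stand-in for the frame period.
contour = [pitch_frequency(0.01 * k, EXAMPLE) for k in range(100)]
```

Before any command starts, both bracketed terms are zero and the sketch returns fmin; changing any one of the nine values reshapes the whole contour, which is the sense in which the nine parameters control the pattern.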

For convenience of explanation, the Tokyo accent is used here, and accent is defined as follows. The type-i accent (i = 1, 2, ..., m) of an m-mora word is the case where the accent nucleus lies on the i-th mora from the end (the (m+1−i)-th mora from the beginning). The accent nucleus is the portion where the pitch frequency falls sharply from near the end of the accented syllable. A flat accent with no accent nucleus is called type 0; it is distinguished from type 1, in which a particle attached after the word takes low accent. For four-mora words, the original gives a schematic in which each syllable is written as ○ and accent height is drawn as the height of a line; the example words are:

Type 0: しろくま (shirokuma)
Type 1: くるしみ (kurushimi)
Type 2: すりこぎ (surikogi)
Type 3: あさがお (asagao)
Type 4: けんりょく (kenryoku)

In the present invention, the model of equations (1) to (3) can generate the time-variation pattern of pitch frequency for words of any number of morae and any accent type from a small number of control parameters, and speech synthesized with this pattern has a very natural accent. Specifically, there are the nine kinds of control parameter in equations (1) to (3), and only one value per control parameter per word need be supplied to generate the pattern. Moreover, words with the same number of morae and the same accent type can share the same parameter values regardless of the word, so overall only a small table indexed by number of morae and accent type need be prepared.

The amount of control information can be kept small because the model of equations (1) to (3) concentrates the basic part of a word's prosodic information into the critically damped second-order-lag characteristic patterns y = x e^(−x) and y = 1 − (1 + x) e^(−x), independently of phoneme type, accent type and speaker. The control parameters therefore need carry only a little secondary information that differs with accent type and speaker: the relative timing of accent rise and fall, the amplitudes of the voicing and accent components, the time constants of the lag, and so on. Consequently, if these characteristic patterns can be generated by a small and fast method, the whole pitch pattern generation section can be made small and the time-variation pattern of pitch frequency can be generated at high speed.

The present invention focuses on the fact that the prosodic features of speech can be separated from its phonemic and individual features, and that these prosodic features can be expressed by a fixed pattern regardless of other factors. It provides a method of generating the time-variation pattern of pitch frequency from the model, and a method of controlling the model parameters so as to generate the pitch-frequency time pattern for any number of morae and any accent type, thereby achieving a smaller and faster apparatus. Because the pattern is fixed, the present invention need only prepare a pattern table, and a pattern value can be obtained very quickly by address modification in which the variable x is mapped to a memory address of the table.

Moreover, about 100 values of the variable are enough to obtain results of sufficient accuracy, so memory capacity can be kept small.
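The pattern-table idea can be illustrated with a small Python sketch. The table length of 100 follows the figure mentioned in the text; everything else is an assumption made for illustration: the tabulated range `X_MAX`, the uniform spacing, truncation to the lower entry, clamping beyond the range, and all names (`G_TABLE`, `H_TABLE`, `lookup`).

```python
import math

N = 100                  # number of table entries (about 100, per the text)
X_MAX = 10.0             # assumed upper end of the tabulated range of x
SCALE = (N - 1) / X_MAX  # table entries per unit of x

# Fixed patterns y = x e^(-x) and y = 1 - (1 + x) e^(-x), computed once.
G_TABLE = [(k / SCALE) * math.exp(-(k / SCALE)) for k in range(N)]
H_TABLE = [1.0 - (1.0 + k / SCALE) * math.exp(-(k / SCALE)) for k in range(N)]

def lookup(table, x):
    """Map x to a table index by scaling and truncating, then read the entry."""
    if x <= 0.0:
        return table[0]          # both patterns are 0 at x = 0
    k = int(x * SCALE)           # integer part of the scaled variable
    return table[min(k, N - 1)]  # clamp values beyond the tabulated range

# Compare a looked-up value with the directly computed one.
x = 1.0
approx = lookup(G_TABLE, x)
exact = x * math.exp(-x)
```

Each evaluation costs one multiplication, one truncation and one memory read, with no exponential at run time. The step error depends on the assumed range and spacing, so a real table would be dimensioned against the accuracy actually required.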

Fig. 3 is a system block diagram of a rule-based speech synthesizer showing an embodiment of the present invention.

The speech synthesizer consists of a speech synthesizer unit 1, a rule section 2, and a unit database 3. The rule section 2 in turn consists of a phoneme duration setting section 4, a pitch pattern generation section 5, a synthesis unit retrieval section 6, and a unit concatenation section 7.

First, from the phoneme sequence C1, V1, C2, V2, ..., Cm, Vm of the m-mora word given as input (where Ck is the code for the consonant of the k-th mora and Vk is the code for the vowel of the k-th mora), or from the syllable sequence (CV)1, (CV)2, ..., (CV)m (where (CV)k is the code for the syllable of the k-th mora), and from the accent type number i, the phoneme duration setting section 4 obtains the duration of each phoneme Ck, Vk (k = 1, 2, ..., m) and outputs the result as the phoneme boundary times τ1, τ2, ..., τ2m+1. Next, from the phoneme boundary times τ1, τ2, ..., τ2m+1 and the accent type number i, the pitch pattern generation section 5 obtains the time-variation pattern of pitch frequency f(t1), f(t2), ..., f(tn) (n is the total number of frames), or alternatively the time-variation pattern of pitch period, and the pattern f(t1), f(t2), ..., f(tn) is input to the pitch control terminal of the synthesizer 1 every frame period.

Meanwhile, from the input phoneme sequence C1, V1, C2, V2, ..., Cm, Vm (or syllable sequence), the synthesis unit retrieval section 6 obtains a unit code ic. By searching the unit database 3 with this unit code, it outputs the parameter time series P1, P2, ..., Pl of the corresponding synthesis unit (l is the total number of frames in the unit), together with the unit's start time (τck if the synthesis unit is a CV chain), its end time (τvk in the same case), its CV boundary time τcvk, and so on (k = 1, 2, ..., J). Concretely, the parameter time series is a time series of vectors Pk = (p1k, p2k, ..., ppk, Ak, Uk)^T (T denotes transpose) whose elements are parameters representing formant characteristics, such as the p-th order PARCOR coefficients p1k, p2k, ..., ppk, the source amplitude Ak, and the voiced/unvoiced information Uk.

The parameter time series P1, P2, ..., Pl of each retrieved synthesis unit is then placed on the time axis in accordance with the phoneme boundary times τ1, τ2, ..., τ2m+1 obtained above, on the basis of the unit's own boundary times τc1, τcv1, τv1, .... Placement is carried out for as many synthesis units as there are morae.

Thereafter, so that the boundaries before and after each synthesis unit are continuous, any gap at a boundary is filled by interpolation, and where unit data overlap, the overlapping portion, or a suitable gap left after cutting, is replaced by an interpolation curve (straight line). This unit placement and concatenation processing yields a time series P1, P2, ..., Pn of synthesized-word parameters (n is the total number of frames of the word) in which phoneme durations are effectively controlled. Finally, the word parameter time series is synchronized with the time-variation pattern of pitch frequency f(t1), f(t2), ..., f(tn) and input to the respective parameter control terminals of the synthesizer 1 every frame period. The synthesized speech S is then obtained as the output of the synthesizer 1.

The synthesizer 1 can be realized by, for example, a PARCOR synthesizer. The unit database 3 and, within the rule section 2, the synthesis unit retrieval section 6 and the unit concatenation section 7 can all be realized using the corresponding sections of conventional rule-based synthesis systems.

Next, embodiments of the phoneme duration setting section 4 and the pitch pattern generation section 5 of the rule section 2 are described in detail.

First, the phoneme duration setting section 4 can be realized by, for example, a consonant length table for obtaining consonant durations and a calculation circuit for obtaining vowel durations. The consonant length table is prepared in advance by measuring, for every kind of consonant in naturally uttered speech, its start time ts and end time te, and registering the value te − ts. ts and te can be measured visually by observing the speech waveform, or by the FFT (fast Fourier transform) based phoneme boundary time measurement method described later. The vowel duration tv is obtained, for example, by evaluating the following empirical formula:

$$t_v = \lambda_v\,(t_{v0} - \eta\, t_c + C_m + d)\, q \qquad (4)$$

Here λv is a correction coefficient for each kind of vowel, for example λv = 1.00 (for /a/), 0.08 (for /i/), 0.79 (for /u/), 0.96 (for /e/) and 0.93 (for /o/). tv0 is the reference vowel duration, for example tv0 = 136 ms (other than the final syllable) and 164 ms (final syllable). tc is the duration (ms) of the consonant of that syllable, and η is a coefficient representing the magnitude of the effect of the immediately preceding consonant's duration tc on the vowel, for example η = 0.43.

Cm is a correction term for the number of morae in the word, for example 85, 40, 15, 0, −10, −15, −20, −23, −25 in order for words of 1, 2, ..., 10 morae (unit: ms). d is a term for controlling speech rate: d = 0 ms to synthesize at the standard rate, and a suitable positive or negative value (the change in duration per syllable, in ms) to synthesize at a slower or faster rate.

q is a correction coefficient for accent height, for example q = 0.7 (when the adjacent mora, whose index is illegible in the source, exists and has low accent) and 1.0 (otherwise). The above values are only examples; depending on the purpose and the required quality they need not be adhered to, and some of the coefficients or terms may be omitted. The computation of equation (4) is easily carried out by a configuration of adders, subtractors and multipliers.

Finally, the phoneme boundary times τ1, τ2, ..., τ2m+1 are obtained from the consonant durations tck and vowel durations tvk found as above by the following recurrence, where the suffix k indicates the phoneme duration of the k-th mora (k = 1, 2, ..., m):

$$\tau_1 = t_0 \quad \text{(initial value)} \qquad (5)$$

$$\tau_{2k} = \tau_{2k-1} + t_{ck} \qquad (6)$$

$$\tau_{2k+1} = \tau_{2k} + t_{vk} \qquad (7)$$

The initial value t0 is set appropriately (for example, t0 = 100 ms).
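The duration rule of equation (4) and the recurrence (5) to (7) can be sketched as follows. The coefficient values for /a/ and /u/, the reference durations, η, the two-mora correction term and t0 are the example values quoted in the text; the consonant durations of 60 ms and 80 ms, the choice of word, and the function names are hypothetical and serve only to make the sketch runnable.

```python
def vowel_duration(lambda_v, t_v0, eta, t_c, c_m, d=0.0, q=1.0):
    """Equation (4): vowel duration in ms."""
    return lambda_v * (t_v0 - eta * t_c + c_m + d) * q

def boundary_times(consonant_ms, vowel_ms, t0=100.0):
    """Equations (5)-(7): phoneme boundary times tau_1 .. tau_(2m+1) in ms."""
    taus = [t0]                      # (5) tau_1 = t0
    for t_c, t_v in zip(consonant_ms, vowel_ms):
        taus.append(taus[-1] + t_c)  # (6) tau_2k = tau_(2k-1) + t_ck
        taus.append(taus[-1] + t_v)  # (7) tau_(2k+1) = tau_2k + t_vk
    return taus

# Hypothetical two-mora word with vowels /a/ then /u/.
consonants = [60.0, 80.0]            # assumed consonant-table outputs (ms)
ETA = 0.43                           # example value from the text
C_M = 40.0                           # example correction for a 2-mora word (ms)

vowels = [
    vowel_duration(1.00, 136.0, ETA, consonants[0], C_M),  # non-final, /a/
    vowel_duration(0.79, 164.0, ETA, consonants[1], C_M),  # final, /u/
]
taus = boundary_times(consonants, vowels)
```

For an m-mora word the returned list always has 2m + 1 entries, which is the property the pitch pattern generation section later relies on to recover m from the boundary count.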

Next, an embodiment of the pitch pattern generation section 5, the most important part of the present invention, is described with reference to Figs. 4 to 6. Fig. 4 is an internal logic block diagram of the pitch pattern generation section 5.

First, on the basis of the accent type number i and the number of morae m of the word to be synthesized, the values of the nine parameters of equations (1) to (3) corresponding to that accent type and mora count, namely Av, Aa, α, β, ΔTv1, ΔTv2, ΔTa1, ΔTa2 and fmin, are looked up and extracted from the pitch control parameter table 8. Here ΔTv1, ΔTv2, ΔTa1 and ΔTa2 are the relative times of voicing start, voicing end, accent start and accent end, each measured from a pre-designated phoneme boundary time τk (k = 1, 2, ..., 2m+1). An example of how the reference phoneme boundary times are designated follows.

First, the voicing start relative time ΔTv1 and the voicing end relative time ΔTv2 are referred to the times τ2 and τ2m of the CV boundaries at the beginning and end of the word, respectively. Denoting these reference times by τsv and τev:

$$\tau_{sv} = \tau_2 \qquad (8)$$

$$\tau_{ev} = \tau_{2m} \qquad (9)$$

Next, the accent start relative time ΔTa1 and the accent end relative time ΔTa2 are referred, respectively, to the start of the syllable CV immediately after the accent begins to rise from low to high, and to the start of the syllable CV immediately after the accent begins to fall from high to low.

In the example of Fig. 2, the former reference time τsa is τ3 and the latter reference time τea is τ5. For type 0 no such τea can be determined, so in this case only, τea is set to τ2m+1. For type 1, assuming that a particle is attached after the word, τea is likewise set to τ2m+1. Expressed as formulas according to the accent type i, the above gives equations (10) and (11).

(Equations (10) and (11), which give τsa and τea for each accent type i, did not survive in the source text.)

The number of morae m used in looking up the pitch control parameters can be obtained by counting the number M (= 2m + 1) of phoneme boundary times τk and applying:

$$m = \frac{M - 1}{2} \qquad (12)$$

The computations of equations (8) to (12) are carried out by the arithmetic circuit 9. Specifically, equations (8) to (11) are easily executed by table lookup, and equation (12) by a subtractor and a shift register.

Next, from the relative times ΔTv1, ΔTv2, ΔTa1, ΔTa2 obtained above and the reference phoneme boundary times τsv, τev, τsa, τea, the four parameters τv1, τv2, τa1, τa2 are obtained by the following equations:

$$\tau_{v1} = \tau_{sv} + \Delta T_{v1} \qquad (13)$$

$$\tau_{v2} = \tau_{ev} + \Delta T_{v2} \qquad (14)$$

$$\tau_{a1} = \tau_{sa} + \Delta T_{a1} \qquad (15)$$

$$\tau_{a2} = \tau_{ea} + \Delta T_{a2} \qquad (16)$$

Equations (13) to (16) are executed by adders 10, 11, 12 and 13, respectively. Then, with τv1, τv2, the previously obtained α, and the frame time tk (k = 1, 2, 3, ..., n) as inputs to calculation circuit 14, the term Gv(t − τv1) − Gv(t − τv2) of equation (1) is computed. Likewise, with τa1, τa2, β and tk as inputs to calculation circuit 15, the term Ga(t − τa1) − Ga(t − τa2) of equation (1) is computed. The outputs of calculation circuits 14 and 15 are multiplied by Av and Aa in multipliers 16 and 17, respectively, and the results are summed in adder 18, which completes the right-hand side of equation (1). Finally, calculation circuit 19 computes the exponential function e^x of the output of adder 18, and multiplier 20 multiplies the result by fmin, giving one sample f(tk) of the pitch frequency.
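The timing part of this data flow, equations (8), (9) and (12) to (16), can be sketched as below. Because equations (10) and (11) are not available, the accent reference times τsa and τea are passed in by the caller; the example uses the Fig. 2 case (τsa = τ3, τea = τ5). All boundary times, offsets and names in the sketch are hypothetical.

```python
def mora_count(boundaries):
    """Equation (12): m = (M - 1) / 2, with M the number of boundary times."""
    return (len(boundaries) - 1) // 2

def resolve_timing(boundaries, tau_sa, tau_ea, offsets):
    """Turn relative times (Delta T) into absolute command times (tau)."""
    m = mora_count(boundaries)
    tau_sv = boundaries[2 - 1]       # (8)  tau_sv = tau_2   (1-based index)
    tau_ev = boundaries[2 * m - 1]   # (9)  tau_ev = tau_2m
    return {
        "tau_v1": tau_sv + offsets["dTv1"],  # (13)
        "tau_v2": tau_ev + offsets["dTv2"],  # (14)
        "tau_a1": tau_sa + offsets["dTa1"],  # (15)
        "tau_a2": tau_ea + offsets["dTa2"],  # (16)
    }

# Hypothetical 3-mora word: tau_1 .. tau_7 in ms.
boundaries = [100.0, 160.0, 300.0, 350.0, 480.0, 540.0, 700.0]
# Hypothetical table entries for this mora count and accent type (ms).
offsets = {"dTv1": -50.0, "dTv2": 100.0, "dTa1": -30.0, "dTa2": -20.0}

timing = resolve_timing(
    boundaries,
    tau_sa=boundaries[3 - 1],  # tau_3, as in the Fig. 2 example
    tau_ea=boundaries[5 - 1],  # tau_5, as in the Fig. 2 example
    offsets=offsets,
)
```

Keeping the table entries relative to phoneme boundaries is what lets one set of values serve every word with the same mora count and accent type, whatever its actual phoneme durations.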

If the computation described with reference to Fig. 4 is carried out frame period by frame period, the time-variation pattern of pitch frequency f(t1), f(t2), ..., f(tn) is obtained as the output of multiplier 20. If the pitch period T(tk) is wanted instead of the pitch frequency, calculation circuit 19 is modified to compute the reciprocal of what it computes for the pitch frequency, and multiplier 20 is replaced by a divider.

The pitch frequency can also be obtained with the following circuit configuration. Multiplier 20 is omitted, ln fmin is registered in the pitch control parameter table 8 in place of fmin, and its value is added in adder 18 together with the outputs of multipliers 16 and 17. This gives the same function as before the change, and because an inexpensive adder is used in place of an expensive multiplier, manufacturing cost falls.

With this modified circuit, obtaining the pitch period requires only that calculation circuit 19 be modified to compute the reciprocal of what it computes for the pitch frequency; no other part need be changed.
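The equivalence of the two configurations follows from fmin · e^x = e^(x + ln fmin). A short Python check, with hypothetical function names and illustrative values for x and fmin, makes this concrete:

```python
import math

def frequency_with_multiplier(x, fmin):
    """Original configuration: exponentiate, then multiply by fmin."""
    return fmin * math.exp(x)

def frequency_with_adder(x, ln_fmin):
    """Modified configuration: add the stored ln(fmin), then exponentiate."""
    return math.exp(x + ln_fmin)

def period_with_adder(x, ln_fmin):
    """Pitch period in the modified configuration: exponentiate the negated sum."""
    return math.exp(-(x + ln_fmin))

x = 0.35                  # assumed output of adder 18 before ln(fmin) is added
fmin = 80.0               # assumed minimum frequency (Hz)
ln_fmin = math.log(fmin)  # the value that would be stored in the table

f_a = frequency_with_multiplier(x, fmin)
f_b = frequency_with_adder(x, ln_fmin)
period = period_with_adder(x, ln_fmin)
```

The two frequency results agree to floating-point precision, and the period is the reciprocal of the frequency, so switching between frequency and period output only changes the sign of the exponent.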

In the embodiment of Fig. 4, the values of the parameters Av, Aa, α, β, ΔTv1, ΔTv2, ΔTa1, ΔTa2 and fmin registered in the pitch control parameter table 8 can be obtained by choosing them so that the time-averaged squared error between the time-variation pattern of pitch frequency analyzed and extracted from the waveform of natural human speech and the time-variation pattern of pitch frequency given by the model of equation (1) is minimized.
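The error criterion can be sketched as follows. The source does not say which optimization method is used, so the one-parameter grid search here is only a stand-in chosen for brevity; the "measured" contour is synthetic data generated from the model itself, and all names and numeric values are hypothetical.

```python
import math

def model_f(t, p):
    """Pitch frequency from equations (1)-(3); times in seconds."""
    def g_v(s):
        return p["alpha"] * s * math.exp(-p["alpha"] * s) if s >= 0.0 else 0.0
    def g_a(s):
        return 1.0 - (1.0 + p["beta"] * s) * math.exp(-p["beta"] * s) if s >= 0.0 else 0.0
    x = (p["Av"] * (g_v(t - p["tau_v1"]) - g_v(t - p["tau_v2"]))
         + p["Aa"] * (g_a(t - p["tau_a1"]) - g_a(t - p["tau_a2"])))
    return p["fmin"] * math.exp(x)

def mean_squared_error(times, measured, p):
    """Time-averaged squared error between measured and modelled contours."""
    return sum((model_f(t, p) - f) ** 2 for t, f in zip(times, measured)) / len(times)

def fit_one_parameter(times, measured, p, name, candidates):
    """Pick the candidate value of one parameter that minimizes the error."""
    best = min(candidates,
               key=lambda v: mean_squared_error(times, measured, {**p, name: v}))
    return {**p, name: best}

# Synthetic "measured" contour produced with Aa = 0.4 (illustrative values).
TRUE = {"Av": 0.5, "Aa": 0.4, "alpha": 3.0, "beta": 20.0,
        "tau_v1": 0.0, "tau_v2": 0.6, "tau_a1": 0.15, "tau_a2": 0.40,
        "fmin": 80.0}
times = [0.01 * k for k in range(80)]
measured = [model_f(t, TRUE) for t in times]

# Start from a wrong Aa and recover it from the candidate list.
start = {**TRUE, "Aa": 0.2}
fitted = fit_one_parameter(times, measured, start, "Aa", [0.2, 0.3, 0.4, 0.5, 0.6])
```

In practice all nine parameters would be fitted jointly against pitch contours extracted from recorded speech; the sketch only shows the shape of the objective being minimized.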

Several methods of extracting the pitch frequency from a speech waveform are known, so this step is easily realized; various optimization methods for minimizing the error are also known, so this too is easily realized. The values ΔTv1, ΔTv2, ΔTa1 and ΔTa2 are determined from the values τv1, τv2, τa1 and τa2 obtained directly by the optimization, using the following further procedure.

On the same time axis as the pitch frequency, an FFT (fast Fourier transform) is applied to the speech waveform frame by frame to obtain the time-variation pattern of the frequency spectrum of the speech signal. The phoneme boundary times can be found by reading the times at which this pattern changes abruptly. From them, τsv, τev, τsa and τea are obtained by the rules of equations (8) to (11) according to the accent type, and from these ΔTv1, ΔTv2, ΔTa1 and ΔTa2 can be determined using the relations of equations (13) to (16).

Next, embodiments of calculation circuits 14 and 15 are described with reference to Figs. 5(a), 5(b) and 6.

Fig. 5(a) is a circuit that computes Gv(t − τv1) − Gv(t − τv2) in equation (1), and Fig. 5(b) is a circuit that computes Ga(t − τa1) − Ga(t − τa2) in the same equation. In Fig. 5(a), subtractors 21 and 22 first compute t − τv1 and t − τv2 from the input voicing start and end times τv1, τv2 and the frame time tk (k = 1, 2, ..., n). These outputs are fed to unit step function calculation circuits 23 and 24, respectively. Multipliers 25 and 26 multiply the step outputs by the corresponding outputs of subtractors 21 and 22, and multipliers 27 and 28 then multiply the results by α. Each of these outputs is input to function tables 29 and 30, which evaluate the function g(x) = x e^(−x), giving Gv(t − τv1) and Gv(t − τv2). Finally, subtractor 31 subtracts the latter from the former to give the desired result.

In Fig. 5(b), circuits 32 to 42 correspond in order to circuits 21 to 31 of Fig. 5(a), and their functions are the same as in Fig. 5(a) except for the following three points. (1) Subtractors 32 and 33 receive τa1 and τa2 in place of τv1 and τv2. (2) Multipliers 38 and 39 receive β in place of α. (3) Function tables 40 and 41 both give the function h(x) = 1 − (1 + x) e^(−x).

Concretely, the function computation by function tables 29 and 30 or 40 and 41 is carried out as follows. The functions g(x) and h(x) are computed in advance and all the results are registered; the addresses a_1, a_2, ..., a_N of the storage locations (N being the number of registered values of g(x) or h(x)) are associated with x; and the computation is performed as a table lookup by modifying the address.

According to this method, function computation can be performed easily and at high speed. If, for example, a table is created for the domain x_1 ≤ x ≤ x_M, the correspondence between a_k (k = 1, 2, ..., N) and x can be established by the conversion formula

a_k = [M(x − x_1)] + a_1    (17)

where [ ] is the Gauss symbol and M = (a_N − a_1)/(x_M − x_1). Since M is a constant once x_1, x_M, and N are fixed, registering the value Mα or Mβ in pitch control parameter table 8 in place of the parameter α or β simplifies the conversion of equation (17) to

a_k = [x − x_1] + a_1    (17)'

In this embodiment x_1 = 0, so equation (17)' ultimately means that, once x has been converted to an integer, it suffices to modify the address of the memory area starting at address a_1 directly by [x].
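The table lookup with address conversion described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the memory layout of the embodiment: `build_table` and `lookup` are hypothetical names, list indices 0 to N − 1 stand in for the addresses a_1 to a_N, and the clamping of out-of-range arguments is an added assumption that the text does not mention.

```python
import math


def g(x):
    """g(x) = x * exp(-x), one of the functions registered in advance."""
    return x * math.exp(-x)


def build_table(fn, x_min, x_max, n):
    """Precompute n values of fn on a uniform grid over [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    return [fn(x_min + i * step) for i in range(n)]


def lookup(table, x, x_min, x_max):
    """Return the registered value whose index is [M * (x - x_min)]."""
    n = len(table)
    m = (n - 1) / (x_max - x_min)        # plays the role of the constant M
    index = math.floor(m * (x - x_min))  # Gauss symbol: integer part
    index = max(0, min(n - 1, index))    # assumption: clamp to the table range
    return table[index]


table = build_table(g, 0.0, 10.0, 101)
value = lookup(table, 1.05, 0.0, 10.0)   # uses the entry registered for x = 1.0
```

If the argument is scaled by M beforehand, as the text suggests by storing Mα or Mβ in the parameter table, the multiplication inside `lookup` disappears and only the integer conversion remains.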

In effect, the conversion becomes unnecessary.

In Figs. 5(a) and 5(b), the upper half and the lower half of each figure use identical circuits. For example, 21 and 22, 23 and 24, 25 and 26, 27 and 28, and 29 and 30 are each identical pairs. Fig. 6 shows an arrangement in which the identical circuits are merged into one, reducing the device scale to about 1/2; its function is the same as that of Fig. 5(a) or Fig. 5(b). The function of the circuit shown in Fig. 6 is described below.

Basically, the scheme of Fig. 6 performs the upper-half and lower-half processing, which is carried out in parallel in Fig. 5(a) or Fig. 5(b), in two passes within one frame. That is, the clock signal OL generates two pulses per frame to start processing. First, triggered by the first pulse of clock signal OL, τv1 (or τa1) and τv2 (or τa2) are written to shift registers 43 and 44, respectively, and at the same time buffer register 51 is cleared to zero. Next, subtractor 45 subtracts the output τv2 (or τa2) of shift register 44 from the frame time t_k (k = 1, 2, ..., n). The result is fed, on the one hand, to unit step function calculation circuit 46, whose output is multiplied in multiplier 47 by the output of subtractor 45 obtained above, and the product is further multiplied by α (or β) in multiplier 48. This output is then input to function table 49, which computes the function g(x) (or h(x)), and Gv(t − τv2) (or Ga(t − τa2)) is obtained as its output. Subtractor 50 then subtracts the output of buffer register 51 (which is "0" on the first pass) from this value, and the result is written to buffer register 51. After the above processing is finished, clock signal OL generates its second pulse. Triggered by this pulse, the contents of shift register 43 are shifted into shift register 44.

At this time, however, no operation is performed on buffer register 51 (it is not cleared to zero). Shift register 44 now holds τv1 (or τa1), so the same processing as in the first pass is carried out for τv1 (or τa1). As a result, buffer register 51 stores the second-pass result Gv(t − τv1) (or Ga(t − τa1)) minus the first-pass result Gv(t − τv2) (or Ga(t − τa2)), and the desired Gv(t − τv1) − Gv(t − τv2) (or Ga(t − τa1) − Ga(t − τa2)) is finally obtained as the output of buffer register 51.

Finally, returning to Fig. 4, calculation circuit 19 is described. Calculation circuit 19 computes the exponential function e^x, and the computation can be carried out easily and at high speed by the same method as that used to obtain g(x) or h(x). In this case, however, the values of e^x are registered in the function table.

According to the embodiment described above, rule-based word speech synthesis becomes possible that is compact, has excellent real-time performance, and gives a more natural sense of accent than conventional methods.

Finally, concrete examples of device scale and computation time are given.

First, regarding device scale: there are nine kinds of control parameters, so if realized values are given for all nine kinds for each word and the maximum word length is taken to be 10 morae, calculating the required capacity of the parameter table shows that 585 realized values need to be prepared.

With 2-byte data, the required memory capacity is about 1 KByte. Moreover, some of the parameters may be treated as constants, so omitting them makes the table even smaller. As for the function tables, function values for, say, 100 argument values give sufficient accuracy, so for three kinds of functions a memory capacity of 600 bytes suffices.

Next, regarding computation time: the time needed to compute one sample of pitch frequency consists mainly of the time required for table lookups and for about ten arithmetic operations. With fixed-point arithmetic, the total is on the order of several hundred μs. Since this is one to two orders of magnitude smaller than the frame interval (5 to 20 ms), real-time processing is fully possible.

[Effect of the invention]

As explained above, according to the present invention, a highly natural accent for any Japanese word can be generated by the simple control of giving only one realized value per control parameter per word. Moreover, the realized values of these control parameters can be obtained by the simple method of looking up a table using the number of morae of the word and the accent type number. Furthermore, the time change pattern of the pitch frequency can be generated easily by combining function values obtained by table reference with a small number of arithmetic operations.

The present invention can therefore provide a useful means of achieving higher quality, smaller size, and higher speed in speech synthesis devices for arbitrary words.

[Brief explanation of drawings]

Fig. 1 is a characteristic curve diagram of the conventionally used linear approximation pitch pattern. Fig. 2 is a characteristic curve diagram of the time change pattern of pitch frequency generated by the pitch control mechanism model. Fig. 3 is an overall block diagram of a rule-based speech synthesis device showing an embodiment of the present invention. Fig. 4 is a circuit configuration diagram of the pitch pattern generation processing section in Fig. 3. Fig. 5 is a circuit configuration diagram for performing the function calculation. Fig. 6 is a configuration diagram of a modified circuit of Fig. 5.

1: speech synthesizer, 2: rule section, 3: unit database, 4: phoneme duration setting processing section, 5: pitch pattern generation processing section, 6: synthesis unit search processing section, 7: unit connection processing section, 8: pitch control parameter table, 9: arithmetic circuit, 14, 15, 19: calculation circuits, 23, 24, 34, 35, 46: unit step function calculation circuits, 29, 30, 40, 41, 49: function tables, 43, 44: shift registers, 51: buffer register.

Claims

[Claims]

(1) A rule-based speech synthesis method for use in a speech synthesis device that receives an accent type number and syllable code numbers corresponding to syllables or phonemes, determines phoneme durations, generates a time change pattern of pitch frequency from the determined phoneme durations and the accent type number, and thereby synthesizes a speech waveform, the method being characterized in that the time change pattern of pitch frequency is generated as the output of a critically damped second-order linear system, and the parameters of the critically damped second-order linear system are obtained by searching a table registered in correspondence with accent type numbers.

(2) The rule-based speech synthesis method according to claim 1, wherein the critically damped second-order linear system has function table search means as its main part.