JPH01219898A

JPH01219898A - Speech synthesizing device

Info

Publication number: JPH01219898A
Application number: JP4653388A
Authority: JP
Inventors: Yoshimasa Sawada; 沢田　喜正; Norio Suda; 典雄須田; Yoshimichi Okuno; 義道奥野
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1989-09-01

Abstract

PURPOSE: To obtain a smooth voice by dividing the generation time of each phoneme constituting a syllable into plural time zones, specifying phoneme parameters such as the cross-sectional areas of the acoustic tubes for each time zone, interpolating the phoneme parameters, and superposing the interpolated energy of the sound source wave on a pattern prescribed by an exponential function. CONSTITUTION: The human vocal tract is regarded as a group of acoustic tubes and made to correspond to a group of circuit elements of surge impedance components, and a voice is simulated according to the current wave at the output terminal of the circuit element group. The propagation of the acoustic wave in the acoustic tube model is replaced with the flow of current in the equivalent circuit; parameters such as the pitch of the current source and the cross-sectional areas of the acoustic tubes are prescribed for each phoneme; and the parameters are interpolated at the joins between phonemes and between the time zones into which each phoneme is divided. As for the energy, energy patterns corresponding to the accent are prescribed in advance for each word by exponential functions, and the interpolated energy values are superposed on the energy pattern to obtain a voice closer to the human voice.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Field of Industrial Application

The present invention relates to a speech synthesis device using an acoustic tube model.

B. Summary of the Invention

The present invention concerns a device that regards the human vocal tract as a group of acoustic tubes, associates it with a group of circuit elements consisting of surge impedance components, and simulates speech based on the current wave at the output end of the circuit element group. For each phoneme constituting a syllable, the generation time of the phoneme is divided into a plurality of time zones, phoneme parameters such as the cross-sectional areas of the acoustic tubes and the energy of the sound source wave are specified for each time zone, and these phoneme parameters are interpolated. In addition, the interpolated values of the sound source wave energy are superimposed on a pattern defined by an exponential function according to the accent of the word. The result is smooth speech that approximates the human voice.

C. Prior Art

Electronic devices that artificially synthesize and output sound, such as speech synthesizers and music synthesizers (electronic musical instruments), have recently become available at low cost as speech recognition and speech synthesis LSIs of one to several chips, thanks to speech information processing and large-scale semiconductor integration technology, and various methods have been proposed according to the purpose of use and the constraints. Speech synthesis methods include the recording-and-editing method, in which raw human speech is recorded and suitably concatenated and edited into sentences, and the parameter method, which does not use the human voice directly but extracts only the parameters of human speech and controls those parameters during synthesis to create a speech signal artificially.

In the parameter method, the speech waveform is sampled at a fixed period, the value of the speech signal at each sampling point is analog-to-digital converted, and the value is represented as a code of 0s and 1s. To record the analog signal faithfully, however, the number of bits must be increased, which requires a large memory capacity.

Various high-efficiency coding methods have therefore been researched and developed to reduce this amount of information as far as possible.

One such method is delta modulation, in which a minimum of one bit is assigned to the information of one speech signal sample. The single bit indicates whether the next speech signal value is higher or lower than the current value: the code "1" is assigned if it is higher and the code "0" if it is lower. In an actual system, a fixed amplitude step (delta) is defined, and, so that errors do not accumulate, the coding is applied to the residual between the speech value reconstructed from the codes so far and the incoming speech signal.
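The delta modulation scheme described above can be sketched as follows. This is a minimal illustration, not the circuitry of any particular prior-art device: the function names `delta_encode` and `delta_decode`, the starting estimate of zero, and the treatment of an equal value as "lower" are assumptions made for the example.

```python
def delta_encode(samples, step):
    """Encode samples as 1 bit each: 1 = higher than the running estimate, 0 = not higher."""
    bits = []
    estimate = 0  # value reconstructed from the bits emitted so far (assumed to start at 0)
    for x in samples:
        bit = 1 if x > estimate else 0
        # Track the decoder's reconstruction so that errors do not accumulate.
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    out = []
    estimate = 0
    for b in bits:
        estimate += step if b else -step
        out.append(estimate)
    return out


bits = delta_encode([1, 2, 3, 2, 1], step=1)
print(bits, delta_decode(bits, step=1))
```

Because the encoder compares each new sample with the decoder's own reconstruction rather than with the previous input sample, a wrong step in one period is corrected in later periods.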

Such a configuration is called predictive coding, and includes the linear prediction method (which predicts from several previous sample values) and the PARCOR method (which uses partial autocorrelation coefficients, known as PARCOR coefficients, in place of the prediction coefficients of the linear prediction method).

D. Problems to Be Solved by the Invention

Among the conventional speech synthesis methods, the recording-and-editing method has the problem that the vocabulary and the kinds of sentences that can be synthesized are limited.

In methods using predictive coding, coarticulation, which corresponds to the joins between sounds, is difficult to handle, and no method of joining synthesis units has been established. For example, in an utterance that goes from a vowel through a consonant to a vowel, the sound passes from the steady state of the vowel through a transition to the consonant, and then through the transition of the next vowel to its steady state; in this process the sound breaks off at the join between vowel and vowel. The sound therefore lacks smoothness and does not seem natural to a human listener.

An object of the present invention is to provide a speech synthesis device that can synthesize arbitrary vocabulary and sentences, and whose sound is smooth, close to actual human speech, and natural to the listener.

E. Means for Solving the Problems, and Operation

(1) Basic concept

To radiate speech from the mouth, a sound source is required, and this sound source is produced by the vocal cords. The vocal cords intermittently interrupt the exhaled breath by opening and closing two folds, and this interruption generates pulses of airflow called puffs. When the vocal cords are tensed, tension is applied to the folds, the frequency at which they open and close rises, and puffs of higher frequency are produced. Increasing the exhaled airflow makes the sound louder.

When this sound source wave passes through a cylindrical acoustic tube such as the vocal tract, resonance emphasizes some components and attenuates others in the sound wave leaving the open end, producing the complex waveform of a vowel.

Even when the sound source wave has the same waveform, the speech emitted from the mouth is affected by the shape of the vocal tract through which it passes before being radiated from the lips. That is, the sound a human produces is determined by the length and cross-sectional area of the vocal tract from the vocal cords to the lips, by the way the vocal cords vibrate, and so on.

The present invention was made with attention to these points. Its starting point is to regard the vocal tract as a group of acoustic tubes with variable cross-sectional areas, and to realize the traveling-wave phenomenon that represents the transmission of sound waves in the acoustic tubes by means of an equivalent circuit. When the vocal tract is regarded as acoustic tubes, the propagation of sound in each tube can be divided into a forward wave and a backward wave and treated as repeated reflection and transmission at the boundaries between tubes. The reflection and transmission are determined quantitatively by the degree of mismatch of the acoustic characteristic impedances at the boundary, that is, by the ratio of the cross-sectional areas of the adjacent tubes.

These reflection and transmission phenomena are the same as the transient phenomena that occur in an electric circuit when an impulse current is passed through lines of different impedance.
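As a rough illustration of how the area ratio fixes the reflection and transmission at one boundary, the sketch below applies ordinary lossless transmission-line theory to a current wave. It is not taken from the patent's own formulas: the function name `current_wave_coefficients` is hypothetical, and the surge impedance of each tube is assumed to be inversely proportional to its cross-sectional area, with the common constant factor dropped.

```python
def current_wave_coefficients(area_in, area_out):
    """Reflected and transmitted fractions of a current wave crossing a tube boundary.

    The wave travels from the tube of area `area_in` into the tube of area `area_out`.
    """
    z_in = 1.0 / area_in    # assumed: surge impedance proportional to 1/area
    z_out = 1.0 / area_out
    # Voltage (pressure) reflection coefficient at the impedance step.
    gamma = (z_out - z_in) / (z_out + z_in)
    reflected = gamma          # reflected current, measured along its own travel direction
    transmitted = 1.0 - gamma  # equals 2*z_in/(z_in + z_out); keeps the current continuous
    return reflected, transmitted


print(current_wave_coefficients(1.0, 1.0))  # equal areas: nothing is reflected
print(current_wave_coefficients(1.0, 3.0))  # widening tube: part of the wave is reflected
```

With equal areas the boundary is invisible to the wave; the larger the area mismatch, the larger the reflected fraction.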

(2) Equivalent circuit

Accordingly, if an acoustic tube model consisting of n acoustic tubes $S_1$ to $S_n$ is drawn as in FIG. 1(a), the model can be represented as an electric circuit in which a group of circuit elements ($T_1$ to $T_n$), each a lossless surge impedance component with no resistance, is connected in series as shown in FIG. 1(b). $A_1$ to $A_n$ are the cross-sectional areas of the acoustic tubes $S_1$ to $S_n$, respectively. The present invention basically applies this electric circuit: changing the impulse current supplied to it and the surge impedance of each circuit element $T_1$ to $T_n$ corresponds to changing the sound source wave and the cross-sectional area of each tube in the acoustic tube model. The current output from the final circuit element $T_n$ is supplied to a sounding unit such as a speaker, thereby simulating the speech obtained from the acoustic tube model.

Specifically, a circuit equivalent to the above electric circuit is assumed as shown in FIG. 1(c). The current of the current source in this equivalent circuit is varied with time, and, because the ratios of the tube cross-sectional areas enter the computation formulas as described later, the cross-sectional areas $A_1$ to $A_n$ are also varied with time; the current at each part is then obtained by computation. In the figure, P is the current source, $Z_0$ is the impedance of the current source, $Z_1$ to $Z_n$ are the surge impedances of the circuit elements $T_1$ to $T_n$, $Z_L$ is the radiation impedance, $i_{0A}$ to $i_{(n-1)A}$, $i_{1B}$ to $i_{nB}$, $a_{0A}$ to $a_{(n-1)A}$ and $a_{1B}$ to $a_{nB}$ are the currents in the correspondingly labeled current paths, $W_{0A}$ to $W_{(n-1)A}$ and $W_{1B}$ to $W_{nB}$ are current sources, $I_{0A}$ to $I_{(n-1)A}$ are backward-wave currents, and $I_{1B}$ to $I_{nB}$ are forward-wave currents.

Taking the junction of circuit elements $T_1$ and $T_2$ as an example, the equivalent circuit assumes a current source $W_{1B}$ corresponding to the current $I_{1B}$ flowing in $T_1$ toward $T_2$, and a current source $W_{1A}$ corresponding to the current $I_{1A}$ flowing in $T_2$ toward $T_1$. It expresses equivalently that the current $I_{1B}$ splits at the boundary between $T_1$ and $T_2$ into a reflected-wave current $i_{1B}$ returning to $T_1$ and a transmitted-wave current $a_{1A}$ passing into $T_2$, and that the current $I_{1A}$ splits at the same boundary into a reflected-wave current $i_{1A}$ returning to $T_2$ and a transmitted-wave current $a_{1B}$ passing into $T_1$. FIG. 1(d) illustrates this schematically.

(3) Computation

First, the first-stage block in FIG. 1(c), which contains the current source P, can be regarded as the superposition of two circuits as shown in FIG. 2. Letting the voltage of the current source P be $V$, the currents $a_1$ and $a_2$ in the figure are given by equations (1) and (2), respectively, and the current $a_{0A}$ is therefore given by equation (3).

$$a_1 = \frac{V}{Z_0 + Z_1} \quad (1)$$

$$a_2 = \frac{Z_0}{Z_0 + Z_1}\, I_{0A} \quad (2)$$

$$a_{0A} = a_1 + a_2 = \frac{1}{Z_0 + Z_1}\left(V + Z_0\, I_{0A}\right) \quad (3)$$

When current is first supplied to the equivalent circuit, $a_{0A}$ is obtained by setting $I_{0A}$ to zero.

The computation then proceeds sequentially from this value.

Taking the first-stage and second-stage blocks at the left end of the figure as an example, the current values are computed by equations (4) to (12) below.

$$a_{0A}' = \frac{1}{Z_0 + Z_1}\left(V' + Z_0\, I_{0A}\right) \quad (4)$$

$$i_{0A}' = a_{0A}' - I_{0A} \quad (5)$$

$$I_{0A}' = i_{1B}' + a_{1B}' \quad (6)$$

$$a_{1B}' = S_{1B}\left(I_{1B} + I_{1A}\right) \quad (7)$$

$$i_{1B}' = a_{1B}' - I_{1B} \quad (8)$$

$$I_{1B}' = i_{0A}' + a_{0A}' \quad (9)$$

$$a_{1A}' = S_{1A}\left(I_{1B} + I_{1A}\right) \quad (10)$$

$$i_{1A}' = a_{1A}' - I_{1A} \quad (11)$$

$$I_{1A}' = i_{2B}' + a_{2B}' \quad (12)$$

Continuing the computation in this way, the formulas for the final-stage block are given by equations (13) and (14).

$$a_{nB}' = \frac{Z_L}{Z_n + Z_L}\, I_{nB} \quad (13)$$

$$i_{nB}' = a_{nB}' - I_{nB}, \qquad I_{nB}' = i_{(n-1)A}' + a_{(n-1)A}' \quad (14)$$

In this way the current $I_{nB}$ corresponding to the sound wave emitted from the final acoustic tube $S_n$ is obtained. Here $S_{1B}$ and $S_{1A}$ are coefficients expressed in terms of the cross-sectional area ratio of the adjacent acoustic tubes, and are given by equations (15) and (16), respectively.

The series of computations of the current values from the first-stage block to the final-stage block is executed instantaneously, and these computations are repeated one after another at a prescribed timing. In equations (4) to (14), primed values are the values computed at time t, and unprimed values are those obtained in the computation one step before time t. The digital value $i_{nB}$ obtained in this way is digital-to-analog converted into an analog current, and this current is supplied to a speaker or the like to produce speech. The timing of the computation is determined with the speed of sound taken into account. For example, by setting the computation interval equal to the propagation time through one acoustic tube, a state is created that is equivalent to the backward-wave currents $I_{0A}$ to $I_{(n-1)A}$ and the forward-wave currents $I_{1B}$ to $I_{nB}$ flowing through the circuit elements $T_1$ to $T_n$ at the speed of sound, and the acoustic tube model and the electric circuit model are thereby matched.
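The stepping scheme of this section can be sketched in software as below. This is only one possible reading, under stated assumptions: the expressions for the coefficients of equations (15) and (16) are not legible in this text, so the forms $S_B = Z_{k+1}/(Z_k + Z_{k+1})$ and $S_A = Z_k/(Z_k + Z_{k+1})$ are assumed here because they make the update agree with standard transmission-line reflection. The surge impedance is taken as 1/area, and the names `step_tube_chain` and `simulate` are hypothetical.

```python
def step_tube_chain(fwd, bwd, z, z0, zl, v):
    """Advance all wave currents by one time step.

    fwd[k] and bwd[k] are the forward- and backward-wave currents in element k+1;
    z[k] is its surge impedance. Returns (new_fwd, new_bwd, output_sample).
    """
    n = len(z)
    new_fwd = [0.0] * n
    new_bwd = [0.0] * n

    # Source block: the source voltage and the returning backward wave are superposed.
    a0 = (v + z0 * bwd[0]) / (z0 + z[0])
    i0 = a0 - bwd[0]
    new_fwd[0] = i0 + a0

    # Interior junctions between element k+1 and element k+2.
    for k in range(n - 1):
        total = fwd[k] + bwd[k + 1]
        s_b = z[k + 1] / (z[k] + z[k + 1])  # assumed form of the B-side coefficient
        s_a = z[k] / (z[k] + z[k + 1])      # assumed form of the A-side coefficient
        a_b = s_b * total
        a_a = s_a * total
        new_bwd[k] = (a_b - fwd[k]) + a_b          # reflected part + transmitted part
        new_fwd[k + 1] = (a_a - bwd[k + 1]) + a_a

    # Final block terminated by the radiation impedance (no wave returns from the load).
    a_n = zl / (z[-1] + zl) * fwd[-1]
    i_n = a_n - fwd[-1]
    new_bwd[-1] = i_n + a_n
    return new_fwd, new_bwd, i_n  # i_n is the sample that would go to the D/A converter


def simulate(areas, source, z0=1.0, zl=1.0):
    """Run the chain for one step per source sample and collect the output samples."""
    z = [1.0 / a for a in areas]  # assumed scaling: impedance = 1/area
    fwd = [0.0] * len(z)
    bwd = [0.0] * len(z)
    out = []
    for v in source:
        fwd, bwd, sample = step_tube_chain(fwd, bwd, z, z0, zl, v)
        out.append(sample)
    return out


# A uniform, matched three-tube chain simply delays an impulse by three steps.
print(simulate([1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0]))
```

In the uniform, matched case no reflections occur, which gives a simple sanity check; the sign and scale of the output sample follow the equations as read here and would need to be confirmed against the original figures.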

The present invention is based on the equivalent model and computation described above. Specifically, it is characterized by comprising: a phoneme parameter storage unit which, for each phoneme constituting a syllable, divides the utterance time of the phoneme into one or more time zones and stores, for each time zone, the initial values of the pitch (the repetition frequency of the sound source wave), the energy of the sound source wave and the cross-sectional areas of the acoustic tubes, constants defining how each initial value $X_0$ of that time zone changes to the corresponding initial value $X_r$ of the next time zone, and a sound source wave pattern; a parameter interpolation unit that interpolates the pitch, energy and cross-sectional areas corresponding to the input phoneme data; a pattern processing unit that classifies the energy patterns of the sound source wave corresponding to word accents into a plurality of types, defines the energy pattern of each type by an exponential function, and selects from these the pattern corresponding to the input word; a computation unit that computes the current value output from the output end of the circuit element group on the basis of the parameters interpolated by the parameter interpolation unit, using for the energy the values obtained by superimposing the exponential function of the pattern selected by the pattern processing unit on the interpolated values; and a sounding unit that generates speech on the basis of the result of the computation unit; wherein the parameter interpolation unit performs the interpolation computation many times within each time zone using the initial value $X_0$, the value $X_r$ serving as the target value, and the constant, and performs the interpolation of the energy on the basis of an exponential function for interpolation.

F. Embodiment

FIG. 1 is a block diagram of an embodiment of the present invention. Reference numeral 1 denotes a Japanese language processing unit, which segments the input Japanese text into phrases and converts it to its reading by referring to a dictionary. 2 is a sentence processing unit, which adds intonation to the sentence. 3 is a syllable processing unit, which assigns accents, according to the intonation, to the syllables constituting the sentence. For example, the sentence "sakura ga saita" is broken down into syllables such as "SA", "KU", "RA", and so on, and an accent is assigned to each syllable. Because the intonation of a sound is determined by the repetition frequency of the sound source wave (described later), its energy and its duration, assigning an accent means determining coefficients for these parameters. In particular, the coefficients for the energy are determined by the pattern processing unit 31 in the syllable processing unit 3.

The pattern processing unit 31 classifies and stores the energy patterns of the sound source wave corresponding to word accents in, for example, three types (a head-high pattern, a tail-high pattern and a middle-high pattern) and selects from these three the pattern corresponding to the word to be uttered. FIGS. 4(a) to 4(c) show the results of analyzing speech actually uttered by a person for the words "dosa shita" (operated), "ijo wa" (the abnormality) and "shadanki wa" (the circuit breaker), respectively. The solid lines show the change in energy of each syllable, and the dotted lines show the energy pattern corresponding to the word accent, obtained by connecting the energy peaks of the syllables. In this embodiment, the patterns shown by dotted lines in FIGS. 4(a) to 4(c) are treated, according to the position of the high-energy portion, as the head-high, tail-high and middle-high patterns, respectively; these three patterns are defined by exponential functions as shown in FIGS. 5(a) to 5(c) and stored in advance in the storage section of the pattern processing unit 31.

Each stored pattern assigns, for example, exponential functions defined by different formulas to its rising part and its falling part. To retrieve an energy pattern, for example, each word registered in the dictionary is given in advance a code indicating its energy pattern type, and the pattern processing unit 31 selects the energy pattern corresponding to that code.

4 is a phoneme processing unit and 41 is a syllable parameter storage unit. For input syllable data such as "SA", the phoneme processing unit 4 refers to the data in the syllable parameter storage unit 41, which defines the correspondence between syllables and phonemes (the units of vowels and consonants), and decomposes each syllable into phonemes; for the syllable "SA", for example, it extracts the phonemes "S" and "A".
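The table lookup performed by the phoneme processing unit can be pictured with a small sketch. Only the entry "SA" to "S", "A" comes from the text; the other table entries, the table name `SYLLABLE_TO_PHONEMES` and the function name `decompose` are hypothetical and stand in for the contents of the syllable parameter storage unit.

```python
# Hypothetical excerpt of a syllable-to-phoneme table; only "SA" is given in the text.
SYLLABLE_TO_PHONEMES = {
    "SA": ["S", "A"],
    "KU": ["K", "U"],
    "RA": ["R", "A"],
    "I": ["I"],
}


def decompose(syllables):
    """Replace each syllable by its phoneme sequence, preserving order."""
    phonemes = []
    for syllable in syllables:
        phonemes.extend(SYLLABLE_TO_PHONEMES[syllable])
    return phonemes


print(decompose(["SA", "KU", "RA"]))
```

A plain dictionary is enough here because the set of syllables is small and fixed, which matches the idea of a stored correspondence table.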

5 is a parameter interpolation unit, 51 is a phoneme parameter storage unit, and 52 is a sound source parameter storage unit.

As shown in FIG. 6, the phoneme parameter storage unit 51 divides the utterance time of each phoneme into a plurality of time zones, for example three time zones O1 to O3, and stores for each time zone its duration, the initial values of the pitch (the repetition frequency of the sound source wave), the energy of the sound source wave and the cross-sectional areas of the acoustic tubes, the time constants defining how each initial value of that time zone changes to the corresponding initial value of the next time zone, and the sound source wave pattern. In this embodiment the human vocal tract (about 17 cm long for a male) is modeled as 17 connected acoustic tubes each 1 cm long, so that 17 cross-sectional area values ($A_1$ to $A_{17}$) are defined per time zone. The sound source parameter storage unit 52 stores, for example, the waveform components of three sound source wave patterns G1 to G3, as shown in FIG. 7, as 50 items of sample data. The parameter interpolation unit 5 interpolates the pitch, energy and cross-sectional areas in each time zone (O1 to O3). Let $X_0$ be the initial value of a parameter (pitch, energy or cross-sectional area) in the current time zone, $X_r$ the initial value in the next time zone, $X(n)$ the n-th interpolated value, and $D$ the time constant for that parameter; the interpolation is then performed n times within the time zone according to the recurrence of equation (17), with the initial value $X(0) = X_0$.

$$X(n) = D\,\bigl(X_r - X(n-1)\bigr) + X(n-1) \quad (17)$$

For example, for the interpolation of the energy in time zone O1, $X_0$ corresponds to $E_1$ and $X_r$ to $E_2$, so the computation follows equation (18).

$$X(n) = D_{E1}\,\bigl(E_2 - X(n-1)\bigr) + X(n-1) \quad (18)$$

Equation (17) is the recurrence form of equation (19).

$$X = X_r - e^{-Dt} \quad (19)$$

Differentiating equation (19) gives equation (20), and hence equation (21) holds.

$$\frac{dX}{dt} = D\,e^{-Dt} \quad (20)$$

$$\Delta X = X(n+1) - X(n) = \Delta t \cdot D\,e^{-Dt}\Big|_{t=n} = \Delta t \cdot D\,\bigl(X_r - X(n)\bigr) \quad (21)$$

This yields equation (22).

$$X(n+1) = \Delta t \cdot D\,\bigl(X_r - X(n)\bigr) + X(n) \quad (22)$$

Because the time interval of the interpolation is constant, $\Delta t \cdot D$ can be replaced as a whole by the time constant $D$, which gives equation (17).
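The recurrence of equation (17) is short enough to show directly. The sketch below is illustrative only: the function name `interpolate` and the sample values are assumptions, while the update rule itself is the one stated in the text.

```python
def interpolate(x0, xr, d, n_steps):
    """Return [X(0), X(1), ..., X(n_steps)] using X(n) = D*(Xr - X(n-1)) + X(n-1)."""
    values = [x0]
    x = x0
    for _ in range(n_steps):
        x = d * (xr - x) + x  # move a fixed fraction D of the remaining distance to Xr
        values.append(x)
    return values


# Energy rising from an initial value of 0.0 toward a target of 1.0 with D = 0.5.
print(interpolate(0.0, 1.0, 0.5, 3))  # [0.0, 0.5, 0.75, 0.875]
```

Each step closes a constant fraction of the remaining gap, so the values approach the target exponentially and never overshoot it when D lies between 0 and 1; only the two end values and D need to be stored.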

6 is a computation unit. On the basis of the parameters calculated by the parameter interpolation unit 5, it obtains the digital value of the current $i_{nB}$ shown in FIG. 1(c) at the same timing as the interpolation computation, for example at intervals of 100 μs. 7 is a digital-to-analog (D/A) converter, which produces a current wave (analog current) from the digital value obtained by the computation unit 6. 8 is a sounding unit such as a speaker, which generates speech from the analog current.

Next, the operation of the above embodiment will be described.

Japanese text entered from a word processor or the like passes through the Japanese language processing unit 1, the sentence processing unit 2 and the syllable processing unit 3, where intonation and the like are added and the text is divided into syllables; the phoneme processing unit 4 then decomposes each syllable into phonemes. Next, the parameter interpolation unit 5 retrieves the pitch, energy and cross-sectional areas of each phoneme from the phoneme parameter storage unit 51 and interpolates these parameters in each time zone (O1 to O3). The interpolation follows equation (17); for the energy in time zone O1, for example, it follows equation (18). FIG. 8 illustrates this: the interpolated energy values E(1), E(2), ..., E(n) obtained by the interpolation lie along the curve given by equation (23).

$$E = E_2 - e^{-Dt} \quad (23)$$

The sample data of the sound source wave pattern specified for each of the time zones O1 to O3 is also retrieved from the sound source parameter storage unit 52, and this sample data and the interpolated values of the pitch and other parameters are passed to the computation unit 6, which executes the computation detailed in section E(3), "Computation". In this computation, the coefficients or functions corresponding to the accent assigned to each syllable by the syllable processing unit 3 are multiplied by the parameters obtained by the parameter interpolation unit 5, so that the intonation of the sentence is expressed. For the energy in particular, the pattern processing unit 31 selects the energy pattern corresponding to the accent of the word (see FIG. 5) on the basis of the code attached to the word retrieved from the dictionary; the exponential function values defining the pattern (the values on the vertical axis of FIG. 5) are read out so that the energy pattern is traced over the utterance time of the word; each value read out is multiplied by the interpolated energy value; and the product is used as an element of the computation.
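The superposition of the accent pattern on the interpolated energy amounts to a pointwise multiplication, which can be sketched as follows. The text only states that the rising and falling parts of each pattern use different exponential functions; the particular form below (a peak normalized to 1 at `peak_time`, with separate `rise_rate` and `fall_rate`) and the function names are assumptions made for illustration.

```python
import math


def accent_pattern(t, peak_time, rise_rate, fall_rate):
    """Hypothetical accent envelope: exponential rise to 1.0 at peak_time, then exponential decay."""
    if t <= peak_time:
        return math.exp(rise_rate * (t - peak_time))
    return math.exp(-fall_rate * (t - peak_time))


def shape_energy(interpolated, times, peak_time, rise_rate, fall_rate):
    """Multiply each interpolated energy value by the pattern value at the same time."""
    return [
        e * accent_pattern(t, peak_time, rise_rate, fall_rate)
        for e, t in zip(interpolated, times)
    ]


# A middle-high style envelope peaking at t = 2 applied to a flat energy contour.
print(shape_energy([2.0, 2.0, 2.0], [1.0, 2.0, 3.0], 2.0, 1.0, 0.5))
```

Placing the peak at the start, middle or end of the word would give head-high, middle-high and tail-high shapes respectively, so one parametric envelope can stand in for all three stored pattern types.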

In this way, the digital value of the current wave corresponding to the sound wave emitted from the final-stage acoustic tube is obtained. Based on this value, the D/A converter 7 produces the current wave, and the voice generation unit 8 emits the corresponding speech.

G. Effects of the Invention
According to the present invention, the propagation of sound waves in the acoustic tube model is replaced by the flow of current in an equivalent circuit, parameters such as the pitch of the current source and the cross-sectional areas of the acoustic tubes are defined for each phoneme, and parameter interpolation processing is performed at the seams between phonemes or between the divided time periods within a phoneme. Smooth speech is therefore obtained, which sounds natural to the listener.
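The seam interpolation mentioned above can be illustrated with a minimal Python sketch. It does not reproduce equation (18), which is not shown in this section. It assumes that a parameter moves from its initial value X_o toward the target value X_r as `x_r + (x_o - x_r) * exp(-d * t)`. The function name, the step count, and the constants are hypothetical.

```python
import math

def interpolate_exponential(x_o, x_r, d, duration, steps):
    """Return `steps` values moving from x_o toward x_r over `duration`.

    Assumed form (not taken from the patent text):
        x(t) = x_r + (x_o - x_r) * exp(-d * t)
    """
    dt = duration / steps
    return [x_r + (x_o - x_r) * math.exp(-d * k * dt)
            for k in range(1, steps + 1)]

# Illustrative numbers only: energy falling from 1.0 toward 0.2
# across a 0.1 s time period, evaluated at five instants.
values = interpolate_exponential(x_o=1.0, x_r=0.2, d=30.0, duration=0.1, steps=5)
```

In this sketch only three numbers (the initial value, the target value, and the constant) describe the whole transition, and the intermediate values are computed when needed.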

The energy is interpolated based on an exponential function, so the sequence of interpolated values is close to that of actual speech. In addition, an energy pattern corresponding to the accent, defined in advance by an exponential function, is assigned to each word, and the interpolated energy values are superimposed on this energy pattern, so speech even closer to the human voice is obtained. Furthermore, it is not necessary to store in memory all parameter values for the regions corresponding to the seams between phonemes. It is sufficient to store data per phoneme or per time period, so the required memory capacity is small.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram showing an equivalent model of the acoustic tubes, FIG. 2 is an equivalent circuit diagram showing a block including a current source, FIG. 3 is a block diagram showing an embodiment of the present invention, and FIGS. 4 and 5 are explanatory diagrams each showing energy patterns. FIG. 6 is a data diagram of the phoneme parameters, FIG. 7 is an explanatory diagram showing a sound source wave pattern, and FIG. 8 is an explanatory diagram showing the parameter interpolation processing.

31 ... pattern processing unit, 4 ... phoneme processing unit, 41 ... syllable parameter storage unit, 5 ... parameter interpolation processing unit, 51 ... phoneme parameter storage unit, 52 ... sound source wave pattern storage unit, 6 ... calculation unit, 7 ... digital/analog conversion unit, 8 ... voice generation unit.

[Figures omitted. FIG. 4: explanatory diagram of an energy pattern (syllables HA, DA, N, KI, WA). FIG. 5: energy patterns. FIG. 6: phoneme parameter data. FIG. 7: sound source wave pattern.]

Claims

[Claims]

(1) A speech synthesis device in which the human vocal tract is regarded as a plurality of acoustic tubes connected in tandem, the acoustic tube group is made to correspond to a circuit element group of surge impedance components, the sound source is made to correspond to a current source, and the speech wave emitted from the output end of the acoustic tube group is simulated based on the current wave at the output end of the circuit element group, the device comprising:
a phoneme parameter storage unit that, with the utterance time of each phoneme constituting a syllable divided into one or more time periods, stores for each time period the initial values X_o of the pitch (the repetition frequency of the sound source wave), the energy of the sound source wave, and the cross-sectional areas of the acoustic tubes, together with constants that define how each initial value X_o changes toward the initial value X_r, and sound source wave patterns;
a parameter interpolation processing unit that performs interpolation processing of the pitch, energy, and cross-sectional area corresponding to the input phoneme data;
a pattern processing unit that classifies the energy patterns of the sound source wave corresponding to the accent of each word into a plurality of types, defines the energy pattern of each type by an exponential function, and selects from among these patterns the pattern corresponding to the input word;
a calculation unit that calculates the current value output from the output end of the circuit element group based on the parameters interpolated by the parameter interpolation processing unit, using for the energy a value obtained by superimposing the exponential function corresponding to the pattern selected by the pattern processing unit on the group of interpolated values; and
a voice generation unit that generates speech based on the calculation result of the calculation unit,
wherein the parameter interpolation processing unit performs the interpolation calculation a plurality of times during each time period using the initial value X_o, the value X_r corresponding to the target value, and the constant, and determines the interpolated values of the energy based on an exponential function.