JPH0373000B2

Info

Publication number: JPH0373000B2
Application number: JP57045440A
Authority: JP
Inventors: Yoji Sugiura
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1982-03-19
Filing date: 1982-03-19
Publication date: 1991-11-20
Also published as: JPS58162996A

Description

[Detailed description of the invention]

The present invention relates to a speech synthesis device and aims to improve the quality of the synthesized speech signal. In general, the quality of a speech signal synthesized by joining and editing phoneme segments depends on how the junctions between the segments are processed. An abrupt waveform change at a junction, that is, a discontinuity, causes harmonic noise, lowers the S/N ratio of the synthesized sound, and reduces intelligibility. The present invention arithmetically processes the digital values of the speech signal at the waveform junction and joins the phoneme segments in a natural manner, making it possible to obtain high-quality synthesized sound. The invention is described in detail below, taking a speech time-axis conversion device as a specific embodiment.

FIG. 1 is a block diagram illustrating a conventional time-axis expansion device. In the figure, terminal 1 is the audio input terminal, 2 is the output terminal, 3 and 4 are N-bit analog shift registers such as BBDs, and 5 is a low-pass filter (LPF). 6, 7, 8 and 9 are analog switches that switch the audio signal travelling from input terminal 1 through analog shift register 3 or 4 and LPF 5 to output terminal 2. These analog switches are opened and closed, as shown in the figure, by the $Q$ and $\overline{Q}$ outputs of frequency divider 11, which divides the output of write clock circuit 10 for analog shift registers 3 and 4 by 2mN (m is explained below).

Analog shift registers 3 and 4 are alternately write-clocked through OR gates 14 and 15 by AND gates 12 and 13, which combine the output of clock circuit 10 with the $Q$ and $\overline{Q}$ outputs of frequency divider 11. Likewise, they are alternately read-clocked through the same OR gates 14 and 15 by AND gates 17 and 18, which combine the output of read clock circuit 16 with the $Q$ and $\overline{Q}$ outputs of frequency divider 11.

Suppose, for example, that the audio signal applied to the input terminal has its time axis compressed by a factor of m (m > 1); such a compressed signal is obtained, for instance, by playing back a tape recorder at m times the recording speed. While the $Q$ output of frequency divider 11 is 1, this signal is written into analog shift register 4 through analog switch 8. Because the shift register has N bits, once mN samples have been input in sequence it holds the last N of those mN samples. The $Q$ output of frequency divider 11 then inverts to 0, shutting off switch 8. At the same time the $\overline{Q}$ output becomes 1, turning on switch 6, and writing proceeds in the same way into analog shift register 3. As is clear from the figure, analog shift register 4 is at this time clocked by read clock circuit 16 and read out through switch 9, which is likewise controlled by the $\overline{Q}$ output. Thus, while analog shift register 3 is being written, the other register 4 is read out; when the $Q$ and $\overline{Q}$ outputs invert again, register 4 returns to writing and register 3 to reading.

Let the clock frequency of write clock circuit 10 be $f_1$ and that of read clock circuit 16 be $f_2$. If the two frequencies are chosen so that

$$\frac{f_1}{f_2} = m \tag{1}$$

the time axis is expanded by a factor of m, and the compressed audio applied to input terminal 1 appears at output terminal 2 with its time axis restored. The read clock frequency $f_2$ is of course chosen to satisfy the Nyquist sampling theorem for the required output audio bandwidth.

In the conventional device described above, the timing at which the phoneme segments output alternately from analog shift registers 3 and 4 are joined is fixed automatically, every $mN/f_1$ seconds, by the output of frequency divider 11, which divides write clock 10 by 2mN. As shown in FIG. 2, discontinuous waveform changes and pitch-frequency fluctuations therefore occur at the junctions between phoneme segments. As noted above, such waveform and pitch discontinuities at the junctions markedly degrade sound quality and intelligibility.

To remedy this drawback of the conventional device, the present inventor previously filed Japanese Patent Application No. Sho 59-94802 (filed June 18, 1981). The prior-application technique described there is outlined below with reference to the block diagram of FIG. 3. In the figure, 101 is the audio signal input terminal, 102 is the audio signal output terminal, and 103 is an analog-to-digital conversion circuit (hereinafter A/D) that converts the audio signal into digital data. 104 is a random access memory (hereinafter RAM) with $2^A$ bytes of storage. When its control input terminal $LT_3$ is at logic level "0", it stores the digital value applied to data input terminals $I_1$ to $I_d$ (least significant: $I_1$) at the address given by address input terminals $A_1$ to $A_a$ (least significant: $A_1$). When $LT_3$ is at logic level "1", it outputs the contents of the address given by $A_1$ to $A_a$ on data output terminals $O_1$ to $O_d$. 106 and 108 are clock generation circuits. The output fR of clock generation circuit 106 is supplied through OR gate 120 to the clock input terminal T of read counter 107 and steps the counter's output. Read counter 107 is an A-bit counter whose initial value is set by the output of arithmetic control circuit 105. The way this initial value is set is as follows.

First, the arithmetic control circuit 105 applies a pulse to the clear input terminal CL of read counter 107 to clear its output. It then sets the initial value by applying, from its SC (Set Counter) terminal to an input of OR gate 120, as many pulses as the value to be set. Alternatively, because the initial value is set at intervals in which a predetermined number of fR clocks are counted, the output of read counter 107 at that moment equals the value set in the previous period plus that predetermined number. It therefore suffices to supply, through OR gate 120 to clock input terminal T of read counter 107, a number of clocks equal to the new initial value minus this current value. In this case the read counter need not be cleared.
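The pulse-count rule above can be illustrated with a small sketch. The function names `pulses_to_preset` and `apply_pulses` are hypothetical, and the modulo-$2^A$ wraparound is an assumption: the text only says the counter has A bits and does not discuss the case where the new value is numerically smaller than the current one.

```python
def pulses_to_preset(current: int, target: int, bits: int) -> int:
    """Number of clock pulses that step an up-counter from `current` to `target`.

    Assumes (not stated in the source) that the A-bit counter wraps modulo 2**bits,
    so a target below the current value is reached by counting through zero.
    """
    return (target - current) % (1 << bits)


def apply_pulses(current: int, pulses: int, bits: int) -> int:
    """Simulate an A-bit up-counter receiving `pulses` clock pulses."""
    for _ in range(pulses):
        current = (current + 1) % (1 << bits)
    return current


if __name__ == "__main__":
    n = pulses_to_preset(current=250, target=4, bits=8)
    print(n, apply_pulses(250, n, 8))
```

Compared with clearing the counter and sending `target` pulses, this variant sends only the difference, which is why the text notes that no clear is needed.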
Note that the stepping of read counter 107 by the arithmetic control circuit 105 described above must be carried out while the output fR of clock generation circuit 106 is at logic level "0". If the setting is also to be made while fR is at logic level "1", an AND gate 121 is placed at the fR input of OR gate 120, as shown in FIG. 4. fR is supplied to one input of AND gate 121, an output terminal of the arithmetic control circuit 105 is connected to the other input, and the output of AND gate 121 is connected to the input of OR gate 120. If the arithmetic control circuit 105 then inhibits AND gate 121 through that input, the initial value of read counter 107 can be set whether fR is at "0" or "1".

The initial value of read counter 107 can equally be set by using the output fH of a clock generation circuit 123, as shown in FIG. 5. Here fH is a clock of much higher frequency than fR; it is connected to one input of AND gate 122 and to an input terminal of the arithmetic control circuit 105. To set the initial value, the arithmetic control circuit 105 applies logic level "0" to the input of AND gate 121 and logic level "1" to the input of AND gate 122. Once a predetermined number of output pulses from clock circuit 123 have been counted, it returns the input of AND gate 121 to "1" and the input of AND gate 122 to "0", which completes the initialization. It is also clear that the same result is obtained if the read counter is built as a presettable counter and the initial value is preset directly.

After the initial value has been set in this way, the read counter divides fR. The least significant bit of the read counter outputs $Y_1$ to $Y_a$ is $Y_1$.

Clock generation circuit 108 provides the write clock timing for RAM 104. Its output fw is supplied to the clock input terminal T of the A-bit frequency divider 109 and steps the divider outputs $W_1$ to $W_a$ (least significant: $W_1$) in sequence. 110 is a switching circuit: when its control input $LT_1$ is at logic level "1" it passes the outputs $W_1$ to $W_a$ of frequency divider 109 to the address inputs $A_1$ to $A_a$ of RAM 104, and when $LT_1$ is at "0" it passes the outputs of read counter 107. 114 and 116 are inverters, 115 is an AND gate, and 117 is a NAND gate. $R_1$, $R_2$ and $R_3$ are resistors, and $C_1$, $C_2$ and $C_3$ are capacitors. $R_1$ and $C_1$, $R_2$ and $C_2$, and $R_3$ and $C_3$ each form an integrating circuit. Their time constants $\tau_1$, $\tau_2$ and $\tau_3$ are all much smaller than the period of the write clock fw and are chosen so that $\tau_1 > \tau_3 > \tau_2$. As shown in FIG. 6, the output of AND gate 115 (FIG. 6b) goes to logic level "1" at the rising edge of fw (FIG. 6a) and falls once capacitor $C_1$ has charged with time constant $\tau_1$. The output of NAND gate 117 (FIG. 6c) falls slightly after the rising edge of fw (FIG. 6a) and rises before the output of AND gate 115 falls.

111 is a latch circuit. When its control input terminal $LT_2$ is at logic level "0" it passes its input to its output; when $LT_2$ is at "1" it holds and outputs the data present at the rising edge. 112 is a digital-to-analog conversion circuit (hereinafter D/A) that converts digital values into analog values. 113 is a low-pass filter that removes sampling noise from the D/A-converted audio signal. 130 is a NAND gate whose inputs are the output of AND gate 115 and an output of the arithmetic control circuit 105, and whose output is connected to the $LT_2$ input of latch circuit 111. While it is setting the initial value of read counter 107, the arithmetic control circuit 105 outputs logic level "0" to NAND gate 130. As a result, latch circuit 111 does not pass its input to its output during the transient state in which the read counter is being initialized.

With this configuration, the audio signal applied to the input terminal is converted into digital values by A/D 103 and stored in RAM 104 at the period of the write clock fw. That is, while the output of AND gate 115 is "1", the address inputs $A_1$ to $A_a$ of RAM 104 receive the output of frequency divider 109, control input terminal $LT_3$ goes to "0", and the output of A/D 103 is stored. Because frequency divider 109 steps at the period of fw, the RAM 104 addresses at which the audio signal is sampled and stored are consecutive.
Address $2^A$, however, wraps around to 0. The audio signal sampled according to the write clock fw and stored in RAM 104 as digital values is read out according to the read clock fR, D/A-converted by 112, and reproduced as an analog audio signal. The ratio of the write clock fw to the read clock fR is the time-axis conversion ratio.

The read counter is stepped at the period of the read clock fR, so the address from which the contents of RAM 104 are read also advances at the period of fR. Latch circuit 111 is provided so that the contents of a wrong address are not read out while RAM 104 is being written. In other words, RAM 104 is read continuously except during writes.

The prior-application technique applies a timing correction to the junctions between the phoneme segments being joined, the junctions discussed for the conventional example of FIG. 1, and this correction is performed by the arithmetic control circuit 105. The arithmetic control circuit 105 may be a processing unit (CPU, that is, a computer) programmed by ROM. FIG. 7 shows the operation of the arithmetic control circuit 105. Each processing cycle is the period in which N read clocks are counted. In what follows, positions along the time axis t are expressed in units of the write clock fw.

Of the N phoneme-segment samples read out in [processing cycle 2], the last M samples are stored during [processing cycle 1] according to the write clock fw. (M + r) samples are then taken in from the start of [processing cycle 2], and the point K at which these samples correlate most strongly with the stored M samples is calculated. The calculation of K is described later. Because the samples beginning K samples after the start of [processing cycle 2] correlate strongly with the stored M samples, at the start of [processing cycle 3] the output of read counter 107 is initialized to the value that the output of frequency divider 109 had (K + M) samples after the start of [processing cycle 2]. The audio waveform samples read out across the junction between [processing cycle 2] and [processing cycle 3] then follow one another continuously.
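As a small worked example of the read-pointer jump just described, the following sketch computes the value loaded into the read counter at the start of the next processing cycle. The function name and parameters are hypothetical, and the wraparound modulo $2^A$ is an assumption based on the A-bit address width; the source does not spell out how the sum is reduced.

```python
def next_read_start(w_at_cycle_start: int, k: int, m: int, address_bits: int) -> int:
    """Address at which reading resumes in the next processing cycle.

    w_at_cycle_start: write-address (frequency divider output) at the start of the cycle
    k: shift with the best match between the stored tail and the new samples
    m: length of the stored tail, in write-clock samples
    Assumes addresses wrap modulo 2**address_bits (an A-bit ring buffer).
    """
    return (w_at_cycle_start + k + m) % (1 << address_bits)


if __name__ == "__main__":
    print(next_read_start(w_at_cycle_start=1000, k=5, m=16, address_bits=10))
    print(next_read_start(w_at_cycle_start=1020, k=5, m=16, address_bits=10))
```

Reading resumes M samples past the matched position because the M stored tail samples have already been played at the end of the previous cycle.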
The M samples beginning at the point where (K + M) write clocks fw have been counted from the start of [processing cycle 2] are the trailing M samples read out in [processing cycle 3], and they are stored for calculating the junction point with the following processing cycle. If this operation is repeated every processing cycle, the waveforms are joined smoothly.

The calculation of the value K, the junction point of high correlation, is as follows. FIGS. 8a and 8b show, respectively, the M samples at the trailing end of the preceding phoneme segment written in [processing cycle 1] of FIG. 7, and the (M + r) samples at the leading end of the succeeding phoneme segment at the start of [processing cycle 2]. Let the sample sequence at the trailing end of the preceding segment be $X_p$ (p = 1, 2, ..., M) and the sample sequence at the leading end of the succeeding segment be $Y_p$ (p = 1, 2, ..., M + r). $X_p$ and $Y_p$ are obtained by sampling the output of A/D 103 with the write clock fw. To evaluate the similarity of these segments, one may calculate the squared error $e_k^2$ between $X_p$ and $Y_p$:

$$e_k^2 = \frac{1}{M}\sum_{p=1}^{M}\left(\frac{X_p-\bar{X}}{\sigma_X}-\frac{Y_{p+k}-\bar{Y}}{\sigma_Y}\right)^2 \tag{2}$$

where

$$\bar{X}=\frac{1}{M}\sum_{p=1}^{M}X_p,\qquad \bar{Y}=\frac{1}{M}\sum_{p=1}^{M}Y_p,\qquad k = 0, 1, 2, \ldots, r-1.$$

This expresses the similarity obtained when $Y_p$ is shifted by k samples and superimposed on the sampled waveform $X_p$.

In practice, however, evaluating equation (2) takes an enormous number of computation steps, and performing it in a short time (within a few tens of milliseconds) requires a high-performance computer. Equation (2) is intended for examining the correlation between two waveforms that differ in amplitude and level; it therefore normalizes the waveforms by the standard deviations $\sigma_X$ and $\sigma_Y$ and takes the sum of squares of the differences from the mean levels $\bar{X}$ and $\bar{Y}$. In the speech synthesis device of the prior application, by contrast, the phoneme segments handled are waveforms that are close together in time, so their amplitudes and levels can be regarded as similar to begin with. In this case the difference between the two waveforms may be calculated, instead of by equation (2), as

$$e_k^2 = \frac{1}{M}\sum_{p=1}^{M}\left(X_p-Y_{p+k}\right)^2 \tag{3}$$

Moreover, the prior-application technique only needs to find the timing at which the similarity of the two waveforms is greatest, so equation (3) can be replaced further by equation (4):

$$g_k = \sum_{p=1}^{M}\left|X_p-Y_{p+k}\right| \tag{4}$$

Here only the most significant bit of the A/D converter output may be used for $X_p$ and $Y_{p+k}$. Alternatively, the polarity of the input signal around its zero crossings may be used. In either case $X_p$ and $Y_{p+k}$ each take the value [1] or [0]. Equation (4) accumulates the absolute differences between corresponding sample values, and the junction timing is determined by finding the k at which it is smallest.

To keep the computation time as short as possible, the prior-application technique may calculate, instead of equation (4),

$$g_k = \frac{1}{M}\sum_{p=1}^{M}\left(X_p \oplus Y_{p+k}\right) \tag{5}$$

In equation (5), $X_p$ and $Y_{p+k}$ are the most significant bit of the A/D converter output and take the value [1] or [0]. The symbol $\oplus$ denotes exclusive OR, so $X_p \oplus Y_{p+k}$ is [0] when $X_p$ and $Y_{p+k}$ are both [1] or both [0], and [1] otherwise. The similarity between the binary sample data $X_p$ at the trailing end of the preceding segment and the binary sample data $Y_p$ at the leading end of the succeeding segment is thus given by $g_k$, and the junction timing is determined by finding the k that minimizes $g_k$. That is, the arithmetic control circuit 105 calculates $g_k$ for each of k = 0, 1, ..., r − 1 and selects the k for which it is smallest. As shown in FIG. 8, this means that the error is least when the trailing M samples of the preceding segment are superimposed on the succeeding segment starting k samples from its beginning.

As explained above, the arithmetic control circuit 105 receives the audio signal applied to input terminal 101 in the form of digital values converted by the A/D 103.
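The search for the shift k described by equations (4) and (5) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, indices are 0-based rather than the 1-based p of the text, the one-bit data is assumed to be a sign bit, and ties are resolved in favor of the smallest k, which the source does not specify.

```python
def abs_diff_score(x, y, k):
    """g_k of equation (4): sum of |X_p - Y_{p+k}| over the M stored samples."""
    return sum(abs(xp - y[p + k]) for p, xp in enumerate(x))


def xor_score(x_bits, y_bits, k):
    """g_k of equation (5): mean of X_p XOR Y_{p+k} over one-bit samples (0 or 1)."""
    return sum(xb ^ y_bits[p + k] for p, xb in enumerate(x_bits)) / len(x_bits)


def best_shift(x, y, r, score=abs_diff_score):
    """Return the k in 0..r-1 that minimizes the score.

    Requires len(y) >= len(x) + r - 1. min() keeps the first minimum,
    so ties go to the smallest k (an assumption of this sketch).
    """
    return min(range(r), key=lambda k: score(x, y, k))


def sign_bits(samples):
    """One-bit quantization: 1 for non-negative samples, 0 otherwise (assumed encoding)."""
    return [1 if s >= 0 else 0 for s in samples]


if __name__ == "__main__":
    x = [1, 3, 2, -1]                 # tail of the preceding segment, M = 4
    y = [0, 0, 1, 3, 2, -1, 0]        # head of the succeeding segment, M + r = 7
    print(best_shift(x, y, r=3))
    print(best_shift(sign_bits(x), sign_bits(y), r=3, score=xor_score))
```

The one-bit variant trades accuracy for speed: each term reduces to a single XOR, which is the motivation the text gives for equation (5).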
The digital values converted by the above are sampled by the write clock fw, which is the output of the clock generating circuit 108, to obtain the sample strings Xp and Yp. The timing of taking in the sample strings Xp and Yp is all instructed by the values of the outputs W ₁ to _{W a} of the frequency dividing circuit 109. Further, the arithmetic control circuit 105 counts the readout clocks output from the clock generation circuit 106, and when N clocks are counted, sets the initial value of the readout counter 107 and enters the next processing cycle. The value for initializing this read counter is the sum of k obtained by calculating Xp and Yp and the instruction value of the frequency dividing circuit when Yp is taken in. Note that the sample string on which the arithmetic control circuit 105 calculates the degree of similarity is obtained by using an analog input signal applied to the input terminal 101 from an A/D converter different from the A/D converter 103 or a zero-crossing polarity detection circuit ( (not shown) may be converted into a digital value and sampled in accordance with the first clock fw. In this way, the technology of the prior application is based on the arithmetic control circuit 105.
By the functions described above, the present invention provides a time base conversion circuit that obtains smooth connection points. A synthesized sound can therefore be obtained with a lower probability of the discontinuities at connection points and the pitch frequency fluctuations seen in conventional devices.

However, although the technology of the prior application improves the synthesized sound quality over the conventional example as described above, the improvement is not sufficient. The prior application was explained on the premise that, in speech, the waveforms of adjacent phoneme segments several tens of milliseconds long are similar. Strictly speaking, this means only that there is a high probability that the waveforms are similar. With a certain probability, adjacent phoneme segment waveforms differ from each other, and in such cases discontinuities at the connection part and fluctuations in the pitch frequency occur even with the technology of the prior application. In addition, the envelope waveform at the beginning and end of a syllable fluctuates greatly in the amplitude direction. In such cases, even if pitch connection is achieved with the technology of the prior application, the amplitude envelope often fluctuates at the connection point, which impairs the synthesized sound quality.

The present invention proposes a device that solves this problem. That is, in the present invention, the sample string of digital values in the portion following the connection point at the rear end of the preceding phoneme segment stored in the RAM and the sample string of digital values at the front end of the succeeding phoneme segment are added at a fixed ratio at each corresponding sample point, and the added digital values are stored at the corresponding sample points at the front end of the succeeding phoneme segment. Connections without waveform discontinuities are thereby obtained.

In FIG. 3, the audio signal converted into a digital signal by the A/D 103 is stored in the RAM 104, read out, and latched by the latch circuit 111. In the configuration of the present invention, in addition to this, the audio signal stored in the RAM 104 is read out by the arithmetic control circuit 105, arithmetically processed, and written back to the RAM 104, as shown in FIG. 9. FIG. 9 is a block diagram showing this main part. The audio signal converted into a digital signal by the A/D 103 is stored in the RAM 104. In addition to the function explained with FIG. 3, in which this data is read out, latched by the latch circuit 111, and D/A converted, the memory contents of the RAM 104 are also read out by the arithmetic control circuit 105, processed, and stored in the RAM 104 again. For this purpose, the arithmetic control circuit 105 supplies address designation outputs P₁ to P_A to the switching circuit 110 so that it can designate the addresses of the memory contents of the RAM 104.

FIG. 10 is a redrawn version of FIG. 7 for explaining how the arithmetic control circuit 105 processes the memory contents of the RAM 104. As explained with FIG. 7, the rear end J₁ of the preceding phoneme segment and the front end J₂ of the succeeding phoneme segment are connected. In addition, the samples MP of the preceding phoneme segment from the point J₁ onward and the samples MS of the succeeding phoneme segment from the point J₂ onward are arithmetically processed to produce MD, and MS is rewritten with this result. FIG. 11 shows MP, MS, and the sample waveform MD obtained by arithmetic processing of MP and MS.

Let the MP sample sequence be

$$[MP_{ij}] = [MP_{1j}],\ [MP_{2j}],\ \ldots,\ [MP_{Aj}] \qquad (j = 1, 2, \ldots, B)$$

and the MS sample sequence be

$$[MS_{ij}] = [MS_{1j}],\ [MS_{2j}],\ \ldots,\ [MS_{Aj}] \qquad (j = 1, 2, \ldots, B).$$

Then, with

$$MD_{ij} = \frac{(A-i)\,MP_{ij} + i\,MS_{ij}}{A} \tag{6}$$

the sample sequence

$$[MD_{ij}] = \frac{(A-1)\,MP_{1j} + MS_{1j}}{A},\ \frac{(A-2)\,MP_{2j} + 2\,MS_{2j}}{A},\ \ldots,\ \frac{MP_{A-1,j} + (A-1)\,MS_{A-1,j}}{A},\ MS_{Aj} \tag{7}$$

is created by arithmetic processing, and [MS_ij] is replaced with it. For i = A, equation (6) reduces to MS_Aj, which is the last term of (7).

The ratio in which the samples of MP and MS are added is thus changed sequentially, one group of samples at a time. Each time the index i increases, the waveform moves a step further from the rear-end waveform of the preceding phoneme segment (FIG. 11a) toward the front-end waveform of the succeeding phoneme segment (FIG. 11b), giving a smooth transition (FIG. 11c). In this way, even if the waveform or level of a phoneme segment changes abruptly, smooth waveform connection without noise or fluctuation is achieved.

In the above explanation, the arithmetic control circuit 105 calculates the waveform connection point so that the basic pitches of the phoneme segments connect, and the waveform processing described above is performed after the readout position has been controlled accordingly. Even when only the waveform processing is performed, however, noise caused by discontinuity at the connection points can be prevented, although in that case deterioration in sound quality due to pitch fluctuation cannot be avoided.

Since the present invention is configured in this way, a smooth, high-quality synthesized sound is obtained without discontinuity at the waveform connection points.
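The weighted addition of equation (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of the embodiment: it reads i as the index of a group of samples (1 to A) and j as the sample position within a group (1 to B), and the function name `crossfade` and the sample values are hypothetical.

```python
from typing import List, Sequence


def crossfade(mp: Sequence[Sequence[float]],
              ms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return MD with MD_ij = ((A - i) * MP_ij + i * MS_ij) / A, as in equation (6).

    mp[i - 1][j - 1] stands for MP_ij and ms[i - 1][j - 1] for MS_ij,
    with i = 1..A (group index, an assumed reading) and j = 1..B.
    """
    a = len(mp)
    if a == 0 or len(ms) != a:
        raise ValueError("MP and MS must have the same non-zero number of groups")
    md: List[List[float]] = []
    for i, (mp_group, ms_group) in enumerate(zip(mp, ms), start=1):
        if len(mp_group) != len(ms_group):
            raise ValueError("corresponding groups must have the same length")
        # The weight of MP falls from (A-1)/A to 0 while the weight of MS rises from 1/A to 1.
        md.append([((a - i) * p + i * s) / a
                   for p, s in zip(mp_group, ms_group)])
    return md


# Illustrative values only: A = 4 groups of B = 2 samples each.
mp_tail = [[8.0, 8.0]] * 4   # preceding segment, samples after connection point J1
ms_head = [[0.0, 0.0]] * 4   # succeeding segment, samples after connection point J2
ms_head = crossfade(mp_tail, ms_head)  # MS is overwritten with MD
```

With these values the rewritten MS steps down through 6, 4, 2, and 0, so the level of the preceding segment is carried over at the connection point and reaches the level of the succeeding segment at i = A.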

[Brief explanation of drawings]

FIG. 1 is a block diagram of a conventional speech synthesis device. FIG. 2 is a diagram showing the characteristics of the conventional device. FIG. 3 is a block diagram showing the configuration of the speech synthesis device of the present invention. FIGS. 4 and 5 are circuit diagrams showing configuration examples of the main part used to initialize the read counter 107 of FIG. 3. FIG. 6 is a time chart for explaining the outputs of the gates 115 and 117 of the device of FIG. 3. FIG. 7 is a time chart for explaining the operation of the arithmetic control circuit 105 of the device of FIG. 3. FIG. 8 is a waveform diagram of the sample sequences Xp and Yp, consisting of M samples of the preceding phoneme segment and (M+r) samples of the succeeding phoneme segment. FIG. 9 is a circuit diagram of the main part of the speech synthesis device of the present invention, obtained by modifying FIG. 3 to add the function of processing the stored contents of the RAM 104 in the arithmetic control circuit 105 and writing the result to the RAM 104. FIG. 10 is an operation explanatory diagram of the present invention, redrawn from FIG. 7 in order to explain the operation of the arithmetic control circuit 105 in the present invention. FIG. 11 is a waveform diagram for explaining the process of creating a new sample waveform MD from the MP and MS of FIG. 10.

101 ... signal input terminal, 102 ... signal output terminal, 103 ... analog-to-digital conversion circuit, 104 ... random access memory, 105 ... arithmetic control circuit, 106 ... clock circuit that generates the read clock, 107 ... read counter, 108 ... clock circuit that generates the write clock, 110 ... switching circuit, 111 ... latch circuit, 112 ... digital-to-analog conversion circuit, 113 ... low-pass filter, MP ... sample waveform of the preceding phoneme segment after the connection point, MS ... sample waveform of the succeeding phoneme segment after the connection point.

Claims

[Scope of Claims] 1. A speech synthesis device that performs editing and synthesis using phoneme segments extracted from an analog speech waveform, comprising: (a) analog-to-digital conversion means for converting an analog input signal into a digital signal; (b) digital storage means for storing the output of the conversion means in accordance with a first clock; (c) arithmetic control means that calculates the degree of similarity between the sample string of digital values at the rear end of a preceding phoneme segment and the sample string of digital values at the front end of a succeeding phoneme segment, each obtained by sampling the analog input signal, determines the connection point between the rear end of the preceding phoneme segment and the front end of the succeeding phoneme segment at the portion with the highest degree of similarity, that is, the addresses in the digital storage means of the connection points [connection point A₁ of the preceding phoneme segment, connection point A₂ of the succeeding phoneme segment], adds, at a fixed ratio at each corresponding address, the sample string of digital values of the preceding phoneme segment in the portion following the determined connection point A₁ and the sample string of digital values of the succeeding phoneme segment in the portion following the determined connection point A₂, and stores the resulting sample string of added digital values in the digital storage means; and (d) digital-to-analog conversion means for converting the digital values read from the digital storage means on the basis of a second clock into analog values and reproducing an audio signal; the speech synthesis device being characterized in that the arithmetic control means controls the ratio at which the sample string of digital values of the preceding phoneme segment in the portion following its rear end and the sample string of digital values of the succeeding phoneme segment at its front end are added at each corresponding sample point, so that the former ratio is initially high and the latter ratio increases sequentially with the sample points.