JPH0141999B2

Info

Publication number: JPH0141999B2
Application number: JP56069950A
Authority: JP
Inventors: Minoru Kuroda; Hiroshi Itoyama; Seiji Hiraoka; Kenji Kaga
Original assignee: Matsushita Electric Industrial Co Ltd; Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd; Panasonic Holdings Corp
Priority date: 1981-05-09
Filing date: 1981-05-09
Publication date: 1989-09-08
Also published as: JPS57185099A

Description

[Detailed description of the invention]

The present invention relates to a speech synthesis device. In general, the feature parameters that characterize speech are the amplitude parameter (hereinafter, the A parameter), which represents the loudness of the sound; the pitch parameter (hereinafter, the P parameter), which represents the pitch of the sound, that is, its fundamental period; and the spectrum parameter (hereinafter, the S parameter), which represents the timbre of the sound, that is, its spectral distribution. Speech can therefore be synthesized by sampling the speech signal with sampling pulses whose frequency is sufficiently higher than the audio frequency, extracting each feature parameter and storing it in a data memory in advance, and then driving a sound source on the basis of the feature parameters read from the data memory. In this type of speech synthesis device, the more samples are taken of the speech signal, the more faithfully the speech can be synthesized. On the other hand, as the number of samples grows, the number of bits of speech synthesis data increases, a large-capacity data memory becomes necessary, the data processing circuitry becomes more complex, and the cost rises. In conventional speech synthesis devices, the sampling pulse frequency (hereinafter, the sampling frequency) is therefore set to the minimum frequency needed to reproduce the human voice faithfully, usually 8 or 10 kHz (a sampling period of 125 μs or 100 μs).

When a speech signal is sampled with the sampling pulses, feature parameters consisting of the A, P, and S parameters are extracted and stored in a memory, and the stored feature parameters are read out with synchronization pulses whose period equals that of the sampling pulses to synthesize speech, the fundamental period of the speech reproduced from the P parameter can take only discrete values determined by the sampling frequency. That is, if the sampling period is 100 μs and the P parameter is Pi (an integer), the reproduced fundamental period t is t = 100·Pi × 10^-6 (sec) (where Pi = 1, 2, 3, …), and the reproducible audio frequencies take the discrete values shown in Table 1.

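[Table 1: reproducible audio frequencies and musical scale frequencies (table image not reproduced in this text)]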
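To make the discretization concrete, the following minimal sketch evaluates t = 100·Pi × 10^-6 s for integer values of Pi and finds the reachable frequency closest to a target note. The function names, the upper bound `max_pi`, and the 440 Hz target are illustrative assumptions; they are not taken from Table 1.

```python
SAMPLING_PERIOD_US = 100  # sampling period from the text, in microseconds


def pitch_frequency_hz(pi: int) -> float:
    """Frequency reproduced for an integer P parameter Pi (t = 100 * Pi us)."""
    return 1e6 / (SAMPLING_PERIOD_US * pi)


def nearest_reachable(target_hz: float, max_pi: int = 63):
    """Return (Pi, frequency) of the reachable pitch closest to target_hz.

    max_pi is a hypothetical search bound chosen for this sketch.
    """
    candidates = ((pi, pitch_frequency_hz(pi)) for pi in range(1, max_pi + 1))
    return min(candidates, key=lambda item: abs(item[1] - target_hz))


if __name__ == "__main__":
    # Neighbouring Pi values are far apart in frequency at the high end.
    for pi in (10, 11, 12, 13):
        print(pi, round(pitch_frequency_hz(pi), 1))
    print(nearest_reachable(440.0))
```

With a 100 μs step, the two reachable pitches on either side of 440 Hz are about 434.8 Hz (Pi = 23) and 454.5 Hz (Pi = 22), which illustrates why a note that falls between grid points cannot be hit exactly.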

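Even if only such discrete audio frequencies can be generated, the human voice and similar sounds can be reproduced fairly faithfully. When reproducing a melody composed of musical scale frequencies, however, many of the scale frequencies of the individual notes (do, re, mi, …) are not among the above discrete values, as Table 1 shows, so a melody reproduced with such discrete audio frequencies is noticeably out of tune. The object of the present invention is to solve this problem.

An embodiment of a PARCOR speech synthesis device is described below with reference to the drawings. In the PARCOR speech synthesis method, as shown in Fig. 1, the speech signal V_S is sampled by sampling pulses at a suitable period t_o, and speech is synthesized using as the S parameter the PARCOR coefficients (partial autocorrelation coefficients; hereinafter, the K parameters), which capture only the correlation between a sampled value X_t and X_(t-p) after excluding the correlation due to the (p − 1) sampled values lying between them. To obtain the K parameters, the speech signal V_S is sampled at every period t_o (about 100 μs) within one frame (5 to 20 ms) in which the speech can be regarded as nearly stationary. The correlation coefficient between adjacent sample values is K₁. For sample values separated by several intervals, the influence of the intervening sample values is determined by least-squares linear prediction and subtracted, and the resulting correlation coefficients are K₂ to K_o. Among the K parameters, coefficients such as K₁, K₂, and K₃, which represent the partial autocorrelation with points close to X_t, contain a great deal of information about the spectral distribution, whereas coefficients such as K₈, K₉, and K₁₀, which represent the partial autocorrelation with points far from X_t, contain little. It is therefore effective to assign many quantization bits to the low-order K parameters and few to the high-order K parameters, which saves bits and reduces redundancy. Consequently, the PARCOR method achieves better bandwidth compression than the autocorrelation coefficient method, which uses autocorrelation coefficients as the S parameter and assigns the same number of bits to every coefficient.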
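The text does not say how the K parameters are actually computed. One standard way to obtain partial autocorrelation (reflection) coefficients from a frame's autocorrelation is the Levinson-Durbin recursion, sketched below. This is an assumption about the analysis step, not the patent's stated procedure; the function names are hypothetical, and the sign convention is chosen so that K₁ equals the normalized correlation between adjacent samples, as the description states.

```python
def autocorrelation(x, max_lag):
    """Raw autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def parcor_from_autocorr(r, order):
    """Levinson-Durbin recursion returning [K1, ..., K_order].

    Assumes a frame with non-zero energy (r[0] > 0) that is not
    perfectly predictable. Sign convention (assumed): K1 = r[1] / r[0].
    """
    a = [1.0] + [0.0] * order   # prediction-error filter, a[0] = 1
    err = float(r[0])           # remaining prediction-error energy
    ks = []
    for m in range(1, order + 1):
        # Correlation at lag m left over after predicting from lags 1..m-1.
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        refl = -acc / err
        ks.append(-refl)
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + refl * prev[m - i]
        a[m] = refl
        err *= 1.0 - refl * refl
    return ks


if __name__ == "__main__":
    # Autocorrelation of a first-order process: only K1 is non-zero.
    print(parcor_from_autocorr([1.0, 0.5, 0.25, 0.125], 3))
```

For the first-order example in the usage lines, all the correlation at longer lags is explained by the adjacent-sample correlation, so every coefficient after K₁ comes out as zero. That is the sense in which the higher-order coefficients carry only what the intervening samples do not already account for.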
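The A, P, and K parameters are normally compressed for storage or transmission, with bits assigned as, for example, 5 bits to the A parameter, 6 bits to the P parameter, and 7, 6, 5, 4, 4, 4, 3, 3, 3, and 3 bits to the K-parameter coefficients K₁, K₂, …, K₁₀, respectively.

Fig. 2 is a block circuit diagram of an embodiment of a PARCOR speech synthesis device for use in time-signal devices, alarm devices, wake-up devices, and the like. It consists of a control IC (A), which contains a data memory M that stores speech and melodies as compressed feature parameters, and a speech synthesis IC (the portion outside dotted-line sections A and B); data is passed between the two ICs bit-serially. All the speech feature parameters are stored in playback ROM 1 as 10-bit data, and the number of data entries assigned to each feature parameter is distributed optimally according to how much that parameter contributes to sound quality. For the A parameter, for example, 32 entries of 10-bit data are stored, so the relative address needed to access any A-parameter entry is 5 bits wide. Because this relative address represents the feature parameter compressed to the necessary minimum, it is called a compressed parameter. The actual feature parameters stored in playback ROM 1, by contrast, are called playback parameters. As is clear from the above, the playback parameters are 10 bits wide for every feature parameter A, P, and K₁ to K₁₀, whereas the compressed parameters have a different width for each parameter: for example, 5, 6, 3, 3, 3, 3, 4, 4, 4, 5, 6, and 7 bits (53 bits in total), listed in the order A, P, K₁₀, …, K₁. In addition, a spare area of 3 bits, that is, 8 data entries, is reserved in playback ROM 1. Since one set of these compressed parameters (53 bits) is extracted for each 5 to 20 ms frame in which the speech signal can be regarded as nearly stationary, the speech signal can be reproduced by processing data at no more than 2650 bits/s (53 bits per 20 ms frame), and once silent intervals and repeated intervals are taken into account, about 1600 bits/s suffices in practice.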
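The bit budget above can be checked with a few lines of arithmetic. The sketch below uses only the widths given in the text; the names are illustrative, and the 20 ms frame length used for the bit rate is the longest frame in the stated 5 to 20 ms range, which is the case that yields the 2650 bits/s figure.

```python
# Compressed-parameter widths from the text, in bits.
BITS = {
    "A": 5, "P": 6,
    "K1": 7, "K2": 6, "K3": 5, "K4": 4, "K5": 4,
    "K6": 4, "K7": 3, "K8": 3, "K9": 3, "K10": 3,
}
PLAYBACK_WORD_BITS = 10   # every playback parameter is 10 bits wide
SPARE_ENTRIES = 8         # 3-bit spare area reserved in the playback ROM


def bits_per_frame() -> int:
    """Total compressed bits sent for one frame."""
    return sum(BITS.values())


def bitrate_bps(frame_ms: float) -> float:
    """Bit rate when one compressed set is sent every frame_ms milliseconds."""
    return bits_per_frame() * 1000 / frame_ms


def playback_rom_entries() -> int:
    """Number of 10-bit entries: 2**width per parameter plus the spare area."""
    return sum(2 ** width for width in BITS.values()) + SPARE_ENTRIES


if __name__ == "__main__":
    print(bits_per_frame())                             # 53
    print(bitrate_bps(20))                              # 2650.0
    print(playback_rom_entries())                       # 408
    print(playback_rom_entries() * PLAYBACK_WORD_BITS)  # 4080
```

The entry count follows from each relative address of width w selecting one of 2^w stored values, so the widths fix both the per-frame bit count and the size of the lookup tables.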
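In the embodiment, the method of generating the fundamental period differs between synthesizing speech whose pitch changes evenly and continuously, as in spoken words, and synthesizing sound whose pitch proceeds in discrete steps, as in melody tones or singing. To reproduce a melody, a melody control code is added to the head of the compressed A parameter among the compressed parameters input from control IC (A) to data input terminal 8. When melody control code detection circuit 34 produces the melody control code detection signal V_M, voice-melody switching circuit 33 is switched to the melody side (side b), and the sound source is driven at a fundamental period equal to that of each scale note to synthesize the melody.

The basic configuration and operation of the embodiment (the ordinary speech synthesis operation for synthesizing the human voice and the like) are described first. The compressed parameters (that is, the relative addresses into playback ROM 1) are stored bit-serially in ring register 3 from data input terminal 8 via switching circuit 10, once per frame. Because the playback data for the parameters is stored contiguously in playback ROM 1, a relative address alone cannot select a particular entry. The start address of each parameter in the playback ROM, stored in index ROM 2, is therefore fetched in sequence under the control of address counter 11 and added to the relative address by adder circuit 4 to compute the absolute address (9 bits) in playback ROM 1, and playback ROM 1 is accessed with this absolute address. Index ROM 2 also stores the bit allocation of each compressed parameter as a 3-bit binary number. This bit-allocation data is sent to playback control circuit 12, which sends that number of shift clocks to ring register 3. Ring register 3 therefore sends each compressed parameter (relative address) serially to the adder circuit with the allocated number of bits: for example, 5 bits for the A parameter, 6 bits for the P parameter, 3 bits for the K₁₀ parameter, … and 7 bits for the K₁ parameter. The start address of each feature parameter in playback ROM 1, stored in index ROM 2, is sent to adder circuit 4 one bit at a time via parallel-serial conversion circuit 13, so the absolute address is computed by adding one bit at a time. The serial absolute address computed in this way is converted to parallel data by serial-parallel conversion circuit 14 and becomes the address with which playback ROM 1 is accessed.

The feature parameters output from playback ROM 1 are updated every frame. If a feature parameter changes discontinuously at the junction between frames when the data is updated, the speech signal may be distorted and intelligibility may suffer. Interpolation calculation circuit 5 is therefore provided to perform approximately linear interpolation at 8 points within one frame so that the feature parameters change smoothly across updates. For this purpose, timing control circuit 28 generates, as shown in Fig. 2, eight interpolation D clocks (2.5 ms) in one frame (20 ms), 25 parameter-read P clocks (100 μs) in each D clock, and 22 bit-read T clocks (4.5 μs period) in each P clock. The P clock is the synchronization pulse corresponding to the sampling pulse. Data is read from data input terminal 8 into ring register 3 during D₁, the first of the eight D clocks. The compressed parameters A, P, K₁₀, …, K₁ are read in sequence on the odd-numbered P clocks; the A parameter, for example, is read with the five T clocks T₆ to T₁₀ of interval P₁. The even-numbered P clocks and the remaining T clocks are used as timing for interpolation calculation circuit 5, sound source ROM 6, digital filter 7, and so on. Interpolation calculation circuit 5 stops operating when a melody control code is detected.

The feature parameters, updated to new values every 2.5 ms by interpolation calculation circuit 5, are held temporarily in P latch 16 and AK latch 23. Parameters not immediately needed for the interpolation calculation are all transferred to AK parameter stack 24 and accumulated as speech synthesis data for digital filter 7.

The P parameter held in P latch 16 is the data used to drive voiced sound source 19 so that it generates an impulse signal with the fundamental period corresponding to the P parameter. When melody control code detection circuit 34 produces no output and voice-melody switching circuit 33 is set to the side that synthesizes speech such as human spoken words (side a), the reset signal of address counter 18 of sound source ROM 6, which counts the P clocks equal to the sampling pulses, is the output of coincidence circuit 17, which detects a match between the output of address counter 18 and the P parameter held in P latch 16. Address counter 18 is thus reset at a period that is an integer multiple (the P parameter) of the P clock period.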
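The address calculation described earlier in this section (a start address from the index ROM added bit-serially to the relative address) can be sketched as follows. The contiguous table layout in transmission order and the least-significant-bit-first serial order are assumptions made for this sketch; the text specifies neither, and the helper names are hypothetical.

```python
# Transmission order given in the text: A, P, K10, ..., K1.
ORDER = ["A", "P"] + [f"K{i}" for i in range(10, 0, -1)]
BITS = {"A": 5, "P": 6, "K10": 3, "K9": 3, "K8": 3, "K7": 3,
        "K6": 4, "K5": 4, "K4": 4, "K3": 5, "K2": 6, "K1": 7}
ADDRESS_BITS = 9  # absolute address width stated in the text


def build_index_rom():
    """Map each parameter to (start address, bit allocation).

    Assumes the tables are packed back to back in ORDER (hypothetical).
    """
    table, base = {}, 0
    for name in ORDER:
        table[name] = (base, BITS[name])
        base += 1 << BITS[name]
    return table


def serial_add(a: int, b: int, width: int = ADDRESS_BITS) -> int:
    """Add two addresses one bit per step with a 1-bit full adder."""
    carry, result = 0, 0
    for i in range(width):          # LSB first (assumed)
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def absolute_address(index_rom, name: str, relative: int) -> int:
    """Start address of the parameter's table plus its relative address."""
    base, width = index_rom[name]
    if not 0 <= relative < (1 << width):
        raise ValueError("relative address does not fit the bit allocation")
    return serial_add(base, relative)


if __name__ == "__main__":
    rom = build_index_rom()
    print(absolute_address(rom, "P", 5))
    print(absolute_address(rom, "K1", 127))
```

Under this assumed layout the twelve tables occupy 400 entries, which together with the 8 spare entries fits within the 512 locations a 9-bit absolute address can reach.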
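Voiced sound source control data based on the P parameter is therefore output from sound source ROM 6, and voiced sound source 19 generates an impulse signal with the fundamental period corresponding to the P parameter (one of the discrete audio frequencies shown in Table 1). When the speech has no fundamental period, sound source control circuit 20 drives switching circuit 22 to select unvoiced sound source 21, which generates white noise with no fundamental period.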
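The two excitation sources can be illustrated with a simplified model, one output sample per P clock. This is a sketch under stated assumptions: the real circuit reads a stored waveform from sound source ROM 6 rather than emitting a bare unit impulse, uniform random samples stand in for the white noise generator, and the function name and `seed` argument are hypothetical.

```python
import random


def excitation(n_samples: int, p_parameter=None, seed: int = 0):
    """Return n_samples of excitation, one sample per P clock.

    p_parameter is an integer pitch period in P clocks for voiced sound,
    or None for unvoiced sound.
    """
    if p_parameter is None:
        # Unvoiced: noise with no fundamental period.
        rng = random.Random(seed)
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

    out, counter = [], 0
    for _ in range(n_samples):
        counter += 1                    # address counter counts P clocks
        if counter == p_parameter:      # coincidence with the P parameter
            out.append(1.0)             # one impulse per pitch period
            counter = 0                 # counter is reset
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    print(excitation(6, p_parameter=3))
    print(len(excitation(4)))
```

Because the counter can only be reset on a P clock, the voiced pulse spacing in this model is always a whole number of 100 μs steps, which is the source of the discrete frequency grid.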
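The A parameter and K parameters are then supplied to digital filter 7, which reproduces the speech by imparting amplitude and spectral-distribution information to the signal supplied from the voiced or unvoiced sound source. In the figure, 25 is a low-frequency amplifier that amplifies the reproduced speech signal, 26 is a speaker, and 27 is a crystal oscillator circuit.

The configuration of scale signal generation circuit 31 and reset pulse generation circuit 32, shown in Figs. 4 to 6, and the synthesis operation for melody tones are described next. Scale signal generation circuit 31 consists of shift register 31a, which captures the data corresponding to the P parameter, that is, the compressed P parameter output from control IC (A), in response to request signal V_RE; scale ROM 31b, from which the scale data corresponding to the compressed P parameter is read using the compressed P parameter as address data; preset counter 31c, which takes the scale data read from scale ROM 31b as its preset input and counts clock pulses with a higher frequency than the P clock, for example the T clock; and inverter 31d, which inverts the zero-detection signal of preset counter 31c. The circuit outputs, as scale signal P_M, a zero-detection signal whose period is an integer multiple (the scale data) of the clock pulse period. The frequency of scale signal P_M still takes discrete values, but the spacing between them becomes smaller as the clock pulse frequency is raised. By storing suitable scale data in scale ROM 31b, scale signal generation circuit 31 can therefore produce a scale signal P_M that matches the frequency of each scale note. For example, if the clock pulse is the T clock (4.5 μs period) and the compressed P parameter corresponding to P parameter "12" reads scale data "284" from scale ROM 31b, preset counter 31c produces a zero-detection signal with a period of 4.5 × 284 μs. This scale signal P_M has a fundamental period equivalent to a P parameter of "12.8", so the discrete fundamental periods corresponding to the P parameter can be interpolated. Reset pulse generation circuit 32 is made up of inverters 35a and 35b, capacitor 36, NAND gate 37, D flip-flop 38, and NAND gate 39. As the time chart of Fig. 7a shows, it outputs the P clock immediately following each scale signal P_M from preset counter 31c as reset pulse V_R for address counter 18.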
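The scale-data arithmetic in the example above can be reproduced directly. The sketch below uses the 4.5 μs T clock and 100 μs P clock from the text and exact rational arithmetic; the function names are illustrative, and `scale_data_for`, which picks the nearest integer count for a target frequency, is an assumed way of filling the scale ROM rather than something the text describes.

```python
from fractions import Fraction

T_CLOCK_US = Fraction(9, 2)   # T clock period, 4.5 us
P_CLOCK_US = 100              # P clock period, 100 us


def scale_period_us(scale_data: int) -> Fraction:
    """Period of the scale signal P_M for a given preset value."""
    return scale_data * T_CLOCK_US


def equivalent_p_parameter(scale_data: int) -> Fraction:
    """The same period expressed in (fractional) P clocks."""
    return scale_period_us(scale_data) / P_CLOCK_US


def scale_data_for(frequency_hz: float) -> int:
    """Nearest preset value for a target note frequency (assumed method)."""
    return round(Fraction(10**6) / (Fraction(frequency_hz) * T_CLOCK_US))


if __name__ == "__main__":
    print(scale_period_us(284))                 # 1278
    print(float(equivalent_p_parameter(284)))   # 12.78
    print(float(10**6 / scale_period_us(284)))  # frequency in Hz
```

Scale data 284 gives a period of 1278 μs, that is, 12.78 P clocks, which the text rounds to "12.8". The grid step is now 4.5 μs instead of 100 μs, so neighbouring reachable periods are roughly 22 times closer together.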
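In the figure, trace イ is the output of coincidence circuit 17 when the P parameter is "12", trace ロ is scale signal P_M, and trace ハ is reset pulse V_R.

When control IC (A) outputs the melody control code and melody control code detection circuit 34 produces melody control code detection signal V_M, voice-melody switching circuit 33 is switched to the melody side (side b) and address counter 18 is reset by reset pulse V_R from the reset pulse generation circuit. Address counter 18 is then reset sometimes after counting 13 P clocks and sometimes after counting 12, in a ratio of 4:1. Sound source ROM 6 is therefore addressed, and voiced sound source 19 controlled, at a fundamental period equivalent to a P parameter of "12.8", and the scale note "so" is reproduced accurately. Every other scale note is reproduced accurately in the same way, so the melody is played at the correct pitch.

The time chart of Fig. 7b illustrates the relationship between scale signal P_M and reset pulse V_R more clearly, taking as an example the reset pulse V_R corresponding to a scale signal P_M of 3.75 kHz (267 μs). As the figure makes clear, the 3rd, 6th, 8th, 11th, 14th, 16th, … P pulses are output as reset pulse V_R. Because sound source ROM 6 is addressed by address counter 18, which is reset by this reset pulse V_R, voiced sound source data is read from sound source ROM 6 at a period that can be regarded as equivalent to 3.75 kHz (800/3 μs); voiced sound source 19 is driven at the correct scale frequency, and melody tones, singing, and similar sounds are reproduced at accurate pitch. Although the embodiment uses the compressed parameter as the address data for scale ROM 31b, the P parameter held in P latch 16 may be used as the address for scale ROM 31b instead.

As described above, the present invention is a speech synthesis device in which a speech signal is sampled with sampling pulses having a frequency higher than the audio frequency, feature parameters consisting of an amplitude parameter, a pitch parameter, and a spectrum parameter are extracted and stored in a data memory, and a sound source is controlled on the basis of the feature parameters read from the data memory to synthesize speech. The sound source drive period set from the pitch parameter differs between synthesizing speech whose pitch changes evenly and continuously, as in spoken words, and synthesizing sound whose pitch proceeds in discrete steps, as in melody tones or singing. When a melody is reproduced, the scale signal generation circuit generates a scale signal corresponding to the pitch parameter (approximately matching the fundamental period of the scale note), and the fundamental period for driving the sound source is set on the basis of this scale signal. The pitch deviation of the reproduced melody can thus be made small enough to cause no problem in use, and there is the further advantage that the circuit configuration and bit configuration of the speech synthesis circuitry, such as the address counter, sound source ROM, and sound source, need not be changed.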
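The reset-pulse selection illustrated by Figs. 7a and 7b can be simulated in a few lines. The sketch assumes that the k-th scale signal occurs at k times the scale period and that the reset is the first P clock at or after that instant; the "at or after" reading is an assumption chosen because it matches the pulse numbers listed for Fig. 7b. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import ceil


def reset_pulse_indices(scale_period_us, n_pulses: int, p_clock_us: int = 100):
    """P clock numbers selected as reset pulses V_R for n_pulses scale signals."""
    period = Fraction(scale_period_us)
    return [ceil(k * period / p_clock_us) for k in range(1, n_pulses + 1)]


def interval_counts(indices):
    """Count how often each reset-to-reset spacing (in P clocks) occurs."""
    counts, previous = Counter(), 0
    for index in indices:
        counts[index - previous] += 1
        previous = index
    return counts


if __name__ == "__main__":
    # Fig. 7b example: 3.75 kHz scale signal, period 800/3 us.
    print(reset_pulse_indices(Fraction(800, 3), 6))
    # Fig. 7a example: scale data 284 at 4.5 us per count, period 1278 us.
    print(interval_counts(reset_pulse_indices(1278, 50)))
```

For the 800/3 μs period the selected pulses are 3, 6, 8, 11, 14, 16, as in the text. For the 1278 μs period, 50 resets span exactly 639 P clocks, made up of 39 intervals of 13 and 11 intervals of 12, so the average spacing is 12.78 P clocks; this is close to the 4:1 mix the text gives for the rounded value 12.8.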

[Brief explanation of drawings]

Fig. 1 is a diagram explaining the principle of the speech synthesis method of an embodiment of the present invention, Fig. 2 is a block circuit diagram of the embodiment, Fig. 3 is a diagram explaining its operation, Figs. 4 to 6 are circuit diagrams of its main parts, and Figs. 7a and 7b are diagrams explaining its operation.

M is the data memory, 6 is the sound source ROM, 17 is the coincidence circuit, 18 is the address counter, 19 is the sound source, 31 is the scale signal generation circuit, 31b is the scale ROM, 31c is the preset counter, 32 is the reset pulse generation circuit, 33 is the voice-melody switching circuit, and 34 is the melody control code detection circuit.

Claims

[Claims]

1. A speech synthesis device in which a speech signal is sampled with sampling pulses having a frequency higher than the audio frequency, feature parameters consisting of an amplitude parameter, a pitch parameter, and a spectrum parameter are extracted and stored in a data memory, and a sound source is controlled on the basis of the feature parameters read from the data memory to synthesize speech, the device comprising: a coincidence circuit that outputs a coincidence signal when the value of an address counter, which counts synchronization pulses having a period equal to that of the sampling pulses and reads sound source data from a sound source ROM, matches the pitch parameter; a scale signal generation circuit consisting of a scale ROM, from which stored scale data is read using address data corresponding to the pitch parameter, and a preset counter that takes the scale data as a preset input and counts down clock pulses having a higher frequency than the synchronization pulses; a reset pulse generation circuit that outputs the synchronization pulse occurring immediately after the scale signal is obtained from the zero-detection signal of the preset counter; and a voice-melody switching circuit that switches the reset signal of the address counter between the output of the reset pulse generation circuit and the output of the coincidence circuit; wherein, when synthesizing speech whose pitch changes evenly and continuously, as in spoken words, the address counter is reset by the output of the coincidence circuit, and when synthesizing sound whose pitch proceeds in discrete steps, as in melody tones or singing, the address counter is reset by the output of the reset pulse generation circuit, so that the sound source data is used repeatedly.

2. The speech synthesis device according to claim 1, further comprising a melody control code detection circuit that detects a melody control code added to the amplitude parameter, wherein the voice-melody switching circuit is controlled by the output of the melody control code detection circuit.