JPS6240717B2

Info

Publication number: JPS6240717B2
Application number: JP56215889A
Authority: JP
Inventors: Tomomi Sano; Kikuo Kawasaki
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1981-12-29
Filing date: 1981-12-29
Publication date: 1987-08-29
Also published as: JPS58117597A

Description

[Detailed description of the invention]

本発明は自動販売機や音声案内装置等に好適な
音声発生制御装置に関し、特に適用した機器の入
出力状況に応じて種々の準備してある言葉の中か
ら状況に合つた言葉を選択して即座に音声として
出力する音声発生制御装置に関するものである。最近、自動販売機の販売促進の一環として、音
声発生装置を搭載した「しやべる自動販売機」が
要求されている。ここで、音声を機械で発生させ
ようとする場合、最も簡単な方法としては、発生
させたい音声信号を磁気テープやデイスク、ドラ
ム等に録音しておき、必要な時に再生してスピー
カから音声を発生させる方式がある。しかし、こ
の方式は、磁気ヘツドの摩耗、テープの延び、摩
耗による音質劣化、瞬間的選曲（音声の頭出し）
ができないなどの問題があり、自動販売機のよう
に外部状況の変化に応じて即座に音声を出力する
要のあるものには適さない。そこで、マイクロコンピユータと半導体メモリ
とを組み合わせ、音声信号をデジタル化して記憶
させ、再生時にもとの音声信号に戻す音声合成方
式が提案されている。この方式の代表的なものと
しては波形再生方式と分析合成方式がある。波形
再生方式は、デジタル化した音声信号に、各種変
調をかけてメモリに記憶させ、再生時に復調する
ものであり、代表的なものとしては、量子化幅を
適応的に変化させる適応差分パルス符号変調方式
（ADPCM）と、一定の振幅ステツプ量Δ（デル
タ）を定めておき、前回の音声信号と今回の音声
信号との残差信号に対して符号化するデルタ変調
方式（DM）または適応デルタ変調方式（ADM）
とがある。分析合成方式は波形再生方式より更に
少ないメモリ容量で発声させることを目的として
開発された方式であり、音声信号の波形に含まれ
る特徴的なパラメータ、例えば発声時の口の動
き、有声、無声音の区別等のデータだけを抽出し
て記憶しておき、そのデータをもとに音声を合成
するものである。その代表的なものに、偏自己相
関係数方式（PARCOR）がある。その他、人間
の音声におけるイントネーシヨン、アクセントな
どのアルゴリズムを解明し、文字系列入力に対し
音声を合成しようとする法則合成方式が提案され
ているが、自然な音声を合成する点で、困難な問
題が多く実用化されていない。しかしながら、これらの音声発生方式を適用し
た従来の自動販売機等の音声合成装置において
は、音声発生出力の起動をかけると一連の準備さ
れた言葉の発生が終了するまでは、次の音声要求
が発生してもその発声要求に即座に切替えて音声
として出力するということができない。このよう
に、適切なメツセージを発声要求時点で直ちに出
力できないばかりでなく、発声中に複数の発声要
求が発生した場合も適切な発声処置ができないと
いう問題がある。この問題は、種々の外部状況が
時々刻々変化するのに即応し、現在の状況に応じ
た要求に答えて素早く音声として出力することが
望まれる自動販売機等の音声発生制御装置にとつ
ては重大な欠点となる。上述の従来の問題点を下記の自動販売機での
例により更に具体的に詳述する。

The present invention relates to a voice generation control device suitable for vending machines, voice guidance devices and the like, and in particular to a voice generation control device that, according to the input/output status of the equipment to which it is applied, selects the words that fit the situation from among various prepared words and immediately outputs them as voice.

Recently, as part of sales promotion for vending machines, there has been demand for "talking vending machines" equipped with a voice generator. The simplest way to generate voice by machine is to record the desired voice signal on magnetic tape, a disk, a drum or the like, and to play it back through a speaker when needed. However, this method suffers from wear of the magnetic head, stretching of the tape, deterioration of sound quality due to wear, and the inability to cue instantly to the start of a given utterance, so it is unsuitable for equipment such as vending machines that must output voice immediately in response to changes in external conditions.

Voice synthesis methods have therefore been proposed that combine a microcomputer with semiconductor memory, digitize and store the voice signal, and restore the original voice signal on playback. Representative methods are the waveform reproduction method and the analysis-synthesis method. The waveform reproduction method applies some form of modulation to the digitized voice signal, stores it in memory, and demodulates it on playback. Representative examples are adaptive differential pulse code modulation (ADPCM), which adaptively varies the quantization step width, and delta modulation (DM) or adaptive delta modulation (ADM), which fixes an amplitude step Δ (delta) and encodes the residual between the previous and the current voice signal. The analysis-synthesis method was developed to produce speech with even less memory than the waveform reproduction method: it extracts and stores only characteristic parameters contained in the voice waveform, such as the movement of the mouth during speech and the voiced/unvoiced distinction, and synthesizes speech from that data. A representative example is the partial autocorrelation coefficient (PARCOR) method. In addition, a synthesis-by-rule method has been proposed that attempts to work out the rules of intonation, accent and so on in human speech and to synthesize speech from character-string input, but producing natural speech poses many difficult problems and the method has not been put to practical use.

However, in conventional voice synthesis devices for vending machines and the like that apply these voice generation methods, once voice output has been started, the device cannot switch immediately to a subsequent utterance request and output it until the series of prepared words has finished. Thus not only can an appropriate message not be output at the moment it is requested, but multiple requests arising during an utterance cannot be handled appropriately either. This is a serious drawback for voice generation control devices in vending machines and the like, which are expected to respond promptly as external conditions change from moment to moment and to output quickly the voice that answers the current request.

The conventional problem described above is explained more concretely using the following vending machine example.

[Table 1 (not reproduced in this text)]

【表】第１表に示した程度の言葉を発生する自動販売
機での種々の動作モードを考えてみる。 (イ) 商品選択用スイツチを押した状態でお金を投
入する。そのとき、例えば50円の商品選択ボタ
ンを押したまま100円硬貨を投入する。自動販
売機からの音声は前述の手順の「いらつしや
いませ」から手順，と話す処置を実行す
る。ところが手順の「いらつしやいませ」を
話している間に販売時間の短い商品（例えばビ
ン、缶）などの販売では、すでに商品が販売さ
れており、客は商品を取り出して自販機から立
ち去つてしまう。そのため、自動販売機は客が
立ち去つてからも手順の「お好みのものをど
うぞ」と手順の「毎度ありがとうございま
す」とを引き続き発声することになる。 (ロ) 非常に早い速度でお金を投入し、ボタンを押
し、商品取り出しを行つても(イ)と同様なことが
発生し得る。 (ハ) 売切れ商品や投入した金額では金額不足で購
入できない商品を数回押した後、販売可能な商
品を選択した場合、手順の「お金が足りませ
ん」と手順の「売切れです」の音声を出力す
るが、ボタン操作を早く行うと、本来手順を
話すべき場合なのに手順を出力しているなど
的確なメツセージを即時に出力できない。このように新しい音声を出力すべき操作を客が
音声発生時間よりも早く行うと、実際の操作に対
して的確なメツセージが出力されない状況にな
る。本発明の目的は、上述した欠点を除去し、音声
出力中に新しい発声要求があつた場合に、現在出
力している音声を途中で中断または話す速度を早
めて必要な言葉をすみやかに音声出力するように
した高性能な音声発生制御装置を提供することに
ある。すなわち、本発明は、記憶されている複数の言
葉の中から要求された言葉を選択して音声出力さ
せる音声発生制御装置において、前記言葉を文節
毎または所定の数の単語毎に独立させて記憶し、
該文節毎または所定の数の単語毎に中断可否フラ
グとスピードアツプ可否フラグとを備えた音声発
声テーブルと、前記言葉を音声出力中に新たな前
記言葉の音声出力要求を受けたときに、前記両フ
ラグを参照して音声出力中の言葉の音声出力を途
中で中断するか、または途中で速度を早めるよう
にして前記新たな言葉の音声出力を行うことがで
きる制御手段とを有することを特徴とするもので
ある。以下、図面により本発明を詳細に説明する。第１図は本発明を適用した自動販売機システム
の構成の一例を示し、ここで１はコインメカニズ
ムユニツト、２は販売制御回路、３は選択ボタ
ン、４は商品搬出機構、５は音声発生制御装置、
６はPARCOR方式の音声合成器、７は音声デー
タROM（リードオンリメモリ）、８は増幅器、９
は上述の部品６〜８を含む音声合成部、１０はス
ピーカである。音声合成部９は３チツプのLSI（大規模集積回
路）構成となつており、所定の複数種類の言葉
（音声メツセージ）を女性または男性の声で発生
できる。また、通常の発生機能のほかに、音声発
生制御装置５による後述の発声速度切換機能や発
声途中打切り機能を有しており、自動販売機の動
作状況に合わせて、きめ細かい音声情報を提供で
きる。まず、購入者が自動販売機の前に立ち、硬貨ま
たは紙幣を投入すると、通常コインメツクと呼ば
れるコインメカニズムユニツト１から貨幣投入信
号が販売制御回路２を経て音声合成部９の音声発
生制御装置５に送出される。音声発生制御装置５
は受信した信号を分析して音声データを音声デー
タROM７から取り出し、このデータを音声合成
器６に供給する。音声合成器６は分析の場合と逆
の過程によつて音声信号を再生し、この信号を増
幅器８で増幅してスピーカ１０から、例えば「い
らつしやいませ富士電機でございます」という内
容の音声を発生させる。次に、コインメカニズムユニツト１により、販
売設定価格と投入金額とを比較演算し、販売可能
であるときは販売制御回路２を介して、音声発生
制御装置５へ販売可能信号を送出する。この信号
を制御装置５により検出し、音声合成器６等を経
てスピーカ１０から、例えば「お好みのボタンを
押して下さい」と音声を発生させる。次に、購入
者が商品選択ボタン３を押し、商品が商品搬出機
構４により搬出されると、販売開始信号が販売制
御回路２からコインメカニズムユニツト１と音声
発生制御装置５とに供給され、上述と同様に音声
データROM７から取り出した音声データに基づ
きスピーカ１０から、例えば「毎度ありがとうご
ざいます」と発声させる。スピーカ１０から発声
させるその他の音声メツセージの内容は既述した
第１表の場合とほぼ同様である。また、おまけ装
置としてのベンドルーレツト（不図示）を設けた
場合には、ベンドルーレツトからの当り信号を検
出して上述と同様の手段により「当りです」とい
う内容の音声を発生させることができる。第２図は第１図の音声合成器６の構成の一例を
示し、ここでパラメータ変換用ROMは音声デー
タROM７から読み出された音声データに基づ
き、フレーム周期と呼ばれる10〜25msの単位ご
とに音声合成に必要な特徴パラメータを抽出し、
このパラメータをパラメータ補間回路に供給す
る。パラメータは、声道の共鳴特性である音声周
波数スペクトルの情報を表わす時間領域の係数ｋ
_i（１≦ｉ≦ｐ）と、音声の大きさ（振幅）、音声
音における音声の周波数（ピツチ周期）有音声／
無声音の区別とを示す略150のビツトからなる。
パラメータ補間回路は10〜25ms間隔のフレーム
周期間に2.5msごとの補間をとるためのもので、
パラメータ変換用ROMから送出されたパラメー
タに基づき、有声音の場合はパラメータのインパ
ルス信号をデジタル・フイルタに供給し、無声音
の場合は白色雑音を音源δ（ｎ）とする音源回路
を介して白色雑音信号をデジタル・フイルタに供
給する。デジタル・フイルタは人間の発声機構を
模擬した回路で、パイプライン乗算器とｋ_iパラ
メータのスタツク、加算・減算器、シフトレジス
タとから成る。デジタル・フイルタからの出力信
号をデジタル―アナログ（Ｄ―Ａ）変換回路を通
すことにより音声を合成し、増幅器８を介してス
ピーカ１０により肉声にきわめて近い音声として
発生させる。上述のPARCOR方式による音声合成器６は発
声時間／メモリ容量、音質、経済性の点で比較的
優れているが、本発明が適用される音声合成器と
しては、この線形予測符号化（LPC）による
PARCOR方式のものに限定されるものではな
く、他の方式例えば線スペクトル対（LSP）方式
などの音声合成器でもよいことは勿論である。第３図は第１図の音声データROM７から読み
出される音声データの読み出し制御単位を示し、
ここでＡ，Ｂ，Ｃはそれぞれ独立した言葉（音声
データ）であり、複数個ある言葉の中から例示と
して３個だけ選択したものである。図示のよう
に、Ａの例えば「いらつしやいませ」、Ｂの例え
ば「富士電機」、Ｃの例えば「でございます」と
いうように、文章を読む際の自然の発音によつて
区切られる最小の単位である各文節毎に音声発生
制御装置５により読み出し制御を行い（第４図Ａ
〜Ｃ参照）、Ａ，Ｂ，Ｃの順につないで音声を合
成し、「いらつしやいませ富士電機でございま
す」という一連のメツセージをスピーカ１０から
発声させる。すなわち、出力メツセージを後述の
ようにＡ，Ｂ，Ｃ……の文節単位で音声データ
ROM７の音声発生テーブルに記憶しておき、制
御装置５から供給される選択情報に基づき、Ａ，
Ｂ，Ｃ……を独立的に読み出しデータ量を
2.4K，4.8K，9.6Kビツト／秒などと変えて読み
出し、独立的に読み出し中止を行う。このため、
各文節毎に独立させた言葉Ａ，Ｂ，Ｃ……は、各
Ａ，Ｂ，Ｃ……毎に読み出しと、読み出し中止が
でき、かつ独立的に発声速度を可変にすることが
できる。第４図Ａ〜Ｃは本発明による音声発生制御手順
の一例を、第３図の文節Ａ，Ｂ，Ｃを用いて示し
たものである。ここで、Ａ，Ｂ，Ｃの横幅は発声
に要する時間を示す。自動販売機に対する購入者
の押ボタン操作があらかじめ予定した時間内の間
隔で行われる通常状態時では、一連のメツセージ
を発声するのに専念しても支障がないから、第４
図Ａで示すように、Ａ，Ｂ，Ｃ等の言葉をあらか
じめ設定した普通の話し方の早さで音声発声を行
うように制御装置５により制御する。次に、所定のメツセージＡ，Ｂ，Ｃを発声中に
新しい言葉Ｙ，Ｚの発声を行う要求があつた場合
には、メツセージの内容と自動販売機の入出力状
況に応じて、第４図Ｂに示すように、発声中のメ
ツセージの途中から発声速度を早めるか、または
第４図Ｃに示すように発声中のメツセージの途中
で後の言葉を中断、省略を行つて、新しく要求さ
れたメツセージＹ，Ｚをすみやかに発声するよう
に制御装置５により制御する。例えば、第４図Ｂに示すように、Ａの「いらつ
しやいませ」を発声している途中でＹの「お金
が」とＺの「足りません」の新しい発声要求があ
つたときには、Ｂの「富士電機」とＣの「でござ
います」の発声速度を通常速度より20〜30％程
度、順次早めて発声する。このような発声速度の
可変制御は、一連のメツセージＡ，Ｂ，Ｃの後半
部分の言葉を省略すると意味不明となるので省略
できないが、発声速度をはやめても意味が不明に
ならない場合に適する。なお、Ａ，Ｂ，Ｃのよう
に、本実施例では文節単位で制御しているが、単
語単位で音声合成の制御を行う場合には、新しい
発声要求のあつた時点（例えばＡを話している途
中）から発声速度を早めることが可能である。一
方、第４図Ｃの場合は、例えばＡの「いらつしや
いませ」の発声中にＹ，Ｚの新しいメツセージの
発声要求があつたとき、Ａの「いらつしやいま
せ」以後のＢ，Ｃの言葉が省略可能な言葉である
ときに適する。このときは、Ａの発声が終了した
時点で、Ｂ，Ｃの発声を止め、新しく発声のあつ
たＹ，Ｚのメツセージを発声する。一般に、自動販売機のように、人を相手とした
機械では機械の操作を人が通常よりも素早く行う
と、機械は次から次へと発声すべき言葉を出力し
ようとする。その際、一番新しい要求に応じた発
声すべき言葉を直ちに出力すべきであるが、従来
の音声発生制御方式では前回に発声の要求のあつ
た言葉の発声が全て終了するまで、新しい言葉を
発生することができない不都合があつた。本発明
では上述のように、発声速度を早めたり、発声を
途中で止めて新しい言葉をすみやかに発声するこ
とができる。このため、発声チヤンスを拡大する
ことができるとともに、発声すべきチヤンスを逃
がさずに適切なメツセージを即座に出力すること
ができる効果が得られる。第５図は第４図Ａ〜Ｃの制御手順を更に流れ図
で示したものである。音声発生制御装置５が発声
要求待ちの場合には、コントロールは手順２１→
２２→２１→２２……とループしている。ここ
で、音声発声要求があると、手順２３から手順２
４と２５の音声発声準備と音声発声処理に進む。
この手順２４と２５で選択された言葉を第６図の
テーブル４１でサーチし、この発声データをもと
に発声の準備と発声スタート処置等を行う。手順
２５の処理中は手順２３は発声中となる。手順２
５の音声発声処理が終了すると、手順２６で発声
要求待ちか否かの判断がなされ、発声要求待ちの
場合は、コントロールは手順２１に戻り、新しい
発声要求があるまで手順２１→２２→２１→２２
……の待機ループを回つている。他方、手順２５の音声発声処理によるメツセー
ジの発声が終了しないうちに新しい発声要求があ
ると、手順２３で発声中と判断されるので、手順
２７に進み、現在音声出力している言葉（文節）
が中断可能か否かを判断する。中断可能であれ
ば、手順２８の発声中断処理に進み、現在音声出
力している言葉またはその後続の言葉を中断し、
手順２１に戻つて手順２４と２５により新しく要
求された言葉を発声する。もし、手順２７におい
て、現在音声出力している言葉が中断できない言
葉であると判断した場合には、現在音声出力して
いる言葉の音声発声速度（話す速度）を早くして
も良いか否かを手順２９で判断し、話す速度のス
ピードアツプが可能な場合には、手順３０のスピ
ードアツプ処理で現在音声出力している言葉また
はその後続の言葉の音声発生速度を早め、その音
声出力が終つたら手順２１に戻つて手順２４と２
５により新しく要求された言葉を発生する。ただ
し、手順２９において話す速度のスピードアツプ
が不可能な場合と判断された場合にはコントロー
ルは手順２１に戻り、現在音声出力している一連
のメツセージの音声出力が終了後、ただちに手順
２４と手順２５に進み、新しく要求された言葉を
発声する。また、新しく要求された言葉を発声中
でも、更に新しい要求があれば、手順２７と手順
２９の判断により手順２８の発声中断処理、また
は手順３０のスピードアツプ処理をして発声すべ
きチヤンスを逃がさずに機械（自動販売機）の入
出力状況に応じた適切なメツセージを要求時点に
近いタイミングで適切に音声出力することができ
る。なお、自動販売機の販売動作が終了し、メツ
セージの発声要求を待つ必要がなくなつた場合に
は、コントロールは手順２６から図示しないメイ
ンプログラムに戻される。第６図は、第１図の音声データROM７の音声
発声テーブル４１の一例を示し、ここで、４２は
０，１，２，３……Ａ，Ｂ，Ｃ……の16進で示す
テーブルナンバであり、４３はあらかじめ選択し
たメツセージを分析して文節Ａ，Ｂ，Ｃ，Ｄ……
毎に独立させ、つなぎ合わせたメツセージデータ
であり、４４は発声終了のENDコードである。
テーブルナンバ４２はあらかじめ選択した個々の
メツセージデータ４３に対応して付けられてお
り、メツセージデータ４３のＡ，Ｂ，Ｃ（斜線図
示）は複数のメツセージに共通に使用される文節
の言葉を示している。音声発生制御装置７は自動販売機の外部状況
（入出力状況）に応じて供給される選択情報４５
に基づき、要求された必要な言葉Ａ，Ｂ，Ｃ……
をテーブルナンバ４２により選択抽出し、これら
の言葉Ａ，Ｂ，Ｃ……をつなぎ合わせて一つのメ
ツセージの音声発生を行わせしめる。第７図は第６図の音声発声テーブル４１内のメ
ツセージデータ４３の一例として、テーブルナン
バ４２が０のメツセージデータ４３を詳細に示
す。ここで、４３ａ〜４３ｄはそれぞれ音声発声
用の発声データであり、各発声データ４３ａ〜４
３ｄはそれぞれ１つの文節ＡまたはＢ，Ｃ，Ｄと
１ビツト構成のフラグF₁とF₂とを有する。フラ
グF₁は所属する発声データ内の文節の言葉Ａま
たはＢ，Ｃ，Ｄを中断しても良いか否かの判断に
使用する中断可否フラグであり、フラグF₂は所
属する発声データ内の文節の言葉ＡまたはＢ，
Ｃ，Ｄの音声出力速度を上げて良いか否かの判断
に使用するスピードアツプ可否フラグである。両
フラグF₁とF₂はあらかじめテーブル４１にセツ
トされる。X₁〜X₃はそれぞれ各発声データ４３
ａ〜４３ｄ間に設けられて、各文節の言葉Ａ〜Ｄ
をつなぐ時間を制御する語間調整タイマであり、
発声する言葉が自然な感じになるように時間のセ
ツトをする。ENDコード４４はテーブルデータ
の最後尾にセツトされる特殊コードであり、制御
装置７はこのENDコード４４を入力したら発声
終了と判断する。フラグF₁は第５図の手順２７で使用され、フ
ラグF₂は同図手順２９で使用される。例えば、
第１発声データ４３ａの言葉Ａを発声している最
中に、新しい発声要求があれば、手順２７におい
て制御装置７により言葉Ａに付属しているフラグ
F₁を参照して、それが中断可能の、例えば
“１”であれば手順２８で言葉Ａの発声をすぐに
中断し、次の新しく要求のあつた言葉を手順２４
でテーブルナンバ４２から抽出して手順２５の音
声発声処理に進むこととなる。一方、フラグF₁
に中断不可の、例えば“０”が立つていれば、制
御装置７により次に手順２９においてフラグF₂
を参照して、それに音声出力のスピードアツプが
可能な、例えば“１”のフラグが立つていれば、
手順３０で言葉Ａ以下の後続の言葉の音声発声速
度をスピードアツプする。この際、各文節Ａ〜Ｄ
をつなぐ時間を、語間調整タイマX₁〜X₃のセツ
ト時間に基づいて、発声速度の上昇に応じて相対
的に短縮するように調整する。なお、フラグF₂
の判定データを複数にすれば、音声出力速度を複
数段階変化させることができる。例えば、16進数
で表わせば最大16通りの変化が得られる。本実施例では、文節の各言葉毎にフラグF₁お
よびF₂を参照しているので、言葉Ａが中断不可
であつても、言葉Ｂ，Ｃ，Ｄのいずれかが中断可
能であれば、中断フラグF₁の中断可能フラグデ
ータを発見したときの言葉から発声を止め、新し
い言葉の発声に進むことができる。また、同様の
理由により、言葉Ａのあとに続く言葉Ｂ，Ｃ，Ｄ
の全てをスピードアツプして発声することも、一
部のみスピードアツプして発声することも可能で
ある。以上説明したように、本発明によれば、所定の
メツセージデータを各文節毎、または所定数の単
語毎に独立させて組合わせるとともに、各独立さ
せたデータ毎に中断可否およびスピードアツプ可
否のフラグデータを備えた発声用テーブルと、要
求されたメツセージを発声中に新たな発声要求を
受けたときには、これらのフラグデータを参照し
て出力中の一連のメツセージの音声出力を途中で
中断または話す速度を早めるように制御を行う制
御手段とを設け、これにより音声を出力したい時
点で、あるいは少ない待ち時間で有効なメツセー
ジを出力するようにしたため、音声出力を受け取
る人間に対して適切な指示を与えることが可能と
なる効果が得られる。また、本発明は、１つの音声出力装置から直列
的に多数の音声情報を出力し、外部または内部情
況の変化に対していち早く音声で通知する必要の
ある自動販売機や音声案内装置等に好適である。

Consider the various operating modes of a vending machine that utters messages of the kind shown in Table 1.

(a) Money is inserted while a product selection switch is held down. For example, a 100-yen coin is inserted while the selection button for a 50-yen product is held down. The vending machine starts speaking the sequence of steps described above, beginning with the step "Irasshaimase" ("Welcome"). However, for products that are vended quickly (for example bottles or cans), the product has already been dispensed while the machine is still saying "Irasshaimase", and the customer takes it and walks away. As a result, the machine goes on to say the following steps, "Okonomi no mono o dōzo" ("Please choose what you like") and "Maido arigatō gozaimasu" ("Thank you for your custom"), after the customer has left.

(b) The same thing as in (a) can happen when the customer inserts money, presses a button and takes out the product in very quick succession.

(c) Suppose the customer presses, several times, the buttons of sold-out products or of products that cannot be bought with the amount inserted, and then selects a product that can be sold. The machine outputs the step "Okane ga tarimasen" ("Not enough money") and the step "Urikire desu" ("Sold out"), but if the buttons are operated quickly it may be outputting one of these steps when it should be speaking the other, so the correct message cannot be output immediately.

Thus, when the customer performs operations that call for new voice output faster than the voice can be uttered, the message output no longer matches the actual operation.

An object of the present invention is to eliminate the above drawbacks and to provide a high-performance voice generation control device that, when a new utterance request arrives during voice output, interrupts the voice currently being output or increases its speaking speed so that the required words are output promptly.

That is, the present invention is a voice generation control device that selects requested words from a plurality of stored words and outputs them as voice, characterized by comprising: a voice utterance table that stores the words independently for each clause or for each predetermined number of words, and that has, for each clause or each predetermined number of words, an interruption permission flag and a speed-up permission flag; and control means that, on receiving a new voice output request while words are being output, refers to both flags and either interrupts the current voice output midway or speeds it up midway, so that the newly requested words can be output.

The present invention is described in detail below with reference to the drawings.

FIG. 1 shows an example of the configuration of a vending machine system to which the present invention is applied. Here, 1 is a coin mechanism unit, 2 is a vending control circuit, 3 is a selection button, 4 is a product delivery mechanism, 5 is a voice generation control device, 6 is a PARCOR voice synthesizer, 7 is a voice data ROM (read-only memory), 8 is an amplifier, 9 is a voice synthesis section comprising the components 6 to 8, and 10 is a speaker.

The voice synthesis section 9 is built from three LSI (large-scale integration) chips and can produce a predetermined set of phrases (voice messages) in a female or male voice. In addition to ordinary utterance, it provides the speech-rate switching and mid-utterance cut-off functions described later, under control of the voice generation control device 5, so that it can provide finely tailored voice information matched to the operating status of the vending machine.

First, when a purchaser stands in front of the vending machine and inserts coins or banknotes, a money-inserted signal is sent from the coin mechanism unit 1 (commonly called the "coin mech") through the vending control circuit 2 to the voice generation control device 5 of the voice synthesis section 9. The voice generation control device 5 analyzes the received signal, fetches voice data from the voice data ROM 7, and supplies it to the voice synthesizer 6. The voice synthesizer 6 reconstructs the voice signal by the reverse of the analysis process; the signal is amplified by the amplifier 8, and the speaker 10 produces, for example, "Irasshaimase, Fuji Denki de gozaimasu" ("Welcome, this is Fuji Electric").

Next, the coin mechanism unit 1 compares the set selling price with the amount inserted and, when a sale is possible, sends a sale-enabled signal through the vending control circuit 2 to the voice generation control device 5. The control device 5 detects this signal and, via the voice synthesizer 6 and so on, has the speaker 10 say, for example, "Okonomi no botan o oshite kudasai" ("Please press the button of your choice"). Then, when the purchaser presses a product selection button 3 and the product is delivered by the product delivery mechanism 4, a vend-start signal is supplied from the vending control circuit 2 to the coin mechanism unit 1 and to the voice generation control device 5, and, in the same way as above, the speaker 10 says, for example, "Maido arigatō gozaimasu" ("Thank you for your custom") based on voice data fetched from the voice data ROM 7. The other voice messages uttered from the speaker 10 are much the same as those in Table 1 above. If a vend roulette (not shown) is provided as a bonus device, a win signal from the vend roulette can be detected and the message "Atari desu" ("You win") can be produced by the same means.

FIG. 2 shows an example of the configuration of the voice synthesizer 6 of FIG. 1. Based on the voice data read from the voice data ROM 7, the parameter conversion ROM extracts the characteristic parameters needed for voice synthesis for each unit of 10 to 25 ms, called the frame period, and supplies them to the parameter interpolation circuit. The parameters consist of approximately 150 bits representing the time-domain coefficients k_i (1≦i≦p), which carry the speech frequency spectrum information corresponding to the resonance characteristics of the vocal tract, the loudness (amplitude) of the speech, the frequency of the voice in voiced sounds (pitch period), and the voiced/unvoiced distinction.
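As a rough illustration of how frame parameters of this kind are used, the sketch below drives a generic all-pole lattice filter with PARCOR-style reflection coefficients k_i, using a pitch-period impulse train for voiced frames and white noise for unvoiced frames. It is not the circuit of FIG. 2: the function names, the sign convention of the lattice recursion, and the sample values are assumptions chosen for illustration.

```python
import random


def make_excitation(n_samples, voiced, pitch_period, amplitude, seed=0):
    """Excitation for one frame: impulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        return [amplitude if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [amplitude * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def lattice_synthesize(excitation, k):
    """All-pole lattice filter; k[i - 1] is the coefficient k_i (assumed |k_i| < 1)."""
    p = len(k)
    b = [0.0] * (p + 1)  # b[i]: backward signal of stage i at the previous sample
    out = []
    for e in excitation:
        f = e  # forward signal entering stage p
        new_b = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b[i - 1]
            new_b[i] = k[i - 1] * f + b[i - 1]
        new_b[0] = f  # the output sample feeds the first delay element
        b = new_b
        out.append(f)
    return out


# First-order check: a single impulse decays geometrically with ratio -k_1.
impulse_response = lattice_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
# A hypothetical voiced frame: 80 samples, pitch period of 40 samples, p = 2.
voiced_frame = lattice_synthesize(make_excitation(80, True, 40, 1.0), [0.5, -0.3])
```

A hardware synthesizer would additionally interpolate the parameters between frames and convert the filter output to analog; those stages are left out of this sketch.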
The parameter interpolation circuit interpolates every 2.5 ms between frames, which are spaced 10 to 25 ms apart. Based on the parameters sent from the parameter conversion ROM, for voiced sounds an impulse signal is supplied to the digital filter, and for unvoiced sounds a white noise signal is supplied to the digital filter via a sound source circuit that uses white noise as the source δ(n). The digital filter is a circuit that models the human vocal mechanism and consists of a pipeline multiplier, a stack for the k_i parameters, an adder/subtractor, and a shift register. The output of the digital filter is passed through a digital-to-analog (D-A) converter to synthesize the voice, which is output through the amplifier 8 and the speaker 10 as speech very close to a natural human voice.

The PARCOR voice synthesizer 6 described above is comparatively good in terms of utterance time per unit of memory, sound quality and cost, but the voice synthesizer to which the present invention is applied is of course not limited to this PARCOR method based on linear predictive coding (LPC); a synthesizer using another method, such as the line spectrum pair (LSP) method, may also be used.

FIG. 3 shows the units in which voice data is read out from the voice data ROM 7 of FIG. 1 under read control. Here A, B and C are independent words (voice data); just three have been picked from the many available as an illustration. As shown, the voice generation control device 5 performs read control clause by clause (see FIGS. 4A to 4C), a clause being the smallest unit into which a sentence is divided by natural pronunciation when read aloud: for example A is "irasshaimase", B is "Fuji Denki" and C is "de gozaimasu". Joining A, B and C in that order, the speaker 10 utters the message "Irasshaimase, Fuji Denki de gozaimasu". That is, output messages are stored in the voice utterance table of the voice data ROM 7 in clause units A, B, C, ... as described later, and, based on the selection information supplied from the control device 5, A, B, C, ... are read out independently, with the read data rate varied (2.4 K, 4.8 K, 9.6 Kbit/s and so on), and reading can be stopped independently. Consequently each of the words A, B, C, ..., stored independently per clause, can be read out or have its readout stopped individually, and its speech rate can be varied individually.

FIGS. 4A to 4C show an example of the voice generation control procedure of the present invention using clauses A, B and C of FIG. 3. The width of A, B and C represents the time needed to utter each. In the normal state, in which the purchaser operates the push buttons at intervals within the expected time, the device can devote itself to uttering the whole message without any problem, so, as shown in FIG. 4A, the control device 5 has the words A, B, C and so on uttered at the preset normal speaking speed.

Next, if a request to utter new words Y and Z arrives while the message A, B, C is being uttered, the control device 5, depending on the content of the message and the input/output status of the vending machine, either raises the speech rate partway through the message being uttered, as shown in FIG. 4B, or cuts off and omits the remaining words partway through the message, as shown in FIG. 4C, so that the newly requested message Y, Z is uttered promptly.

For example, as shown in FIG. 4B, if a new request to utter Y "okane ga" and Z "tarimasen" (together, "Not enough money") arrives while A "irasshaimase" is being uttered, B "Fuji Denki" and C "de gozaimasu" are uttered in turn at about 20 to 30% above the normal speed. This variable speech-rate control is suited to cases where the latter part of the message A, B, C cannot be omitted because the meaning would be lost, but the meaning is not lost when the speech rate is increased. In this embodiment control is performed in clause units such as A, B and C, but if synthesis is controlled in word units, the speech rate can be raised from the moment the new request arrives (for example, partway through A). The case of FIG. 4C, on the other hand, is suited to situations where, when a request for the new message Y, Z arrives while A "irasshaimase" is being uttered, the words B and C following A can be omitted. In this case, utterance of B and C is cancelled when A finishes, and the newly requested message Y, Z is uttered.
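The effect of the three cases in FIGS. 4A to 4C on waiting time can be checked with a little arithmetic. The sketch below uses made-up clause durations and a 25% speed-up (the text gives 20 to 30%), ignores the pauses between clauses, and assumes, as in clause-unit control, that the clause being spoken when the request arrives is finished at normal speed. The function and mode names are illustrative only.

```python
def new_message_start_ms(durations_ms, request_ms, mode, rate=1.25):
    """Time at which a newly requested message can start.

    durations_ms: normal-speed length of each clause of the message in progress.
    request_ms:   when the new request arrives (must fall inside the message).
    mode:         "normal" (FIG. 4A), "speedup" (FIG. 4B) or "interrupt" (FIG. 4C).
    """
    t = 0.0
    current = None
    for i, d in enumerate(durations_ms):
        if t <= request_ms < t + d:
            current = i
            break
        t += d
    if current is None:
        raise ValueError("request does not fall inside the message")
    end_of_current = t + durations_ms[current]
    rest = durations_ms[current + 1:]
    if mode == "interrupt":   # drop the remaining clauses
        return end_of_current
    if mode == "speedup":     # speak the remaining clauses faster
        return end_of_current + sum(d / rate for d in rest)
    if mode == "normal":      # finish the message at normal speed
        return end_of_current + sum(rest)
    raise ValueError(mode)


clauses = [1000, 800, 700]  # A, B, C in milliseconds (hypothetical values)
for mode in ("normal", "speedup", "interrupt"):
    print(mode, new_message_start_ms(clauses, 400, mode))
```

With these assumed numbers and a request arriving 0.4 s into clause A, the new message would start after 2.5 s without any control, after 2.2 s with speed-up, and after 1.0 s with interruption.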
In general, with a machine that deals with people, such as a vending machine, if a person operates it faster than usual, the machine tries to output the words for each operation one after another. The words for the most recent request should then be output at once, but with conventional voice generation control a new utterance could not begin until all the words of the previous request had been uttered. In the present invention, as described above, the speech rate can be raised or the utterance stopped midway so that the new words are uttered promptly. This widens the opportunities for utterance and makes it possible to output the appropriate message immediately without missing the moment at which it should be spoken.

FIG. 5 shows the control procedure of FIGS. 4A to 4C as a flowchart. While the voice generation control device 5 is waiting for an utterance request, control loops through steps 21 → 22 → 21 → 22 .... When an utterance request arrives, control passes from step 23 to steps 24 and 25, utterance preparation and utterance processing. In steps 24 and 25 the selected words are looked up in table 41 of FIG. 6, and preparation for utterance, start of utterance and so on are carried out using that utterance data. While step 25 is in progress, step 23 reports that utterance is in progress. When the utterance processing of step 25 ends, step 26 determines whether the device should wait for further utterance requests; if so, control returns to step 21 and circulates in the waiting loop 21 → 22 → 21 → 22 ... until a new request arrives.

If, on the other hand, a new utterance request arrives before the message being uttered by step 25 has finished, step 23 determines that utterance is in progress, so control proceeds to step 27, which determines whether the word (clause) currently being output can be interrupted. If it can, control proceeds to the interruption processing of step 28, which cuts off the word currently being output or the words that follow it, and then returns to step 21 so that the newly requested words are uttered by steps 24 and 25. If step 27 determines that the word currently being output cannot be interrupted, step 29 determines whether its speech rate (speaking speed) may be raised. If speed-up is possible, the speed-up processing of step 30 raises the speech rate of the word currently being output or of the words that follow it; when that output finishes, control returns to step 21 and the newly requested words are uttered by steps 24 and 25. If step 29 determines that speed-up is not possible, control returns to step 21, and as soon as the message currently being output has finished, control proceeds to steps 24 and 25 and the newly requested words are uttered. If a still newer request arrives while the newly requested words are being uttered, the interruption processing of step 28 or the speed-up processing of step 30 is again applied according to the decisions of steps 27 and 29, so that a message suited to the input/output status of the machine (the vending machine) is output close to the time of the request without missing the opportunity to speak. When the vending operation has finished and there is no longer any need to wait for utterance requests, control returns from step 26 to the main program (not shown).

FIG. 6 shows an example of the voice utterance table 41 in the voice data ROM 7 of FIG. 1. Here 42 is the table number, expressed in hexadecimal as 0, 1, 2, 3, ..., A, B, C, ...; 43 is message data in which a preselected message has been analyzed, divided into independent clauses A, B, C, D, ..., and joined together; and 44 is the END code marking the end of an utterance.
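One way to picture table 41 is as a mapping from table number to a list of clause identifiers terminated by an END code, with the clause data itself stored once and shared between messages. The sketch below only illustrates that layout: the table contents, the clause identifiers and the END marker value are assumptions, not the contents of the ROM.

```python
END = "END"  # stands in for END code 44; the real code value is not given

# Clause data stored once; several messages may refer to the same clause.
CLAUSES = {
    "A": "irasshaimase",
    "B": "Fuji Denki",
    "C": "de gozaimasu",
    "Y": "okane ga",
    "Z": "tarimasen",
}

# Table number 42 -> message data 43 (clause identifiers followed by END).
UTTERANCE_TABLE = {
    0x0: ["A", "B", "C", END],
    0x1: ["A", END],          # reuses clause A
    0x2: ["Y", "Z", END],
}


def read_message(table_number):
    """Return the clause texts for one table entry, stopping at the END code."""
    texts = []
    for clause_id in UTTERANCE_TABLE[table_number]:
        if clause_id == END:
            break
        texts.append(CLAUSES[clause_id])
    return texts


print(" ".join(read_message(0x0)))
```

Storing clauses once and referring to them by identifier is what lets a short greeting and a longer greeting share the same recorded data.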
The table numbers 42 are assigned corresponding to individual message data 43 selected in advance, and A, B, and C (indicated by diagonal lines) of the message data 43 indicate phrases commonly used in multiple messages. There is. The voice generation control device 7 receives selection information 45 that is supplied according to the external situation (input/output situation) of the vending machine.
Based on the required words A, B, C...
are selected and extracted using the table number 42, and these words A, B, C, . . . are connected to generate a single message. FIG. 7 shows in detail the message data 43 whose table number 42 is 0 as an example of the message data 43 in the voice utterance table 41 of FIG. Here, 43a to 43d are voice data for voice production, respectively, and each voice data 43a to 4
3d each has one clause A or B, C, D and flags F ₁ and F ₂ each consisting of one bit. The flag _F1 is an interruption permission flag used to determine whether it is okay to interrupt words A, B, C, or D of the clause in the utterance data to which it belongs, and the flag _F2 is an interruption permission flag used to determine whether it is okay to interrupt words A, B, C, or D in the clause in the utterance data to which it belongs. Word A or B of the clause,
This flag is used to determine whether or not the audio output speed of C and D can be increased. Both flags F ₁ and F ₂ are set in table 41 in advance. X ₁ to _{X 3} are each utterance data 43
The words A to D of each clause are provided between a to 43d.
It is a word spacing adjustment timer that controls the time to connect the words.
Its time is set so that the uttered words sound natural. The END code 44 is a special code placed at the end of the table data; the control device 7 determines that the utterance has ended when it reads the END code 44. Flag F₁ is used in step 27 of FIG. 5, and flag F₂ is used in step 29 of the same figure.

For example, if a new utterance request arrives while word A of the first utterance data 43a is being uttered, the control device 7 refers in step 27 to the flag F₁ attached to word A. If the flag indicates that interruption is permitted, for example "1", the utterance of word A is immediately interrupted in step 28; in step 24 the table number of the newly requested words is retrieved from the table numbers 42, and the process proceeds to the utterance processing of step 25. On the other hand, if flag F₁ is set to, for example, "0", indicating that interruption is not permitted, the control device 7 then refers to flag F₂ in step 29. If that flag indicates that the audio output may be sped up, for example "1", the speaking speed of the words following word A is increased in step 30. At this time, the intervals between the clauses A to D, which are based on the set times of the word spacing adjustment timers X₁ to X₃, are shortened in proportion to the increase in speaking speed. Furthermore, by using a plurality of decision values for flag F₂, the audio output speed can be changed in multiple steps. For example, if the flag is expressed as a hexadecimal value, up to 16 variations are possible.

In this embodiment, flags F₁ and F₂ are referenced for each word of the clauses, so even if word A cannot be interrupted, if any of words B, C, or D is interruptible, the utterance can be stopped at the word whose interruption flag F₁ permits interruption, and the new words can then be uttered. For the same reason, it is also possible to speed up all of the words B, C, and D that follow word A, or only some of them.

As explained above, according to the present invention, predetermined message data is combined independently for each clause or for each predetermined number of words, and a voice utterance table is provided that holds, for each independent data item, flag data indicating whether interruption is permitted and whether speed-up is permitted. Control means is provided which, when a new utterance request is received while a requested message is being uttered, refers to this flag data and either interrupts the voice output of the series of messages being output or speeds up the message output. As a result, an effective message can be output at the moment voice output is desired, or after only a short wait, and appropriate instructions can be given to the person receiving the voice output. The present invention is also suitable for automatic vending machines, voice guidance devices, and similar equipment that output a large amount of voice information serially from a single voice output device and that must promptly announce changes in external or internal circumstances.
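The flag-driven control of steps 27 to 30 described above can be illustrated with a minimal sketch. It is not the patented implementation: the names `Word`, `TABLE`, `utter`, and `handle` are hypothetical, the table contents are invented for illustration, and the mapping from a flag F₂ step to a speed (10% faster per step) is an assumption. The sketch also assumes, for simplicity, that once a request is pending each word is governed by its own flags, including the word during which the request arrives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    name: str            # word data for one clause (A, B, C, ...)
    interruptible: bool  # flag F1: True corresponds to "1" (may be interrupted)
    speed_step: int      # flag F2: 0 = no speed-up, 1..15 = permitted step
    gap_ms: int          # word spacing adjustment timer at normal speed


# Hypothetical utterance table: table number -> message data.
# The end of each list plays the role of the END code.
TABLE: Dict[int, List[Word]] = {
    1: [
        Word("A", False, 5, 300),
        Word("B", False, 5, 300),
        Word("C", True, 0, 300),
        Word("D", False, 0, 0),
    ],
    2: [Word("E", False, 0, 200), Word("F", False, 0, 0)],
}

Spoken = Tuple[str, int, int]  # (word, speed in percent, gap in ms)


def utter(table_no: int,
          request_at: Optional[int] = None) -> Tuple[List[Spoken], bool]:
    """Utter one message; a new request arrives at word index `request_at`.

    Returns the words actually output and whether the message was interrupted.
    """
    spoken: List[Spoken] = []
    for index, word in enumerate(TABLE[table_no]):
        pending = request_at is not None and index >= request_at
        speed_pct = 100
        if pending:
            if word.interruptible:      # step 27 -> step 28: stop here
                return spoken, True
            if word.speed_step > 0:     # step 29 -> step 30: speed up
                speed_pct = 100 + 10 * word.speed_step  # assumed mapping
        # Shorten the inter-word gap in proportion to the speed increase.
        gap_ms = word.gap_ms * 100 // speed_pct
        spoken.append((word.name, speed_pct, gap_ms))
    return spoken, False                # reached the end of the message


def handle(first: int, second: int, request_at: int) -> List[Spoken]:
    """Utter message `first`, then the newly requested message `second`."""
    spoken, _ = utter(first, request_at)
    spoken += utter(second)[0]          # steps 24 and 25 for the new request
    return spoken
```

With this invented table, `handle(1, 2, request_at=1)` outputs word A at normal speed, word B at 150% speed with its gap shortened from 300 ms to 200 ms, stops before the interruptible word C, and then outputs words E and F of the new message.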

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a vending machine to which the present invention is applied; FIG. 2 is a block diagram showing an example of the speech synthesizer of FIG. 1; FIG. 3 is an explanatory diagram showing the main part of the voice data ROM of FIG. 1; FIGS. 4A, 4B, and 4C are explanatory diagrams each showing an audio output state of FIG. 1; FIG. 5 is a flowchart showing an example of the audio output control procedure of FIG. 1; FIG. 6 is a configuration diagram showing an example of the voice utterance table of the voice data ROM of FIG. 1; and FIG. 7 is a configuration diagram showing an example of the message data portion of the voice utterance table of FIG. 6.

1 ... coin mechanism unit, 2 ... sales control circuit, 3 ... product selection button, 4 ... product delivery mechanism, 5 ... voice generation control device, 6 ... speech synthesizer, 7 ... voice data ROM, 8 ... amplifier, 9 ... speech synthesis unit, 10 ... speaker, 21 to 30 ... control procedure, 41 ... voice utterance table, 42 ... table number, 43 ... message data, 43a to 43d ... utterance data, 44 ... utterance end (END) code, 45 ... selection information, A, B, C, D, E, F, G ... word data for each clause, F₁ ... interruption permission flag, F₂ ... speed-up permission flag, X₁ to X₃ ... word spacing adjustment timers.

Claims

[Scope of Claims]

1. A voice generation control device comprising:
a. storage means that stores in advance a plurality of words, independently for each clause or for each predetermined number of words, and that stores, for each clause or for each predetermined number of words, an interruption flag indicating whether audio output may be interrupted and a speed-up flag indicating whether audio output may be sped up;
b. voice output requesting means for selecting and specifying a necessary word from the plurality of words and requesting audio output of that word;
c. selection means for selecting and reading out the corresponding word from the storage means in response to a request from the voice output requesting means;
d. voice output means for converting the word read out by the selection means into voice; and
e. voice output control means which, upon receiving a new voice output request from the voice output requesting means while the voice output means is outputting voice, interrupts or speeds up the current voice output according to the interruption flag and the speed-up flag of the word currently being output, and then permits the new voice output.