JPS59201141A

JPS59201141A - Input device of sound information

Info

Publication number: JPS59201141A
Application number: JP58073927A
Authority: JP
Inventors: Tomio Tadokoro; 田所　富男
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-04-28
Filing date: 1983-04-28
Publication date: 1984-11-14
Also published as: JPS6313207B2

Abstract

PURPOSE: To improve the recognition rate and reliability of a voice information input device by generating voice guidance when words are registered by voice, and by letting the speaker input voice data while hearing an answerback voice when voice information data is input. CONSTITUTION: A control circuit 3 controls a voice recognition control circuit 13 of a voice recognition device 1 to fetch the recognition result, and controls a voice output control circuit 21 of a voice output device 2 to output guidance and answerback voices through a loudspeaker 7. To register a voice, the speaker repeats the guidance voice output from the loudspeaker 7 into a microphone 6, and the speaker's voice is stored in a registered voice memory 14 via the circuit 13. To input voice information data, the speaker hears the answerback voice, is reminded of his or her own voice at registration, and speaks accordingly. Registering and inputting voice in this way improves the recognition rate, reliability, and operability.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of application of the invention]

The present invention relates to a voice information input device with which a person inputs information by voice.

[Background of the invention]

In conventional voice information input devices, when registering the registered words used for voice recognition, the speaker uttered each word in his or her own way while looking at characters or symbols displayed on a screen such as a CRT. As a result, when later inputting voice information, the speaker would forget how the registered word had been uttered at registration time, so the recognition rate was poor and the device was not fit for practical use. Moreover, because the words were registered while looking at a screen such as a CRT, the speaker could not re-register a voice without going out of the way from the site where voice information is input to the place where the CRT screen is located, which was inconvenient.

[Purpose of the invention]

An object of the present invention is to provide an easy-to-use voice information input device with a high recognition rate.

[Summary of the invention]

In the present invention, a guidance sound is generated by voice when registered words are registered by voice, and an answerback sound is generated when voice information data is input.

[Embodiments of the invention]

FIG. 1 is a system configuration diagram of a voice information input device according to an embodiment of the present invention.

The voice recognition device 1 consists of an amplifier 11 that amplifies the voice signal from the voice input microphone 6, an A/D converter 12 that converts the voice signal from analog to digital, a registered voice memory 14 that stores the registered voices in advance, and a voice recognition control circuit 13 that performs voice recognition by comparing the input voice with the registered voices.

The voice output device 2 consists of a synthesized voice memory 22 that stores the voices to be output; a voice output control circuit 21 that selects and outputs the contents of the synthesized voice memory 22 during voice registration and according to the voice recognition result; a D/A converter 23 that converts the output signal from digital to analog; and an amplifier 24 that amplifies the analog signal so that guidance and answerback voices are emitted from the loudspeaker (or earphone) 7.

The control circuit 3 is a control computer that controls the voice recognition control circuit 13 of the voice recognition device 1 to take in voice recognition results, controls the voice output control circuit 21 of the voice output device 2 to output guidance and answerback sounds from the loudspeaker 7, and shows the control state, voice recognition results, and the like on the display 5. The control circuit 3 can be controlled from the keyboard 4 as well as by voice.

FIG. 2 is a table showing an example of the voice words used in an embodiment of the present invention. Voice words No. 1 to 42 in FIG. 2, that is, the range α, are the registered voices for voice recognition and are registered in advance as registered words in the registered voice memory 14. To register them, the speaker uses the microphone 6 and utters voice words No. 1 to 42 in order, imitating the guidance sound from the loudspeaker 7; the utterances are stored in the registered voice memory 14 via the amplifier 11, the A/D converter 12, and the voice recognition control circuit 13. The range α in FIG. 2 also serves as the set of voice output words used for the guidance sound during voice registration and for the answerback after voice recognition.

After the speaker has completed voice registration, when the speaker inputs any of voice words No. 1 to 42 by voice through the microphone 6, the voice recognition control circuit 13 searches the voice words held in the registered voice memory 14 for the matching word and outputs its No. to the control circuit 3. On receiving the voice word No., the control circuit 3 takes it in as data, shows it on the display 5, and also issues a command to the voice output control circuit 21.
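The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the input voice is compared with the registered voices, so the feature tuples, the Euclidean distance, the `threshold` value, and the function names `recognize` and `handle_input` are all hypothetical stand-ins.

```python
import math

def recognize(input_features, registered_voice_memory, threshold=1.0):
    """Return the No. of the closest registered word, or None if nothing is close.

    Assumption: each registered voice is a small feature tuple and closeness is
    Euclidean distance. The source only says the voices are "compared".
    """
    best_no, best_dist = None, math.inf
    for word_no, template in registered_voice_memory.items():
        dist = math.dist(input_features, template)
        if dist < best_dist:
            best_no, best_dist = word_no, dist
    return best_no if best_dist <= threshold else None

def handle_input(input_features, registered_voice_memory, display, speak):
    """Hypothetical role of control circuit 3: show the result and command output."""
    word_no = recognize(input_features, registered_voice_memory)
    if word_no is None:
        display("unrecognizable")   # placeholder for the unrecognizable case
    else:
        display(f"No. {word_no}")   # shown on display 5
        speak(word_no)              # command to voice output control circuit 21
    return word_no

# Toy registered voice memory 14: word No. -> feature tuple (made-up values).
memory = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (4.0, 0.5)}
handle_input((0.9, 1.1), memory, print, lambda no: print("answerback", no))
```

Passing `display` and `speak` as callables keeps the sketch independent of any particular display or audio hardware.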

Voice words No. 1 to 55 in FIG. 2, that is, the range β, are the synthesized voices for voice output and are recorded in advance in the synthesized voice memory 22. The voice words can be recorded in the synthesized voice memory 22 in any of three ways: voice words No. 1 to 55 can be recorded as they are; each of voice words No. 1 to 55 can be divided into several segments, which the voice output control circuit 21 arranges in the correct order and outputs; or only phoneme pieces can be recorded in the synthesized voice memory 22, from which the voice output control circuit 21 synthesizes voice words No. 1 to 55 and outputs them as voice words. Any of these methods may be used.

When the speaker inputs a voice other than a registered word, the voice recognition control circuit 13 judges it unrecognizable and outputs an unrecognizable level to the control circuit 3. The control circuit 3 takes in the unrecognizable level, shows that fact on the display 5, and issues a command to the voice output control circuit 21. On receiving the command from the control circuit 3, the voice output control circuit 21 reads the phoneme pieces out of the synthesized voice memory 22 in the correct order, outputs voice words No. 51 to 55, and causes the synthesized voice to be emitted from the loudspeaker 7 via the D/A converter 23 and the amplifier 24. In other words, when the speaker registers registered words by voice, the guidance sound for each registered word is output from the loudspeaker 7, and when voice data is input, the recognition result is answered back from the loudspeaker 7.
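The third recording method, in which only phoneme pieces are stored and the output control circuit assembles words from them, can be sketched as below. All data here is hypothetical: the piece names, the toy sample lists, and the assignment of word numbers to "zero" and "ichi" are invented for illustration, since the contents of FIG. 2 are not reproduced in this text.

```python
# Hypothetical contents of synthesized voice memory 22: piece name -> samples.
PHONEME_PIECES = {
    "ze": [1, 2],
    "ro": [3],
    "i": [4],
    "chi": [5, 6],
}

# Hypothetical word table: word No. -> ordered list of piece names.
WORD_PIECES = {
    1: ["ze", "ro"],   # "zero"
    2: ["i", "chi"],   # "ichi"
}

def synthesize(word_no):
    """Concatenate the stored pieces for one word, in order, into one sample list."""
    samples = []
    for piece in WORD_PIECES[word_no]:
        samples.extend(PHONEME_PIECES[piece])
    return samples

print(synthesize(1))
print(synthesize(2))
```

Because guidance and answerback both come from `synthesize`, the same word always sounds the same in both roles, which is the property the description relies on.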

FIGS. 3 and 4 are flowcharts of voice registration in an embodiment of the present invention.

In FIG. 3, at statement No. 201, the guidance sound "zero" is emitted from the loudspeaker 7 of the voice output device 2. At 202, the speaker follows the voice guidance and repeats "zero" back, so that the registered word "0" is stored through the microphone 6 in the registered voice memory 14 of the voice recognition device 1.

Next, at 211, the guidance sound "ichi" (one) is emitted from the loudspeaker 7. At 212, the speaker utters "ichi", and the registered word "1" is stored in the registered voice memory 14 of the voice recognition device 1. In the same way, the speaker follows the voice guidance from the loudspeaker 7, repeats each word back, and inputs the registered words one after another through the microphone 6. In response to the voice guidance "kyuu" (nine) at 291, the speaker utters "kyuu" at 292, which completes registration of the numeric registered words "0" to "9".

FIG. 4 shows a flow in which the speaker registers the same registered word several times, which improves the recognition rate when voice information data is later input. Steps 201 and 202 register "zero" once, as in FIG. 3; then, at statement 201A, the voice guidance "zero" is output again from the loudspeaker 7.

The speaker utters "zero" again at 202A, repeating the voice registration. Next, at 211 and 212, the first voice registration of "ichi" is performed as in FIG. 3, and at statements No. 211A and 212A the second voice registration of "ichi" is performed. Likewise, at 291, 292, 291A, and 292A, the speaker utters "kyuu" in response to the voice guidance "kyuu", repeating it twice to register "9". Voice registration of the registered words ends once the digits "0" to "9" have each been repeated in this way. In this case, the voice data actually stored in the registered voice memory 14 for each registered word is preferably the last utterance of that word. By uttering the same word repeatedly, the speaker's tone, accent, and manner of utterance come closer to those of the guidance sound. Because the guidance sound at registration and the answerback sound at input are the same synthesized voice built from the same phonemes, the speaker hears the answerback sound while inputting voice information data and is reminded of his or her own voice at registration, so the recognition rate improves markedly.
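The repeated registration of FIG. 4 amounts to a guidance-then-utterance loop in which later takes overwrite earlier ones. The sketch below assumes hypothetical callables `play_guidance` and `capture_utterance` in place of the loudspeaker and microphone paths, and plain strings in place of recorded voice data; the function name `register_with_repeats` is likewise invented.

```python
def register_with_repeats(words, play_guidance, capture_utterance, repeats=2):
    """Register each word `repeats` times and keep only the last utterance.

    With repeats=1 this reduces to the single-pass flow of FIG. 3.
    """
    registered_voice_memory = {}
    for word in words:
        for _ in range(repeats):
            play_guidance(word)                                # e.g. steps 201, 201A
            registered_voice_memory[word] = capture_utterance()  # e.g. steps 202, 202A
    return registered_voice_memory

# Usage with canned utterances standing in for microphone input.
takes = iter(["zero-take1", "zero-take2", "ichi-take1", "ichi-take2"])
memory = register_with_repeats(["zero", "ichi"], print, lambda: next(takes))
print(memory)
```

Overwriting on every pass is the simplest way to honor the stated preference for the last utterance; the source does not say whether earlier takes are stored or discarded in between.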

In this way, the speaker can register the words for voice recognition simply by repeating back the voice guidance from the loudspeaker, and the voices registered this way yield a high recognition rate.

FIG. 5 is a system configuration diagram of a voice information input device according to another embodiment of the present invention. The speaker carries a portable radio 8 consisting of a transmitter 81 and a receiver 82. Voice input through the microphone 6 is sent by radio to the receiver 91 of a fixed radio station 9, and the voice guidance and voice answerback sent from the transmitter 92 of the fixed radio station 9 are received by the receiver 82 and heard through the loudspeaker (or earphone) 7.

Because the speaker need only carry a small portable radio, the speaker can move about freely and keep both hands in use while registering the registered words and inputting voice information.

[Effects of the invention]

According to the present invention, the recognition rate is markedly improved, giving a highly reliable and easy-to-use voice information input device.

[Brief explanation of the drawing]

FIG. 1 is a system configuration diagram of an embodiment of the present invention; FIG. 2 is a table showing an example of the voice words used in the present invention; FIGS. 3 and 4 are flowcharts of voice registration in an embodiment of the present invention; and FIG. 5 is a system configuration diagram of another embodiment of the present invention.

21: voice output control circuit; 22: synthesized voice memory.

Claims

[Scope of Claims]

1. A voice information input device characterized in that a guidance voice for each registered word is emitted during voice registration, and an answerback voice for confirming the voice input data is emitted when voice information is input.

2. The voice information input device according to claim 1, characterized in that the memory storing the guidance voice and the memory storing the answerback voice are shared.