JPS63149699A

JPS63149699A - Voice input/output device

Info

Publication number: JPS63149699A
Application number: JP61298037A
Authority: JP
Inventors: 染川　恵美子
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-12-15
Filing date: 1986-12-15
Publication date: 1988-06-22

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Summary]

In a voice input/output device, a registered voice dictionary is put through a learning step to raise the recognition rate when the device is operated some days after registration. During that learning, the operator may have forgotten the nuance of the utterance made at registration and may have to utter and input the same word many times. To solve this, the device is provided with means for retaining the voice data from registration as sample voice playback data. During learning, that data is output as sound so that the operator can input speech while confirming how the word was uttered at registration.

[Industrial application field]

The present invention relates to a voice input/output device, and in particular to a method of outputting the voice that serves as a sample when its voice dictionary is learned (prompt processing).

With the recent improvement in the processing power of computer systems and their spread, inputting and processing speech as data has become common.

One example is a voice input/output device for sorting work in a factory. Here, the voice of a specific speaker is registered in advance as a voice dictionary; once that voice has been confirmed through learning, the device for the sorting work is operated with voice input from that speaker. When learning takes place some days after registration, however, it is difficult to utter the words under exactly the same conditions as at registration. A voice learning method is therefore needed that lets learning proceed under conditions as close as possible to those at the time the voice dictionary was registered.

In particular, learning of the voice dictionary is only preparatory work for raising the recognition rate when the device is in operation, so it should be completed as quickly and efficiently as possible.

[Prior art and problems to be solved by the invention]

FIG. 3 is a processing flowchart showing the configuration of a conventional voice input/output device.

First, in voice dictionary registration process 1, the specific speaker utters and inputs words one after another to create the voice dictionary.

Next, when the voice input/output device is operated for the factory sorting work described above, learning process 2 is performed on the previously registered voice dictionary. The operator looks at the screen, utters and inputs speech according to the "reading character string" displayed there for learning, and the input is matched against the voice from registration by, for example, a known pattern matching method. Once every word matches, the device enters operational processing, that is, voice input for the actual sorting work.

Thus, in the conventional voice input/output device, a specific speaker registers a voice dictionary in advance, and at the time of operation the registered dictionary is learned in order to stabilize the recognition rate. When learning takes place some time after registration, however, it is difficult to reproduce not only the same word but also the same tone of utterance for the same voice dictionary.

In addition, because the registered content cannot be checked, noise that the specific speaker did not notice may be registered in the voice dictionary as it is. In that case, no matter how many times the same word is uttered and input during learning, it may never match the voice data from registration.

For these reasons, utterances were not consistent between registration of the voice dictionary and operation, which interfered with voice recognition.

In view of these drawbacks, an object of the present invention is to provide a learning method for a voice input/output device that makes the learning operation smooth and stabilizes the recognition rate in operation when an already registered voice dictionary is used some days after registration.

[Means for solving problems]

FIG. 1 is a processing flowchart showing the configuration of the voice input/output device of the present invention.

The present invention is a voice input/output device that performs voice recognition after registration 1 of a voice dictionary and learning 2 of the registered dictionary. It is provided with means 12 for retaining the voice data input during registration 1 as sample voice playback data, and means 21 for outputting that sample voice playback data as a sample voice when speech is input during learning 2. The device is configured so that the operator learns by inputting speech with an utterance close to that at registration while listening to the sample voice.
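The two means can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the patent's implementation: the class `VoiceDictionary`, the `extract` feature function, and the `play` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VoiceDictionary:
    """Hypothetical container for one speaker's registered words."""
    # word -> recognition template (the feature format is an assumption)
    templates: Dict[str, List[float]] = field(default_factory=dict)
    # word -> digitized voice retained for playback (role of means 12)
    samples: Dict[str, List[int]] = field(default_factory=dict)

    def register(self, word: str, pcm: List[int],
                 extract: Callable[[List[int]], List[float]]) -> None:
        """Registration 1: store the template and also keep the input voice."""
        self.templates[word] = extract(pcm)
        self.samples[word] = list(pcm)

    def prompt(self, word: str, play: Callable[[List[int]], None]) -> None:
        """Learning 2: output the retained voice as a sample (role of means 21)."""
        play(self.samples[word])
```

Passing `play` and `extract` in as callbacks keeps the sketch independent of any particular audio hardware or recognition method.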

[Effect]

That is, the present invention addresses the situation in which, during the learning performed to raise the recognition rate when an already registered voice dictionary is used some days later, the operator has forgotten the nuance of the utterance made at registration and has to utter and input the same word many times. By providing means for retaining the voice data from registration as sample voice playback data, the device outputs that data as sound during learning so that the operator can input speech while confirming how the word was uttered at registration. This makes the learning operation smooth and stabilizes the utterances used for voice recognition in operation. It also makes it possible to detect voice data that needs to be re-registered.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1, referred to above, is a processing flowchart showing the configuration of the voice input/output device of the present invention, and FIG. 2 schematically shows one embodiment. Steps 12 and 21 in FIG. 1 are the means needed to carry out the invention.

Voice registration and learning in the voice input/output device of the present invention are explained below using FIG. 2, with reference to FIG. 1.

First, in voice dictionary registration process 1 shown in FIG. 1, the specific speaker who will use the device creates and registers his or her own voice dictionary in advance through voice recognition device ■ (see steps 10 and 11).

At this time, in the present invention, the registered voice data is also saved, for example on microdisk ■, as voice playback data in a data format different from that of the registration data (for example, a format obtained by simply converting the voice into a digital signal). The device also saves the display data on microdisk ■ as a "reading character string" (see step 12).

The registration data is stored on microdisk ■ in a data format that allows the voice recognition described later to be performed efficiently, specifically, for example, the "6-data" format used for well-known pattern matching, and differs from the data format for voice playback.
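To make the distinction between the stored items concrete, here is a minimal Python sketch of one dictionary entry. It is an illustration only: the record layout `DictionaryEntry`, the helper `make_entry`, and the use of per-frame energy as the recognition feature are assumptions made for this example and are not taken from the patent, which does not specify the feature format.

```python
from dataclasses import dataclass
from typing import List


def frame_energies(pcm: List[int], frame_len: int = 4) -> List[float]:
    """Stand-in recognition features: mean squared amplitude per frame."""
    frames = [pcm[i:i + frame_len] for i in range(0, len(pcm), frame_len)]
    return [sum(s * s for s in f) / len(f) for f in frames]


@dataclass
class DictionaryEntry:
    text: str              # character string shown on screen
    reading: str           # "reading character string" saved for display
    template: List[float]  # registration data used for pattern matching
    playback: List[int]    # plain digitized voice kept for sample playback


def make_entry(text: str, reading: str, pcm: List[int]) -> DictionaryEntry:
    """Build all stored items from a single registration utterance."""
    return DictionaryEntry(text, reading, frame_energies(pcm), list(pcm))
```

For instance, `make_entry("北海道", "ホッカイドウ", pcm)` yields a compact template for matching while leaving the original samples untouched for playback, which is the separation the paragraph above describes.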

Next, operation of the voice input/output device is described.

First, when the operator, specific speaker ■, presses the learning operation key, screen ■ of the display shows a learning screen with, for example, the reading "ホッカイドウ" and the character string "北海道" (Hokkaido), as illustrated (see step 20).

Voice recognition device ■ then outputs from speaker ■ the registered voice playback data corresponding to the displayed content ("北海道") (see step 21).

Keeping in mind both the reading "ホッカイドウ" on the screen of display device ■ and the spoken "ホッカイドウ" from the speaker, the specific speaker (operator) ■ utters the word in a way closer to the utterance at registration. Inside the voice recognition device, the input voice data for recognition is matched against the data registered on microdisk ■.

When a match is obtained, learning moves on to the next word (see steps 22, 23, and 24).

The message display field at the bottom of display screen ■ shows the learning result as a so-called warning message, for example "The utterance is too loud" or "Please speak again".
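The learning procedure for one word can be sketched as a short Python loop. This is a sketch under stated assumptions rather than the patent's method: the function names, the mean-absolute-difference `distance`, the thresholds `match_limit` and `loud_limit`, and the retry cap `max_tries` are all hypothetical, and the patent only says that a known pattern matching method is used.

```python
from typing import Callable, List, Sequence, Tuple

TOO_LOUD = "The utterance is too loud"
RETRY = "Please speak again"


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference; infinite when lengths differ (a simplification)."""
    if len(a) != len(b) or not a:
        return float("inf")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def learn_word(template: Sequence[float], sample: List[int],
               play: Callable[[List[int]], None],
               capture: Callable[[], Sequence[float]],
               match_limit: float = 1.0, loud_limit: float = 100.0,
               max_tries: int = 3) -> Tuple[bool, List[str]]:
    """Prompt, capture, and match one word; return (matched, warnings shown)."""
    messages: List[str] = []
    for _ in range(max_tries):
        play(sample)          # step 21: output the sample voice first
        features = capture()  # operator's new utterance, already as features
        if max(features, default=0.0) > loud_limit:
            messages.append(TOO_LOUD)
        elif distance(features, template) <= match_limit:
            return True, messages  # match obtained: move on to the next word
        else:
            messages.append(RETRY)
    return False, messages
```

A caller would run `learn_word` for each displayed word in turn; a word that still fails after the retry cap is one candidate for the re-registration check mentioned elsewhere in this document, though how the device decides that is not specified.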

As described, the present invention is characterized in that, in a voice input/output method for a specific speaker, voice playback data is created and saved when the voice dictionary for that speaker is registered. At the time of learning before operation, the voice playback data corresponding to the learning word shown on the display screen is output from the speaker, and the operator, guided by the word on the screen and the sound from the speaker, inputs speech in a manner close to the utterance at registration, so that learning proceeds smoothly.

[Effect of the invention]

As described in detail above, the voice input/output device of the present invention addresses the situation in which, during the learning performed to raise the recognition rate when an already registered voice dictionary is used some days later, the operator has forgotten the nuance of the utterance made at registration and has to utter and input the same word many times. By providing means for retaining the voice data from registration as sample voice playback data, the device outputs that data as sound during learning so that the operator can input speech while confirming how the word was uttered at registration. This makes the learning operation smooth and stabilizes the utterances used for voice recognition in operation. It also makes it possible to detect voice data that needs to be re-registered.

[Brief explanation of the drawing]

FIG. 1 is a processing flowchart showing the configuration of the voice input/output device of the present invention.
FIG. 2 schematically shows one embodiment of the present invention.
FIG. 3 is a processing flowchart showing the configuration of a conventional voice input/output device.

In the drawings, 1 is the voice registration process, 2 is the voice learning process, and 10 to 13 and 20 to 25 are the individual processing steps. ■ is the display screen, ■ is the voice recognition device, ■ is the operator, ■ is the microdisk, and ■ is the speaker.

Claims

[Claims] A voice input/output device having a function of performing voice recognition after registration (1) of a voice dictionary and learning (2) of the registered voice dictionary, the device comprising: means (12) for retaining voice data input during the registration (1) of the voice dictionary as sample voice playback data; and means (21) for outputting the sample voice playback data as a sample voice when speech is input during the learning (2), wherein learning is performed by inputting speech while listening to the sample voice.