JPS59177599A

JPS59177599A - Voice registration system

Info

Publication number: JPS59177599A
Application number: JP58053125A
Authority: JP
Inventors: 竹内　亜紀彦
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-03-29
Filing date: 1983-03-29
Publication date: 1984-10-08

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field of the Invention

The present invention relates to a voice registration method for a voice recognition and response system that uses a telephone line or the like as its input medium. In particular, it seeks to perform voice registration without making the speaker aware of any registration operation.

Prior Art and Problems

Speech recognition devices come in two types:

- a registration type, which targets specific speakers (speaker-dependent), and
- a non-registration type, which targets unspecified speakers (speaker-independent).

A speaker-independent device is easy to use because no voice needs to be registered before use. However, its recognition dictionary must account for variation among people (speakers). This makes the dictionary and related components complex, and makes it difficult to add or change the words to be recognized.

A registration-type device, on the other hand, requires the speaker's voice to be registered before use. Once that is done, there is no need to account for variation among speakers. The dictionary and related components are therefore simpler, and adding or changing the words to be recognized is comparatively easy.

Since each type has its own advantages and disadvantages, combining the two can be expected to yield a more flexible system with a wider range of applications.

Conventionally, speaker-independent speech recognition has been applied to order entry (accepting orders by telephone, etc.) and similar applications as follows. Each product name is coded in advance as a number or the like. The speaker utters this code, and the device recognizes it and enters it.

Numeric codes were used because product names are numerous and change frequently. This made it difficult to build a speaker-independent recognition dictionary covering that many product names.

Numeric codes, however, are hard to memorize. The user must keep a table mapping product names to numeric codes and read out the code for the desired product while consulting it, which makes the system awkward to operate.

A registration-type speech recognition device lets the product name itself be spoken instead of a code, but the voice must be registered before recognition. This raises several problems:

- Product names are numerous and change often, so registering every one is laborious.
- In practice only some of the product names are normally used, so registering all of them is not efficient either.
- A smaller number of registered words is preferable from the standpoint of recognition rate.
- Even if only the necessary words are registered, it is hard to keep track of whether a given word has been registered, or under what reading it was registered.
- Performing a registration procedure every time a new word is to be used is burdensome.

Objects of the Invention

The present invention aims to reduce the burden that the voice registration described above places on the speaker. It also aims to let the capability of a registration-type speech recognition system grow naturally as needed.

Structure of the Invention

The present invention is a voice registration method for a voice recognition and response system. The system uses a telephone line or the like as its input medium and includes:

- a speech recognition unit targeting unspecified speakers (speaker-independent),
- a registration-type speech recognition unit targeting specific speakers (speaker-dependent), and
- a speech synthesis unit for output.

The method is characterized as follows:

1. A buffer is provided for storing the voice uttered by the speaker.
2. If the speaker, when asked to confirm, rejects the registration-type unit's recognition result for that voice, the system switches to the speaker-independent recognition unit.
3. The speaker is then asked to input, for the word that the voice represents, the code preset in the speaker-independent unit.
4. After input of the code is complete, the speaker-independent unit's recognition result and the voice held in the buffer are registered in the dictionary of the registration-type unit.

This is described in detail below with reference to the embodiment shown in the drawings.
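The claimed flow can be sketched as a single function. This is a minimal sketch under stated assumptions, not the patent's implementation:

- The names `handle_utterance`, `recognize_dependent`, `confirm`, and `ask_code` are hypothetical.
- A voice "template" is reduced to a string.
- The speaker-dependent dictionary is modeled as a dict from product code to template.
- Treating an empty recognition result like a rejection is also an assumption.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in: a real system would store acoustic feature data, not text.
Template = str

def handle_utterance(
    utterance: Template,
    dictionary: Dict[str, Template],  # speaker-dependent dictionary: product code -> voice template
    recognize_dependent: Callable[[Template, Dict[str, Template]], Optional[str]],
    confirm: Callable[[str], bool],   # read back via speech synthesis; True if the speaker says "yes"
    ask_code: Callable[[], Optional[str]],  # prompt for the code, recognized speaker-independently
) -> Optional[str]:
    """Return the confirmed product code, registering the utterance when fallback was needed."""
    buffer = utterance  # voice buffer: keep the original utterance for later registration

    # 1. Try the speaker-dependent (registration-type) recognizer first.
    code = recognize_dependent(buffer, dictionary)
    if code is not None and confirm(code):
        return code

    # 2. Rejected (or nothing matched; assumed to behave like a rejection):
    #    switch to speaker-independent code input.
    code = ask_code()
    if code is not None and confirm(code):
        # 3. Register the buffered utterance under the confirmed code,
        #    replacing any template already stored for that code.
        dictionary[code] = buffer
        return code
    return None
```

Suppose the matcher prefers an exact template match and otherwise returns the nearest (wrong) entry. Then, in the orange juice example, the first call returns "111" after one rejection and leaves the buffered utterance stored under "111". A later call with the same utterance is accepted without fallback.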

Embodiment of the Invention

Fig. 1 is a block diagram showing one embodiment of the present invention. In the figure:

- 1 is the line connected to the speaker.
- 2 is a network control unit (NCU).
- 3 is a switching unit that switches between input and output and between the recognition units.
- 4 is a speech recognition unit targeting unspecified speakers (hereinafter, the speaker-independent recognition unit).
- 5 is its dictionary.
- 6 is a registration-type speech recognition unit targeting specific speakers (hereinafter, the speaker-dependent recognition unit).
- 7 is its dictionary memory.
- 8 is a voice buffer added according to the present invention.
- 9 is a speech synthesis unit that returns confirmation and response speech to the speaker.
- 10 is a control unit.
- 11 is the speaker-dependent dictionary.
- 12 is the line connected to the host computer.

Fig. 2 is an explanatory diagram showing the order entry system as a whole. In it:

- 15 is the telephone used by a speaker (an individual consumer, a store clerk, etc.).
- 16 is a telephone exchange.
- 17 is the host computer.
- 20 is the voice recognition and response system shown in Fig. 1.

In this order entry system, the speaker first says a product name over the telephone. The name may be spoken as it is read, or in any abbreviated form the speaker likes. The voice recognition and response system receives the utterance, stores it in voice buffer 8, and recognizes it with speaker-dependent recognition unit 6. For recognition, the area of speaker-dependent dictionary 11 that corresponds to the speaker is read out and loaded into dictionary memory 7 (speaker identification is described later). The closest recognition result is then played back through speech synthesis unit 9, and the speaker is asked to confirm it.

If the speaker confirms, two things happen:

- The code for that product name, together with the quantity spoken at the same time, is sent to the host computer for order processing.
- The system then prepares for the next input.

If the speaker rejects the result, switching unit 3 switches to speaker-independent recognition unit 4. Unit 4 has the speaker input the product by reading out the numeric code assigned to that product name in advance, rather than by its name. Speaker-independent dictionary 5 already holds the readings of the numeric codes assigned to each product name. The desired product can therefore be recognized when the speaker says its numeric code.

Here too, confirmation is obtained by speech from speech synthesis unit 9. If the result is confirmed, two items are registered in speaker-dependent dictionary 11:

- the utterance in voice buffer 8, and
- the code used by speaker-independent recognition unit 4 at that point.

From then on, when the same speaker orders the same product, the product name can be used as input. Moreover, the speaker never has to be conscious of voice registration throughout this process.

If the same word has already been registered, it is replaced. This keeps the dictionary contents optimized at all times. It is useful, for example, when the same pronunciation is recognized differently depending on the telephone line used.
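The per-speaker dictionary handling described above might look like the following. It covers three steps:

- loading the speaker's area of dictionary 11 into dictionary memory 7,
- picking the closest entry, and
- replacing an entry when the same word is registered again.

This is a hypothetical sketch. `SpeakerDictionaryStore`, `closest_code`, the fixed-length feature vectors, and the squared Euclidean distance are all assumptions for illustration. The patent says only that the closest result is returned, not how closeness is measured.

```python
from typing import Dict, Optional, Tuple

Features = Tuple[float, ...]  # assumed fixed-length feature vector for one spoken word

class SpeakerDictionaryStore:
    """Sketch of speaker-dependent dictionary 11: one area per speaker (subscriber)."""

    def __init__(self) -> None:
        self._areas: Dict[str, Dict[str, Features]] = {}

    def load(self, speaker_id: str) -> Dict[str, Features]:
        # Copy the speaker's area into working memory (dictionary memory 7).
        return dict(self._areas.get(speaker_id, {}))

    def register(self, speaker_id: str, code: str, template: Features) -> None:
        # Keyed by product code, so re-registering the same word replaces the old template.
        self._areas.setdefault(speaker_id, {})[code] = template

def closest_code(memory: Dict[str, Features], utterance: Features) -> Optional[str]:
    """Return the code whose template is nearest the utterance (squared Euclidean distance)."""
    best: Optional[str] = None
    best_dist = float("inf")
    for code, template in memory.items():
        dist = sum((a - b) ** 2 for a, b in zip(template, utterance))
        if dist < best_dist:
            best, best_dist = code, dist
    return best
```

Keying each area by product code makes the replacement rule fall out of ordinary dict assignment. A store keyed by template would instead accumulate several templates per word.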

An example of ordering orange juice follows. It is assumed that:

- no word corresponding to orange juice is registered on the device side;
- the closest registered word is "Ōbīru" ("large beer"), which corresponds to a large bottle of beer;
- the code for a large bottle of beer is 121, and the code for orange juice is 111.

The exchange between the user (speaker) and the device is as follows.

User: "Orenji" (orange). (This utterance is stored in the buffer and recognized using the speaker-dependent dictionary.)
Device: "Product number two, one, two: a large bottle of beer, correct? (beep)"
User: "Iie" (no).
Device: "Product number, please. (beep)" (Switches to speaker-independent recognition.)
User: "Ichi" (one).
Device: "(beep)"
User: "Ichi."
Device: "(beep)"
User: "Ichi."
Device: "Product number one, one, one: orange juice, correct?"
User: "Hai" (yes). (At this point, the earlier "Orenji" is registered automatically.)

From then on, saying "Orenji" to the registration-type recognizer causes orange juice, product number 111, to be recognized.
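In the dialogue, the code is spoken one digit at a time. A beep acknowledges each digit until the device reads the full number back. A sketch of that collection loop follows. The function `collect_code`, the romanized digit vocabulary, the fixed code length of 3, and giving up on an unrecognized word are assumptions, not details from the patent.

```python
from typing import Callable, Iterable, List, Optional

# Romanized Japanese digit words, as a speaker-independent digit recognizer might output them.
DIGITS = {
    "zero": "0", "rei": "0", "ichi": "1", "ni": "2", "san": "3",
    "yon": "4", "shi": "4", "go": "5", "roku": "6", "nana": "7",
    "shichi": "7", "hachi": "8", "kyuu": "9", "ku": "9",
}

def collect_code(words: Iterable[str], beep: Callable[[], None], length: int = 3) -> Optional[str]:
    """Collect single spoken digits into a product code of the given length."""
    digits: List[str] = []
    for word in words:
        digit = DIGITS.get(word)
        if digit is None:
            return None  # unrecognized word; a caller would presumably re-prompt (assumed)
        digits.append(digit)
        if len(digits) == length:
            return "".join(digits)  # complete: the device reads the number back next
        beep()  # acknowledge this digit and wait for the next one
    return None  # input ended before the code was complete
```

With the words "ichi", "ichi", "ichi", the function beeps twice and returns "111", mirroring the two beeps and the readback in the exchange above.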

Order acceptance in order entry generally proceeds as follows:

1. The speaker dials voice recognition and response system 20 from telephone 15.
2. The speaker announces his or her subscriber number by voice.
3. The speaker gives a personal identification number (PIN) for identity verification.
4. Because order entry also handles inquiries, changes, and so on, the speaker usually announces a service code that distinguishes among them.
5. With speaker-independent recognition, the product numbers and quantities are requested next. The speaker reads them out, for example as 11105 (5 units of product 111) and 15603 (3 units of product 156).
6. Finally, the speaker says "owari" (end). This completes the order.
7. The device then reads back the entered contents and asks for confirmation.

If the speaker confirms, either the order details are registered in a file or the like on the host computer, or control passes to the next process.
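The call sequence above can be sketched as a small parser over the recognized tokens of one call. This is a hypothetical illustration:

- `parse_order_session` is not from the patent.
- Tokens are assumed to arrive already recognized as digit strings or the end word.
- The item format (a 3-digit product number followed by a 2-digit quantity) is inferred only from the two examples 11105 and 15603.

```python
from typing import List, Tuple

END_WORD = "owari"  # "end", spoken to close the list of ordered items

Order = Tuple[str, str, str, List[Tuple[str, int]]]

def parse_order_session(tokens: List[str]) -> Order:
    """Parse subscriber number, PIN, service code, then items until the end word."""
    if len(tokens) < 4:
        raise ValueError("incomplete session")
    subscriber, pin, service = tokens[0], tokens[1], tokens[2]
    items: List[Tuple[str, int]] = []
    for token in tokens[3:]:
        if token == END_WORD:
            return subscriber, pin, service, items
        # Assumed format: 3-digit product number + 2-digit quantity, e.g. "11105".
        if len(token) != 5 or not token.isdigit():
            raise ValueError(f"unexpected item: {token!r}")
        items.append((token[:3], int(token[3:])))
    raise ValueError("missing end word")
```

The parsed items are what the device would read back for confirmation before handing them to the host computer.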

When ordering by product number in this way, the user keeps a list that maps numbers to product names.

This list contains more than 1,000 product names. Registering readings for all of those products in a dictionary is therefore laborious. The products handled also change over time, so revising the dictionary contents each time is laborious as well.

In general, therefore, the two dictionaries divide the work:

- The speaker-independent dictionary handles, by code, a large variety of products that are used only rarely.
- The speaker-dependent dictionary handles a small number of frequently ordered products.

For the user, of course, it is more convenient to input a product name by its reading, since a name is intuitively easy to recognize.

In this respect, the present invention lets the user treat the system as a registration type. Any word that is not yet registered, or whose registration needs changing, is registered automatically when the user enters its code at the device's prompt. The system thus becomes progressively richer and easier to use.

Effects of the Invention

As described above, the voice registration method of the present invention performs voice registration without the user being aware of it. As a result, the speech recognition system gradually becomes easier to use for input.

[Brief Description of the Drawings]

Figs. 1 and 2 are block diagrams showing an embodiment of the present invention and its installed state, respectively. In the figures:

- 1 is a line.
- 3 is a switching unit.
- 4 is a speaker-independent recognition unit.
- 5 is its dictionary.
- 6 is a speaker-dependent recognition unit.
- 8 is a voice buffer.
- 9 is a speech synthesis unit.
- 11 is a speaker-dependent dictionary.

Applicant: Fujitsu Limited

Claims

[Claims]

A voice registration method for a voice recognition and response system that uses a telephone line or the like as its input medium and includes a speech recognition unit targeting unspecified speakers, a registration-type speech recognition unit targeting specific speakers, and a speech synthesis unit for output, the method being characterized in that:

a buffer is provided for storing the voice uttered by the speaker;

if the speaker, upon confirmation, rejects the registration-type speech recognition unit's recognition result for that voice, the system switches to the unspecified-speaker speech recognition unit and requests the speaker to input, for the word represented by the voice, a code preset in the unspecified-speaker speech recognition unit; and

after input of the code is complete, the recognition result from the unspecified-speaker speech recognition unit and the voice in the buffer are registered in the dictionary of the registration-type speech recognition unit.