JP3192324B2

JP3192324B2 - Word speech recognition apparatus for a specific speaker

Info

Publication number: JP3192324B2
Application number: JP18344794A
Authority: JP
Inventors: 清治濱口; 耕市山口; 俊夫赤羽
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1994-08-04
Filing date: 1994-08-04
Publication date: 2001-07-23
Anticipated expiration: 2016-07-23
Also published as: JPH0844388A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a word speech recognition apparatus, and more particularly to a specific-speaker word speech recognition apparatus that uses specific-speaker (speaker-dependent) speech recognition technology, which recognizes the speech of a specific speaker.

[0002]

[Prior Art] In a conventional specific-speaker word speech recognition apparatus, the words to be recognized are registered in advance in the user's own voice. At recognition time, the user selects and utters one of those words; the feature values of the uttered word are compared with the feature values of the registered words, and the most similar word is selected. Specific-speaker speech recognition requires a registration step, but it allows freedom in the choice of recognition vocabulary and offers better recognition performance than speaker-independent speech recognition.

[0003] One cause of degraded recognition performance is the presence of similar words in the registered vocabulary, so that a similar word is erroneously output as the recognition result. To address this problem, Japanese Unexamined Patent Publication No. H3-123249 discloses a method for devices such as a voice dialing device, which places a call when the user utters the other party's name instead of dialing. In that method, when a person's name is registered by voice and a similar pattern already exists among the registered names, the user is notified and made to change or delete the similar voice entry.

[0004]

[Problem to Be Solved by the Invention] For items such as the names of call destinations in the voice dialing device described above, the vocabulary of the registered speech is left entirely to the user. Consequently, echo-back of recognition results and guidance output use standard patterns for synthesis that are created from the feature values of the user's own utterances. Depending on the application, however, the recognition vocabulary may be fixed, with the echo-back and guidance audio stored in the device's memory in advance. In that case, if the user rephrases a vocabulary word to avoid registering a similar word, the echo-back and guidance audio no longer match the user's registered speech, and the device may fail to recognize the input at all even though the user believes the word is being pronounced correctly. The same problem arises when recognition results or the vocabulary are shown on a panel or similar display.

[0005] For example, a speech recognition device that requires the single digits 0 to 9 to be registered asks the user to utter "rei", "ichi", "ni", ..., "ku". The user utters the digits in response, but the set includes similar pairs such as "1 (ichi)" and "7 (shichi)", and "6 (roku)" and "9 (ku)". Ideally such confusable words would not be chosen for the recognition vocabulary in the first place, but established usage sometimes makes similar words unavoidable. Moreover, because pronunciation differs from person to person, words that are not similar for a typical speaker may be similar for some speakers. If the user registers "7 (shichi)" as "nana" instead, the similar-word problem is likely to be avoided. However, the echo-back and guidance audio still say "shichi", which no longer matches the user's registered speech, so unless the user remembers which utterance was registered, the rephrased word cannot be recognized. If the guidance audio remains "shichi", then as time passes after registration the user is likely to forget having registered "nana" and to say "shichi" at recognition time.

[0006] The present invention was made to solve the above problems. Its object is to provide a specific-speaker word speech recognition apparatus that avoids the registration of similar words and achieves high recognition performance, without any mismatch between the user's registered speech and the echo-back audio, the guidance audio, or the displayed recognition results.

[0007]

[Means for Solving the Problem] The present invention provides a specific-speaker word speech recognition apparatus in which a user registers in advance, in the user's own voice, the speech of words representing a plurality of targets, and in which, when the user selects and utters one of the registered words during use, the apparatus compares the uttered speech pattern with the registered speech patterns, recognizes the uttered word, and outputs the result, thereby identifying the plurality of targets. The apparatus comprises: means for presenting a word to the user to prompt the user to utter the word to be registered; input means with which the user inputs the speech of the presented word; feature extraction means for analyzing the input speech and extracting feature values; storage means for storing the extracted feature values; calculation means for calculating the distance between the feature values of the input word's speech and the feature values of the speech of words already registered; and control means which, when it determines that the distance between the feature values of the newly input word and those of a registered word is not large enough to distinguish the plurality of targets, performs control so that other words prepared as next candidates are presented to the user one at a time until a distance large enough to distinguish the plurality of targets is secured. The control means can select whether to present to the user another word prepared as a next candidate for the presented word, or another word prepared as a next candidate for an already registered word. When the words prepared as next candidates for the presented word are exhausted, the control means prompts the user to utter an arbitrary word.

[0008] The present invention also provides a specific-speaker word speech recognition apparatus further comprising output means for outputting guide audio that prompts the user to input the speech of a word, and notification means for reporting words for which the distance between feature values is not equal to or greater than a predetermined value.

[0009] The present invention also provides a specific-speaker word speech recognition apparatus in which a user registers in advance, in the user's own voice, the speech of words representing a plurality of targets, and in which, when the user selects and utters one of the registered words during use, the apparatus compares the uttered speech pattern with the registered speech patterns, recognizes the uttered word, and outputs the result, thereby identifying the plurality of targets. The apparatus comprises: input means with which the user inputs speech; feature extraction means for analyzing the input speech and extracting feature values; storage means for storing the extracted feature values; calculation means for calculating the distance between the feature values of the input word's speech and the feature values of the speech of words already registered; control means which, when it determines that the distance between the feature values of the newly input word and those of a registered word is not large enough to distinguish the plurality of targets, performs control so that other words prepared as next candidates are presented to the user one at a time until a distance large enough to distinguish the plurality of targets is secured; output means for outputting guide audio that prompts the user to input the speech of a word; and notification means for reporting words for which the distance between feature values is not equal to or greater than a predetermined value. When the words prepared as next candidates are exhausted, the control means prompts the user to utter an arbitrary word.

[0010] The present invention also provides a specific-speaker word speech recognition apparatus comprising audio output means or display means for outputting which word set is currently registered.

[0011]

[Operation] According to the present invention, when the user inputs the speech of the presented word through the input means, the feature extraction means analyzes the input speech and extracts feature values, the storage means stores the extracted feature values, and the calculation means calculates the distance between the feature values of the input speech and those of the word speech already registered. When a word prepared in the initial recognition vocabulary cannot secure a sufficiently large distance between feature values, the control means performs control so that other words prepared as next candidates are presented to the user one at a time until a distance large enough to distinguish the plurality of targets is secured. Furthermore, the control means can select whether to present to the user another word prepared as a next candidate for the presented word, or another word prepared as a next candidate for an already registered word. As a result, the plurality of targets can be reliably distinguished using words whose speech feature values are sufficiently far apart. In addition, when the words prepared as next candidates are exhausted, the control means prompts the user to utter an arbitrary word. This avoids the registration of similar words and achieves high recognition performance.

[0012] Also according to the present invention, the output means outputs guide audio prompting the user to input speech, and the user inputs speech after hearing this guide audio. In addition, the notification means reports words for which the distance between feature values is not equal to or greater than the predetermined value, and the user can choose to register the speech of a different word. As a result, the registration of similar words is avoided and high recognition performance is achieved without any mismatch between the user's registered speech and the echo-back audio, the guidance audio, or the displayed recognition results.

[0013] Also according to the present invention, when the user inputs the speech of the presented word through the input means, the feature extraction means analyzes the input speech and extracts feature values, the storage means stores the extracted feature values, and the calculation means calculates the distance between the feature values of the input speech and those of the word speech already registered. When a word prepared in the initial recognition vocabulary cannot secure a sufficiently large distance between feature values, the control means performs control so that other words prepared as next candidates are presented to the user one at a time until a distance large enough to distinguish the plurality of targets is secured. The output means outputs guide audio prompting the user to input speech, and the user inputs speech after hearing this guide audio. The notification means reports words for which the distance between feature values is not equal to or greater than the predetermined value, and the user can choose to register the speech of a different word. As a result, the registration of similar words is avoided and high recognition performance is achieved without any mismatch between the user's registered speech and the echo-back audio, the guidance audio, or the displayed recognition results. In addition, when the words prepared as next candidates are exhausted, the control means prompts the user to utter an arbitrary word. This avoids the registration of similar words and achieves high recognition performance.

[0014] Also according to the present invention, the audio output means or the display means outputs which word set is currently registered.

[0015]

[Embodiment] A first embodiment of the specific-speaker word speech recognition apparatus of the present invention is described below with reference to the drawings.

[0016] As shown in FIG. 1, the specific-speaker word speech recognition apparatus of this embodiment comprises: a speech synthesis unit 1, serving both as output means for outputting guide audio that prompts the user to input speech and as audio output means for outputting which word set is currently registered; a microphone 2, serving as input means for inputting the user's speech; a word registration/recognition unit 3, serving as feature extraction means for analyzing the input speech and extracting feature values; a word recognition/playback pattern memory 4, serving both as storage means for storing the extracted feature values and as save-and-playback means capable of saving and playing back the speech uttered by the user; a control unit 5 that controls each unit; an operation panel 6 for inputting instructions and the like; a display panel 7, serving as display means for outputting which word set is currently registered; and a guide audio memory 8 that stores the guide audio.

[0017] As shown in FIG. 2, the control unit 5 comprises: calculation means 9 for calculating the distance between the feature values of the input speech extracted by the word registration/recognition unit 3 and the feature values of word speech already registered in the word recognition/playback pattern memory 4; notification means 10 for reporting words for which the distance between feature values is not equal to or greater than a predetermined value; and control means 11 which, when the user registers speech and a word prepared in the initial recognition vocabulary cannot secure a sufficiently large distance between feature values, performs control so that the speech of another word prepared as a next candidate is registered.

[0018] One possible application of the specific-speaker word speech recognition apparatus is a home automation device for operating indoor lighting and a tape recorder by voice. The recognition vocabulary is assumed to be the one shown in FIG. 3. As FIG. 3 shows, several candidate words are prepared in advance for each operation.

[0019] Next, the operation of this embodiment is described with reference to the flowchart of FIG. 4. The apparatus is operated using the microphone 2 and the operation panel 6, so before the apparatus can be used, the words must first be registered in the user's own voice.

[0020] When the operation panel 6 is operated and registration starts (step S1), the speech synthesis unit 1 plays the guide audio stored in the guide audio memory 8, "Please say the word you are about to hear" (step S2). Then, if the recognition vocabulary is the one shown in FIG. 3, for example, the guide audio "dengen" ("power") is played first. It is then determined whether a next-candidate selection operation has been made, that is, whether the operation panel 6 has been operated (step S3). If no next-candidate selection is made, the user hears the guide audio "dengen" and says "dengen" (step S4). The word registration/recognition unit 3 analyzes the user's utterance and extracts its feature values. The calculation means 9 then calculates the distance between these feature values and those of the speech pattern of each word already registered in the word recognition/playback pattern memory 4, using a technique such as DP matching (step S5).
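
The distance calculation of step S5 can be illustrated with a minimal DP-matching (dynamic time warping) sketch. The patent names DP matching only as one possible technique and does not specify the feature vectors, the frame distance, or the normalization; the Euclidean frame distance, the three-way step pattern, the division by the combined length, and the function name `dp_matching_distance` are all assumptions made for illustration.

```python
import math

def dp_matching_distance(a, b):
    """Length-normalized DP-matching distance between two utterances.

    a, b: non-empty lists of feature vectors (lists of floats of equal size).
    Frame distance and normalization are illustrative assumptions.
    """
    if not a or not b:
        raise ValueError("both utterances must contain at least one frame")
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated frame distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)
```

Because the alignment may stretch either sequence, two utterances of the same word spoken at different speeds can still yield a small distance, which is the property the registration check relies on.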

[0021] The control means 11 determines whether there is any combination for which this distance is at or below the set threshold, that is, whether a similar pattern exists (step S6). If no similar pattern exists, the feature values of the "dengen" ("power") speech just uttered are stored in the memory 4 as a stored pattern (step S7). A similar pattern is found, for example, in the following case: after "dengen" has been registered and registration of the next vocabulary word, "dentō" ("electric light"), has proceeded from step S1, the inter-pattern distance between "dengen" and "dentō" is at or below the set threshold, and the two words are judged to be similar. The notification means 10 then issues a warning to the user through the speech synthesis unit 1, such as "Dengen and dentō are similar. Please re-register one of them." (step S8).
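
The decision in steps S6 to S8 amounts to a threshold test against every registered pattern. The sketch below is illustrative only: the function names, the dictionary used as the pattern memory, and the wording of the returned warning are assumptions, and `distance` stands for whatever measure step S5 computes.

```python
def find_similar_words(features, registered, distance, threshold):
    """Return registered words whose distance to `features` is at or below threshold."""
    return [word for word, stored in registered.items()
            if distance(features, stored) <= threshold]

def register_or_warn(word, features, registered, distance, threshold):
    """Store the new pattern (step S7), or return a warning string (step S8).

    `registered` maps word -> stored feature values and is updated in place
    only when no similar pattern exists.
    """
    similar = find_similar_words(features, registered, distance, threshold)
    if similar:
        return f"{similar[0]} and {word} are similar. Please re-register one of them."
    registered[word] = features
    return None
```

In the apparatus described here the user may still register the word after the warning; that choice (step S9) is left outside this sketch.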

[0022] When the user receives such a warning, it is determined which of the following the user has selected: re-registration of the already registered similar word, cancellation of the registration, or registration (step S9). If the user ignores the warning and performs the registration operation, the process moves to step S7, the "dentō" ("electric light") speech just uttered is stored in the memory 4, and the user can proceed to register the next word. If instead the user wants better recognition performance and chooses to re-register the word that triggered the warning, for example to redo the registration of "dentō", the user performs the registration cancel operation on the operation panel 6. After the registration is cancelled (step S10), the process returns to step S2 and the guidance prompting an utterance is played again. This time, however, as shown in FIG. 3, "shōmei" ("lighting"), the next candidate prepared for "dentō", is output. When the user says "shōmei", the distance between "shōmei" and each word already registered is calculated in step S5, just as for "dentō", and if a combination with insufficient distance, that is, a similar word, exists, a warning is issued in step S8. If no similar combination exists, or if "shōmei" is registered despite the warning, "shōmei" rather than "dentō" is used from then on, including for echo-back of recognition results. This is made possible by storing, together with the speech feature values, the ordinal number of the candidate that was being presented when the word was registered: "1" is stored with the feature values if the word was registered as "dentō", and "2" if it was registered as "shōmei". As shown in FIG. 5, each stored record consists of a registration data start address, a data length, and a candidate number.
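
The record format of FIG. 5 and its use for echo-back can be sketched as follows. Only the three fields (start address, data length, candidate number) come from the description; the class and function names, the contiguous placement of feature data, and the use of bytes as the length unit are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RegistrationRecord:
    start_address: int     # where this word's feature data begins in pattern memory
    data_length: int       # size of the feature data (unit assumed to be bytes)
    candidate_number: int  # 1 = first prepared candidate, 2 = second, and so on

def append_record(table, data_length, candidate_number):
    """Append a record, assuming data is packed directly after the previous entry."""
    start = table[-1].start_address + table[-1].data_length if table else 0
    record = RegistrationRecord(start, data_length, candidate_number)
    table.append(record)
    return record

def echo_back_word(record, candidates):
    """Pick the prepared word that matches how the user actually registered."""
    return candidates[record.candidate_number - 1]
```

For example, a record with candidate number 2 for the candidate list "dentō", "shōmei", "raito" selects "shōmei", so the echo-back matches the word the user registered.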

[0023] In step S3, if the operation panel 6 is operated, the following occurs. Suppose, for example, that the guide audio "shōmei" ("lighting") has been played and the user does not want to register with the utterance "shōmei". If the user operates the operation panel 6 without speaking, then, because "raito" ("light") is prepared as the next candidate after "shōmei" as shown in FIG. 3 (step S11), the guide audio "raito" is output (step S12) and the process returns to step S3. Needless to say, the transition from step S3 to step S11 is possible even when the user is not redoing a registration. Back at step S3, if the user does not want to register "raito" either and performs the same operation on the operation panel 6, there is no next candidate after "raito", so the process returns to the first candidate, "dentō" ("electric light"). In other words, the candidate word shifts each time the process passes from step S11 through step S12.

[0024] In step S11, if there is no guide audio for a next candidate word, for example because no next candidate exists after "raito" ("light"), the apparatus gives guidance such as "Please input any utterance you like" (step S13) and has the user utter an arbitrary word, thereby avoiding the registration of a similar word. Speech registered in this way is used not only for recognition but also for audio output: from then on, the user's own utterance is used for echo-back of recognition results, guide audio, and so on.
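
The candidate stepping of steps S11 to S13 can be sketched as a small state function. This is an illustration under assumptions: `ARBITRARY` and `next_prompt` are hypothetical names, and the sketch places the arbitrary-utterance prompt between the last prepared candidate and the return to the first candidate, which is how the description treats a user who skips that prompt without speaking.

```python
ARBITRARY = None  # stands for the prompt "Please input any utterance you like"

def next_prompt(candidates, current):
    """Return what is prompted after the user skips `current` without speaking.

    candidates: prepared words for one operation, in presentation order.
    current: one of the candidates, or ARBITRARY.
    """
    if current is ARBITRARY:
        return candidates[0]          # wrap around to the first candidate
    i = candidates.index(current)
    if i + 1 < len(candidates):
        return candidates[i + 1]      # step S12: play the next candidate
    return ARBITRARY                  # step S13: candidates exhausted
```

Starting from "dentō" with the candidates "dentō", "shōmei", "raito", repeated skipping yields "shōmei", "raito", the arbitrary-utterance prompt, and then "dentō" again.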

[0025] When an arbitrary utterance by the user is registered, a value such as "0" is recorded in the candidate number field of FIG. 5 to distinguish it from the prepared audio. Echo-back of the user's registered speech can be achieved by storing the sampled data from registration in the word recognition/playback pattern memory 4 and passing it through D/A conversion. Alternatively, to save echo-back memory, the speech can be synthesized from the feature values of the word registered for recognition.

[0026] Guide audio such as "Please input any utterance you like", and the guide audio for the registration words of FIG. 3 such as "dengen" ("power"), "dentō" ("electric light"), and "raito" ("light"), never needs to be rewritten, so it can be held in ROM or similar memory, which is cheaper than RAM. The guide audio memory 8 of FIG. 1 stores this guide audio data, and the speech synthesis unit 1 reads the data from the guide audio memory 8 and plays the guide audio on command from the control unit 5. If, when an arbitrary utterance is requested, the user does not speak and instead selects the next candidate again, the process returns to the first candidate and the guide audio "dentō" is output again.

[0027] Suppose that in step S8 the user receives the warning "Dengen and dentō are similar. Please re-register one of them." If re-registration of the already registered similar word is selected in step S9, the speech pattern of "dentō" ("electric light") is stored in the memory 4 and "dengen" ("power") is registered again (step S14). In this example, the operation panel 6 is operated in step S9 to put the apparatus in a state where it accepts registration for "dengen", and in step S3 the next candidate word for "dengen" is selected and uttered. Redoing a registration may bring the new word close to some other word that is already registered. In that case too, the warning guidance is played, so the user can operate the operation panel 6 and re-register the word in question.

[0028] The registration algorithm need not issue the warning each time a single word has been registered, as described above. Alternatively, once all words have been registered, the apparatus can check whether any combination has an inter-word distance at or below the set threshold and warn about the combinations found. With either algorithm, if several word combinations have distances at or below the set threshold, all of them may be reported, or only the combination with the smallest distance. A function that lets the user check at any time, from the operation panel 6, which word combinations have insufficient distance is also likely to be needed; this can be provided simply by reusing the distance check performed at registration.
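
The batch variant, which checks every pair after all words are registered, can be sketched as below. The function name `similar_pairs` and the dictionary used as the pattern memory are assumptions, and `distance` again stands for whatever measure the registration check uses.

```python
from itertools import combinations

def similar_pairs(registered, distance, threshold):
    """Return (distance, word_a, word_b) for every pair at or below threshold.

    The result is sorted so that the closest pair comes first; a caller can
    report the whole list or only the first entry.
    """
    pairs = [(distance(fa, fb), a, b)
             for (a, fa), (b, fb) in combinations(registered.items(), 2)]
    return sorted(p for p in pairs if p[0] <= threshold)
```

The same function could also serve the on-demand check from the operation panel, since it needs only the stored patterns.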

[0029] Word registration and recognition are performed as follows: on command from the control unit 5, the word registration/recognition unit 3 analyzes the input from the microphone 2, and the resulting feature values are either stored in the word recognition/playback pattern memory 4 or compared with the feature values of other words already stored. The control unit 5 issues commands to the speech synthesis unit 1 and the word registration/recognition unit 3 according to the input from the operation panel 6 and the state of the apparatus at that time. The display panel 7 is needed when the recognition vocabulary is confirmed by text display, as described later; it is not needed when the vocabulary is confirmed by audio. Although the above embodiment was described using the recognition vocabulary shown in FIG. 3 as an example, the present invention is not limited to any particular kind of recognition vocabulary.

【００３０】In a recognition apparatus configured as described above, the recognition vocabulary is not fixed in advance, so a way to check which words are currently registered is required. The first method is confirmation by voice. When a specific operation is performed on the operation panel 6 of the recognition apparatus in FIG. 1, the speech of the registered words is output in sequence. The confirmation speech can either be the guide voice prepared for registration or be synthesized from the user's registered speech. The guide voice gives cleaner output, but no guide voice is prepared for words registered by free utterance, so for those words the user's registered speech is synthesized and output.
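The choice between the two confirmation sources can be sketched as below. This is only an illustration under stated assumptions: the data layout (a list of registered words and a dictionary of prepared guide voices) and the names `confirmation_playlist`, `registered_words`, and `guide_voices` are hypothetical and do not come from the embodiment.

```python
def confirmation_playlist(registered_words, guide_voices):
    """Build the sequence of audio items to play back for confirmation.

    registered_words: list of (word_id, user_pattern) in registration order,
        where user_pattern stands for the stored playback pattern of the user
    guide_voices: dict mapping word_id -> prepared guide voice data
    Returns a list of (word_id, source, data) with source "guide" or "user".
    """
    playlist = []
    for word_id, user_pattern in registered_words:
        if word_id in guide_voices:
            # A prepared guide voice exists, so prefer its cleaner audio.
            playlist.append((word_id, "guide", guide_voices[word_id]))
        else:
            # Freely uttered words have no guide voice; fall back to
            # output synthesized from the user's own registered speech.
            playlist.append((word_id, "user", user_pattern))
    return playlist
```

Keeping the source tag in each item lets the caller route guide voices and user patterns to different playback paths if the hardware requires it.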

【００３１】The second method is to display the registered words as text on a display panel 7 such as a liquid crystal display. A speech recognition apparatus provided with recognition vocabulary confirmation means based on a display panel is configured as shown in FIG. 6. When the total number of recognition words is small, or the display panel is large enough, all the words can be displayed on a single screen; otherwise, a function is needed for switching or scrolling the screen of the recognition vocabulary display unit 32 by operating the operation panel 31. No character string is prepared for a word that the user registered by free utterance, so one option is to enter a character string from the character input unit 33 of the operation panel when the voice is registered. For example, 「止まれ」 (tomare), the seventh recognition word in the recognition vocabulary display unit 32 of FIG. 6, is not among the word candidates in FIG. 3. The user devised and registered this word because neither 「停止」 (teishi) nor 「ストップ」 (sutoppu) secured a sufficient distance from the other recognition words. Since no character string is prepared for 「止まれ」, the user enters it from the character input unit 33.

【００３２】Character string data for the word candidates in FIG. 3 can be prepared in advance in the guide voice memory 8, while character string data entered by the user can be stored in the word recognition/playback pattern memory 4. Reference numeral 34 in the figure denotes a microphone.

【００３３】[0033]

[Effects of the Invention]

According to the present invention, when the user inputs the speech of the presented word through the input means, the feature quantity extraction means analyzes the input speech and extracts its feature quantities, the storage means stores the extracted feature quantities, and the calculation means calculates the distance between the feature quantities of the input speech and those of the word speech already registered. When a word prepared in the initial recognition vocabulary cannot secure a sufficiently large distance between feature quantities, the control means presents other words prepared as next candidates to the user one word at a time until a distance large enough to identify the plurality of objects is secured. The control means can further select whether to present another word prepared as the next candidate for the presented word or another word prepared as the next candidate for the already registered word. The plurality of objects can therefore be identified reliably using words whose speech feature quantities are sufficiently far apart. Moreover, when the words prepared as next candidates are exhausted, the control means prompts the user to utter an arbitrary word, so registration of similar words is avoided and high recognition performance is achieved.
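The candidate-by-candidate registration flow summarized above can be sketched as follows. This is a simplified illustration under several assumptions: feature quantities are fixed-length vectors, Euclidean distance stands in for the actual pattern distance, and only the branch that replaces the newly presented word is modelled (the alternative of replacing the already registered word is omitted). The names `register_word`, `utter`, and the `"<free utterance>"` label are hypothetical.

```python
import math


def register_word(candidates, utter, registered, threshold):
    """Try each prepared candidate in order, then fall back to a free utterance.

    candidates: prepared words for one target, initial vocabulary word first
    utter: callable taking a word (or None for a free utterance) and returning
        the feature vector of the user's utterance
    registered: dict mapping word -> feature vector; updated on success
    threshold: distance at or below which two words count as too similar
    Returns the registered label, or None if even the free utterance is
    too close to an existing word.
    """
    # None at the end models "candidates exhausted: ask for any word".
    for word in list(candidates) + [None]:
        features = utter(word)
        far_enough = all(
            math.dist(features, other) > threshold  # distance is an assumption
            for other in registered.values()
        )
        if far_enough:
            # A real device would need a unique label per free utterance.
            label = word if word is not None else "<free utterance>"
            registered[label] = features
            return label
    return None
```

A caller would typically warn the user and repeat the free-utterance step when `None` is returned, since the sketch itself makes only one attempt.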

【００３４】Further, according to the present invention, the output means outputs a guide voice prompting the user to input speech, and the user inputs speech after hearing this guide voice. In addition, the notification means reports words for which the distance between feature quantities is not at or above a predetermined value, and the user can choose to register the speech of another word. As a result, registration of similar words is avoided and high recognition performance is achieved without any mismatch between the user's registered speech and the echo-back, the guidance voice, or the displayed recognition result.

【００３５】Further, according to the present invention, when the user inputs the speech of the presented word through the input means, the feature quantity extraction means analyzes the input speech and extracts its feature quantities, the storage means stores the extracted feature quantities, and the calculation means calculates the distance between the feature quantities of the input speech and those of the word speech already registered. When a word prepared in the initial recognition vocabulary cannot secure a sufficiently large distance between feature quantities, the control means presents other words prepared as next candidates to the user one word at a time until a distance large enough to identify the plurality of objects is secured. In addition, the output means outputs a guide voice prompting the user to input speech, and the user inputs speech after hearing this guide voice. The notification means also reports words for which the distance between feature quantities is not at or above a predetermined value, and the user can choose to register the speech of another word. As a result, registration of similar words is avoided and high recognition performance is achieved without any mismatch between the user's registered speech and the echo-back, the guidance voice, or the displayed recognition result. Moreover, when the words prepared as next candidates are exhausted, the control means prompts the user to utter an arbitrary word, so registration of similar words is avoided and high recognition performance is achieved.

【００３６】Further, according to the present invention, the voice output means or the display means outputs the word set that is currently registered, so the user can easily check which vocabulary is registered at that time.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of the word speech recognition apparatus for a specific speaker according to the present invention.

FIG. 2 is a block diagram showing the configuration of the control unit of the word speech recognition apparatus for a specific speaker according to the present invention.

FIG. 3 is a diagram showing recognition vocabulary candidates.

FIG. 4 is a flowchart showing the operation of the present invention.

FIG. 5 is a diagram showing the storage format of registered speech information.

FIG. 6 is a diagram showing a recognition vocabulary confirmation apparatus using a display panel.

[Explanation of symbols]

1 Speech synthesis unit
2 Microphone
3 Word registration/recognition unit
4 Word recognition/playback pattern memory
5 Control unit
6 Operation panel
7 Display panel
8 Guide voice memory
9 Calculation means
10 Notification means
11 Control means
31 Operation panel
32 Recognition vocabulary display unit
33 Character input unit
34 Microphone

Continuation of the front page

(56) References
JP-A-57-129497 (JP, A)
JP-A-58-76944 (JP, A)
JP-A-60-218698 (JP, A)
JP-A-3-123249 (JP, A)
JP-A-63-70296 (JP, A)
JP-A-62-260194 (JP, A)
JP-A-1-285997 (JP, A)
JP-A-63-38994 (JP, A)
JP-A-56-123600 (JP, A)
JP-A-63-294600 (JP, A)
JP-A-3-10298 (JP, A)
JP-A-2-23399 (JP, A)
JP-A-58-178396 (JP, A)
JP-A-2-141154 (JP, A)
JP-B-5-74837 (JP, B2)
JP-B-61-26677 (JP, B2)
JP-B-2-15080 (JP, B2)
JP-B-5-38958 (JP, B2)
JP-B-6-7347 (JP, B2)
JP-B-4-62595 (JP, B2)

(58) Fields searched (Int. Cl.⁷, DB name)
G10L 15/06
G10L 15/28
JICST file (JOIS)

Claims

(57) [Claims]

1. A word speech recognition apparatus for a specific speaker, in which a user registers in advance, in his or her own voice, the speech of words representing a plurality of objects, and in which, when the user selects and utters one of the registered words at the time of use, the uttered speech pattern is compared with the already registered speech patterns to recognize the uttered word and the result is output, thereby identifying the plurality of objects, the apparatus comprising: means for presenting a word to be registered so as to prompt the user to utter it; input means with which the user inputs the speech of the presented word; feature quantity extraction means for analyzing the input speech and extracting feature quantities; storage means for storing the extracted feature quantities; calculation means for calculating the distance between the speech feature quantities of the input word and the speech feature quantities of an already registered word; and control means for performing control, when it is determined that the distance between the feature quantities of the newly input word and the registered word cannot secure a magnitude sufficient to identify the plurality of objects, so that other words prepared as next candidates are presented to the user one word at a time until a magnitude sufficient to identify the plurality of objects is secured, wherein the control means can select whether to present to the user another word prepared as the next candidate for the presented word or another word prepared as the next candidate for the already registered word, and prompts the user to utter an arbitrary word when the words prepared as next candidates for the presented word are exhausted.

2. The word speech recognition apparatus for a specific speaker according to claim 1, further comprising: output means for outputting a guide voice that prompts the user to input a word by voice; and notification means for reporting a word for which the distance between the feature quantities is not at or above a predetermined value.

3. A word speech recognition apparatus for a specific speaker, in which a user registers in advance, in his or her own voice, the speech of words representing a plurality of objects, and in which, when the user selects and utters one of the registered words at the time of use, the uttered speech pattern is compared with the already registered speech patterns to recognize the uttered word and the result is output, thereby identifying the plurality of objects, the apparatus comprising: input means with which the user inputs speech; feature quantity extraction means for analyzing the input speech and extracting feature quantities; storage means for storing the extracted feature quantities; calculation means for calculating the distance between the speech feature quantities of the input word and the speech feature quantities of an already registered word; control means for performing control, when it is determined that the distance between the feature quantities of the newly input word and the registered word cannot secure a magnitude sufficient to identify the plurality of objects, so that other words prepared as next candidates are presented to the user one word at a time until a magnitude sufficient to identify the plurality of objects is secured; output means for outputting a guide voice that prompts the user to input a word by voice; and notification means for reporting a word for which the distance between the feature quantities is not at or above a predetermined value, wherein the control means prompts the user to utter an arbitrary word when the words prepared as next candidates are exhausted.

4. The word speech recognition apparatus for a specific speaker according to claim 1, further comprising voice output means or display means for outputting the word set that is currently registered.