JP2001005490A - Voice recognition device

Info

Publication number: JP2001005490A
Application number: JP11170884A
Authority: JP
Inventors: Takeji Hirata; 武二平田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-06-17
Filing date: 1999-06-17
Publication date: 2001-01-12

Abstract

PROBLEM TO BE SOLVED: To obtain a voice recognition device that, with a simple configuration, responds to input voice in more varied ways, by using a threshold value of a recognition distance to determine the degree of similarity between input voice data and voice data registered in a storage means, and selecting the operation of an operation unit from the determination result. SOLUTION: A word uttered by a user toward the device is sent through a microphone 1 to an A/D conversion IC 2. The IC 2 converts the input voice into digital data, which serves as the data to be collated against registered data 5a. A CPU 3 collates the registered data 5a in a memory 5 with the input voice data using a processing program 5b, stored in the memory 5, that executes data processing and collation. When the collation shows that the input data matches any of the registered data 5a, the CPU 3 further determines how close the input is to the matched data 5a, using the threshold value computed in the course of the collation. Based on this result, one of the operations executable by an operation unit 4 is selected.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice recognition device, and more particularly to a voice recognition device that has a simple configuration and can vary its response to the same call.

[0002]

[Description of the Related Art] In recent years, devices have been developed that individually recognize several commands or calls (words, whistles, and the like) and act accordingly, such as a toy dog that obeys commands like "shake" and "bark". Such a device comprises voice input means such as a microphone, processing means for processing the input voice, and collating means for comparing the input voice with voice data registered in advance. It is configured either to perform a specific operation triggered when the input voice matches a registered voice, or to recognize the content of the voice itself and show a specific reaction.

[0003] In this kind of toy, responses to the user's commands and calls are preferably closer to those of an animal. That is, there is demand for a toy that reacts to calls in various ways as if it had emotions, and that sometimes reacts in a way that defies the user's expectations.

[0004]

[Problems to Be Solved by the Invention] However, in conventional toys using a voice recognition device, collation of the input voice yields only two outcomes, match or no match with a registered voice, and even on a match the toy usually just performs the predetermined operation for that input voice. The reaction is therefore simple, and users often tire of it quickly. Accordingly, there is demand for a voice recognition toy that entertains the user more by making its reactions to input voice more complex.

[0005] Moreover, it is impossible to give real emotions to a toy that is an electronic device, and configuring one to behave as if it had emotions requires advanced technology and is extremely difficult.

[0006] One conceivable way to vary the device's reaction to a call is to prepare several kinds of reactions and select one at random whenever a call is made. With this method, however, the selected reaction bears no relation to the tone of the calling voice or the like, so the reaction cannot appear to result from the device recognizing and judging the user's call.

[0007] In view of these demands and problems, an object of the present invention is to provide a voice recognition device that, with a simple configuration, makes reactions to input voice more complex by using a threshold value in the voice collation judgment.

[0008]

[Means for Solving the Problems] To achieve this object, the present invention provides a voice recognition device comprising: a microphone for inputting a user's voice; storage means in which one or more items of voice data are registered in advance; an operation unit capable of executing at least two operations; and collating means for collating the voice input via the microphone with the voice data registered in the storage means, the device being configured so that the operation unit executes an operation corresponding to the voice when the collation shows that the input voice data matches any of the voice data registered in the storage means. The device is characterized in that the collating means further determines the similarity between the input voice data and the voice data registered in the storage means using a threshold value of a recognition distance, and selects the operation of the operation unit according to the determination result.

[0009] Because the collating means thus uses a threshold value to determine the similarity between the input voice and the registered voice and varies the operation of the operation unit according to the distance between the two, the reaction is not always the same even when, for example, the same command or question is given to the device. The user can thus enjoy a variety of sometimes unexpected reactions, which dramatically increases the value of the product. Furthermore, because the reaction to a call is selected according to the distance between the user's input voice and the voice registered in advance, the device's reaction can be tied to the tone of the user's voice and the like, which is even more preferable.

[0010] In the voice recognition device of the present invention, the operation unit preferably includes a voice output mechanism. Configuring the device in this way, so that it can reply by voice to the user's commands and calls, allows a wider variety of reactions and entertains the user.

[0011] In the voice recognition device of the present invention, the storage means is preferably writable. The storage means may also be configured to be detachable. Making the storage means writable and/or detachable allows its contents to be changed after the product is shipped. Settings can then be changed after shipment, for example by adding or removing registered voice data or changing the reactions, which broadens the functions of the device.

[0012] Furthermore, the user's own voice data may be registered in the storage means in advance. With the user's voice data registered in advance, the device can be configured, for example, not to obey commands from anyone other than the authorized user, or to respond to another person who issued a command with an action declaring its loyalty to the authorized user. This makes it possible to provide a toy that the user will love more.

[0013] In a device in which the user's own voice is registered in advance in this way, when the determination using the threshold value by the collating means shows that the input voice matches the registered voice data but the distance between the two is greater than a predetermined value, the operation unit may perform an operation that asks about the user's physical condition. For example, if the device reacts normally when the input voice nearly coincides with the user's registered voice, and changes its reaction to say "Your voice sounds odd. Have you caught a cold?" when the input voice only barely matches the registered voice, a toy that the user feels even closer to can be provided.

[0014] The voice recognition device of the present invention may also be a device intended to improve the user's pronunciation, in which the operation unit performs an operation that shows the user the distance between the input voice data and voice data registered in advance. For example, if several foreign-language words with accurate pronunciation are registered in the storage means in advance and the device indicates how accurate the user's pronunciation toward the device is, it can suitably be used as a foreign-language learning device for children.

[0015] Furthermore, the voice recognition device of the present invention may be a toy modeled on a real or imaginary creature, with the operation unit configured to move the limbs or face of the device. As stated above, the object of the present invention is to provide a voice recognition device whose varied reactions keep the user from tiring of it. Giving it the appearance of a pet animal or the like that the user finds approachable makes this object easier to achieve.

[0016]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the internal configuration of the voice recognition device according to this embodiment. Here, the voice recognition device is, for example, a toy modeled on a common pet animal, of the type that utters words or sings and moves its limbs and face in response to the user's commands and calls.

[0017] Referring to FIG. 1, the voice recognition device of this embodiment comprises a microphone 1 through which the user inputs voice, an A/D conversion IC 2 that converts the input voice into digital data, a CPU 3 that performs voice data processing, collation with registered data, and other control, an operation unit 4 that executes operations in response to the input voice, and a memory 5 that stores registered voice data, a processing program, and the like. A writable memory is used as the memory 5. The operation unit 4 includes a voice output device for producing cries and mechanical mechanisms for blinking and for moving the limbs and tail, and can execute at least two operations.

[0018] FIGS. 2 and 3 are flowcharts explaining the operation of the voice recognition device of this embodiment. In this device, the user inputs a predetermined number of words, whistles, or the like through the microphone 1 in advance, so that collation data is created in the memory 5. That is, as shown in FIG. 2, once the user has obtained the voice recognition device according to the present invention, the user first inputs the words and the like that require registration via the microphone 1 (step S1).

[0019] The input voice, which is analog data, is converted into digital data by the A/D conversion IC 2 (step S2) and registered sequentially in a predetermined area of the memory 5 as reference data (registered data 5a in FIG. 1) (step S3). The predetermined number of words is registered in this way, and when all the necessary voice data has been registered (step S4), the setup is complete. The words that require registration may be explained to the user in the manual supplied with the product, or the device may be configured to ask the user automatically, in an initial setting mode, to utter the predetermined words.
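The registration flow of steps S1 to S4 can be sketched roughly as follows. This is only an illustration, not the implementation described in the publication: the function names `digitize` and `register_words`, the 8-bit quantization, and the dictionary standing in for the memory 5 are all assumptions introduced here.

```python
from typing import Dict, List, Sequence


def digitize(analog: Sequence[float], levels: int = 256) -> List[int]:
    """Hypothetical stand-in for the A/D conversion IC 2 (step S2).

    Clips each sample to [-1, 1] and quantizes it to an integer in
    [0, levels - 1]. The quantization scheme is an assumption.
    """
    digital = []
    for sample in analog:
        clipped = max(-1.0, min(1.0, sample))
        digital.append(round((clipped + 1.0) / 2.0 * (levels - 1)))
    return digital


def register_words(utterances: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Steps S1 to S4: store each digitized utterance as reference data.

    The returned dict plays the role of the registered data 5a in memory 5.
    """
    memory: Dict[str, List[int]] = {}
    for label, analog in utterances.items():  # S1: one utterance per required word
        memory[label] = digitize(analog)      # S2 and S3: digitize, then store
    return memory                             # S4: every required word is registered
```

A real device would more likely store extracted features than raw samples; the raw quantized samples here only keep the sketch short.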

[0020] FIG. 3 is a flowchart explaining the operation of the voice recognition device according to this embodiment during normal operation. As shown in FIG. 3, during normal operation a word uttered by the user toward the device is sent via the microphone 1 to the A/D conversion IC 2 (step S11). The A/D conversion IC 2 converts this input voice into digital data (step S12), which becomes the data to be collated against the registered data 5a described above (step S13).

[0021] Using the program 5b stored in the memory 5, which executes data processing and collation, the CPU 3 collates the registered data 5a in the memory 5 with the voice data input by the user (step S14). If the collation shows that the input data matches any of the registered data (YES), the CPU 3 further determines how close the input is to the matching registered data, using the threshold value calculated in the course of the collation (step S15). Based on this determination, one of the operations executable by the operation unit 4 is selected, and the device actually performs it (step S16). If the input data matches none of the registered data 5a in step S14, the input is ignored as not being a meaningful command or call for the device, and the process ends.

[0022] FIG. 4 explains steps S15 and S16 of FIG. 3 in detail. As shown in FIG. 4, when the voice data input by the user matches any of the registered data in the memory 5 (from step S14), the CPU 3 further determines the similarity to that registered data using threshold values of the recognition distance, and controls the operation unit 4 to execute one of the prepared unit operations 1 to n (step S16). In the example of FIG. 4, when the threshold determination by the CPU 3 shows that the input data is nearly identical to the registered data, unit operation 1 is selected, which is the ordinary operation for the input voice, the one the user expects (for example, replying "Good morning" to the input voice "Good morning"). Conversely, when the input data only barely matches the registered data, unit operation n is selected (for example, replying "Your voice sounds odd. Have you caught a cold?" to the input voice "Good morning"). Operations 2 to n, that is, those other than unit operation 1 for input closest to the registered data, may be set to behaviors the user would not expect, such as replying "I'm still sleepy" to the input voice "Good morning", blinking, or flapping the limbs.
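The collation and threshold determination of steps S14 to S16 might look like the sketch below. The publication does not specify how the recognition distance is computed or how the threshold values are chosen, so the mean-absolute-difference metric, the feature vectors, and the names `distance`, `best_match`, and `select_action` are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed recognition distance: mean absolute difference of two vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_match(
    features: Sequence[float],
    registered: Dict[str, Sequence[float]],
    match_limit: float,
) -> Optional[Tuple[str, float]]:
    """Step S14: find the closest registered entry; None means no match."""
    if not registered:
        return None
    label, d = min(
        ((name, distance(features, ref)) for name, ref in registered.items()),
        key=lambda pair: pair[1],
    )
    return (label, d) if d <= match_limit else None


def select_action(d: float, thresholds: Sequence[float]) -> int:
    """Steps S15 and S16: map a distance to unit operation 1..n.

    `thresholds` must be ascending; n equals len(thresholds) + 1.
    Operation 1 is the expected reply; operation n is the "barely matched" one.
    """
    for index, limit in enumerate(thresholds):
        if d <= limit:
            return index + 1
    return len(thresholds) + 1
```

With thresholds `[0.05, 0.15]`, a distance of 0.0 selects unit operation 1, 0.1 selects operation 2, and 0.25 selects operation 3, which would be the barely-matched reply in a three-operation setup.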

[0023] Embodiments of the present invention are not limited to the above, and various others are conceivable. In particular, the kinds of operations the operation unit performs may be chosen according to the purpose and form of the device. An animal-shaped toy, for example, may be configured to output sounds such as cries, move its limb units mechanically, move its eyes or blink, open and close its mouth, and move its tail. A doll or robot-shaped toy may be configured to speak words, move its limbs, blink LEDs, and so on. Furthermore, the unit operation selected by the threshold determination may combine two or more of these operations, for example opening and closing the mouth while outputting voice.

[0024] Depending on the purpose, the voice data registered in advance in the memory 5 need not be the user's own voice. In that case the memory 5 need not be writable; a non-rewritable ROM storing several items of voice data and the processing program in advance may be used. Alternatively, the memory 5 may be configured as a memory detachable from the device (for example, a memory card), and several kinds of memory storing different contents for different purposes or target age groups may be prepared and swapped as needed.

[0025] Furthermore, the voice recognition device of the present invention can also be realized as, for example, a device for improving pronunciation when children learn a foreign language. FIG. 5 is a flowchart explaining the operation of a voice recognition device used for improving pronunciation. In this case, the device may be given the appearance of a doll that evokes a foreign teacher, or of a simple housing. The memory 5 stores in advance several words recorded with correct pronunciation by, for example, a native speaker of the foreign language. The memory 5 may be a replaceable memory card or an overwritable memory so that the registered words can be updated.

[0026] As shown in FIG. 5, by executing the program stored in the memory 5, the CPU of the device first selects one of the registered voice data items in the memory 5 arbitrarily (step S21), and controls the voice output mechanism of the operation unit 4 to play the selected word to the user. The user then repeats the word as presented (step S22). To make this step go smoothly, a message prompting the user to utter the word may be issued after step S22.

[0027] The word uttered by the user is input via the microphone 1 (step S23), converted into digital data by the A/D conversion IC 2, and collated with the word selected in step S21 (step S24). When the input voice data matches the registered data, a threshold value is used to determine how close the user's pronunciation is to that of the registered word (step S25), and an operation of the operation unit 4 is selected and executed according to this determination (step S26). For example, when the threshold determination shows that the registered word and the input word are pronounced almost identically, the device may output speech such as "OK" or "Very good"; conversely, when the distance between the registered word and the input word is large, it may output speech such as "No" or "Try again". The operation is not limited to voice output: the device may, for example, show the similarity between the registered word and the input word as a score on a display, or, if it is doll-shaped, shake its head.
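The feedback selection of steps S25 and S26 could be sketched as below. The publication gives no formula for the displayed score, so the linear mapping from distance to a 0 to 100 score, the parameter names `good_limit` and `max_distance`, and the function name `pronunciation_feedback` are assumptions for illustration only.

```python
from typing import Tuple


def pronunciation_feedback(
    d: float, good_limit: float, max_distance: float
) -> Tuple[str, int]:
    """Steps S25 and S26: turn a recognition distance into (message, score).

    d            -- distance between the user's utterance and the registered word
    good_limit   -- assumed threshold below which the pronunciation counts as close
    max_distance -- assumed distance at or beyond which the score is 0
    """
    if d < 0 or not 0 < good_limit <= max_distance:
        raise ValueError("require d >= 0 and 0 < good_limit <= max_distance")
    # Linear score: 100 at distance 0, falling to 0 at max_distance.
    score = max(0, round(100 * (1 - d / max_distance)))
    message = "Very good" if d <= good_limit else "No, try again"
    return message, score
```

For example, with `good_limit=0.1` and `max_distance=1.0`, a distance of 0.0 gives `("Very good", 100)` and a distance of 0.5 gives `("No, try again", 50)`. The message would drive the voice output mechanism, and the score could be shown on a display.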

[0028]

[Effects of the Invention] As described in detail above for the embodiments of the present invention, configuring the voice recognition device in this way allows the operation of the operation unit to be changed, even when the input data matches the registered data, according to how close the input is to the registered data, so the user can enjoy sometimes unexpected reactions. Because this function is realized by having the CPU perform data collation using a threshold value, a conventional device can be made to react as if it had emotions through an easy and inexpensive configuration change. A voice recognition device can therefore be provided that, with a simple configuration, makes reactions to input voice more complex and does not bore the user.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the internal configuration of an embodiment of the voice recognition device according to the present invention.

FIG. 2 is a flowchart explaining the initial setup of the voice recognition device shown in FIG. 1.

FIG. 3 is a flowchart explaining the operation of the voice recognition device shown in FIG. 1 during normal operation.

FIG. 4 is a diagram showing in detail the threshold determination and unit operation steps shown in FIG. 3.

FIG. 5 is a flowchart explaining the operation of another embodiment of the voice recognition device of the present invention.

[Explanation of symbols]

1: microphone
2: A/D conversion IC
3: CPU
4: operation unit
5: memory
5a: registered data
5b: processing program

Claims

[Claims]

1. A voice recognition device comprising: a microphone for inputting a user's voice; storage means in which one or more items of voice data are registered in advance; an operation unit capable of executing at least two operations; and collating means for collating the voice input via the microphone with the voice data registered in the storage means, the operation unit executing an operation corresponding to the voice when the collation shows that the input voice data matches any of the voice data registered in the storage means, wherein the collating means further determines the similarity between the input voice data and the voice data registered in the storage means using a threshold value of a recognition distance, and selects the operation of the operation unit according to the determination result.

2. The voice recognition device according to claim 1, wherein the operation unit includes a voice output mechanism.

3. The voice recognition device according to claim 1, wherein the storage means is writable storage means.

4. The voice recognition device according to claim 1, wherein the storage means is detachable from the voice recognition device.

5. The voice recognition device according to claim 3, wherein the user's own voice data is registered in the storage means in advance.

6. The voice recognition device according to claim 5, wherein, when the determination using the threshold value by the collating means shows that the input voice matches the voice data registered in advance but the distance between the two is greater than a predetermined value, the operation unit performs an operation that asks about the user's physical condition.

7. The voice recognition device according to claim 1, wherein the device is intended to improve the user's pronunciation, and the operation unit performs an operation that shows the user the distance between the input voice data and the voice data registered in advance.

8. The voice recognition device according to claim 1, wherein the device is a toy modeled on a real or imaginary creature, and the operation unit moves the limbs or face of the toy.