JPH08187368A

JPH08187368A - Game device, input device, voice selector, voice recognizing device and voice reacting device

Info

Publication number: JPH08187368A
Application number: JP7114957A
Authority: JP
Inventors: Hidetsugu Maekawa; 英嗣前川; Tatsumi Watanabe; 辰巳渡辺; Kazuaki Obara; 和昭小原; Kazuhiro Kayashima; 一弘萱嶋; Kenji Matsui; 謙二松井; Yoshihiko Matsukawa; 善彦松川
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-05-13
Filing date: 1995-05-12
Publication date: 1996-07-23

Abstract

PURPOSE: To provide a game device that can be operated by voice in a way that is natural for a human being. CONSTITUTION: A voice recognition part 2 recognizes voice, and an utterance period detection part 4 detects the period during which the speaker (the operator of the device) is speaking from the motion near the speaker's lips captured by an image input part 3. Based on the voice recognition result and the detected utterance period, an integrated judgment part 5 extracts only the recognition results for voice uttered by the speaker. The extracted result is sent to a control part 6 and used to control an airship 7. Erroneous recognition caused by surrounding sounds other than the speaker's voice can thus be prevented.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a game device operated by voice, an input device for inputting lip images and voice, and a voice reaction device.

[0002]

[Prior Art] FIG. 34 shows, as an example of a conventional game device, a game device in which an airship equipped with a radio receiver is operated by a remote controller with a radio receiver held by the operator. As shown in FIG. 34, a conventional game device generally uses a joystick 161 provided on the remote controller to operate the object. When the operator moves the joystick 161, its angle is detected by angle detection units 162 and 163, converted into electric signals, and output to a control unit 164. Based on these electric signals, the control unit 164 outputs a radio control signal for controlling the movement of the airship 7 according to the angle of the joystick 161.

[0003]

[Problems to Be Solved by the Invention] However, because the conventional game device is operated with the joystick 161, the operation is not natural for a human being. As a result, it takes time to master the operation, and the operator is slow to react when a quick response is needed. There are also game devices that operate a balloon fitted with a drive device instead of an airship, but because the movement of the balloon is controlled in the manner described above, the balloon moves in a lifeless way and the warmth peculiar to balloons is diminished.

[0004] Devices that recognize speech from an input image of the operator's lips have also been proposed. However, such devices require a sophisticated optical lens system, so the device itself becomes bulky and is also expensive.

[0005] The present invention has been made in view of these circumstances. Its objects are to provide: (1) a low-cost, simply constructed game device that can be operated by natural voice, requires no practice to operate, and can be used in noisy surroundings, in situations where it is difficult to speak aloud, and by persons with speech impairments; (2) an input device that can input the operator's lip movement and voice with a simple structure; (3) a voice selection device that, for the same input voice, outputs as voice a word selected at random from a plurality of words; (4) a game device or toy that can be made to move naturally by voice, and a voice recognition device used in them; and (5) a voice reaction device that can change its behavior according to the input voice.

[0006]

[Means for Solving the Problems] A game device of the present invention comprises: voice input means for receiving at least one voice including a voice uttered by an operator, converting the received voice into a first electric signal, and outputting the first electric signal; voice recognition means for recognizing the at least one voice based on the first electric signal output from the voice input means; image input means for optically detecting the movement of the operator's lips, converting the detected lip movement into a second electric signal, and outputting the second electric signal; utterance period detection means for receiving the second electric signal and determining, based on it, the period during which the voice is being uttered by the speaker (the operator); integrated judgment means for extracting the voice uttered by the operator from the at least one voice, based on the at least one voice recognized by the voice recognition means and the period determined by the utterance period detection means; and control means for controlling an object based on the voice extracted by the integrated judgment means. The above object is thereby achieved.

[0007] The utterance period detection means may comprise differentiating means for detecting the degree of change of the second electric signal output from the image input means, and means for judging, when the degree of change detected by the differentiating means exceeds a predetermined value, that the corresponding voice was uttered by the operator.
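As a rough illustration of the differentiate-and-threshold idea, the following Python sketch marks the stretches of a sampled lip-sensor signal whose sample-to-sample change exceeds a threshold. The function name, the use of a first difference as the "degree of change", and the list-of-samples representation are assumptions made for illustration; the source does not specify them.

```python
from typing import List, Tuple


def detect_utterance_periods(signal: List[float],
                             threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs where the absolute first
    difference of the signal stays above the threshold.

    Hypothetical helper: a first difference stands in for the
    differentiating means described in the text.
    """
    periods: List[Tuple[int, int]] = []
    start = None
    for i in range(1, len(signal)):
        change = abs(signal[i] - signal[i - 1])  # discrete derivative
        if change > threshold:
            if start is None:
                start = i  # the lips started moving here
        elif start is not None:
            periods.append((start, i - 1))  # the movement stopped
            start = None
    if start is not None:
        periods.append((start, len(signal) - 1))
    return periods
```

A practical detector would probably also bridge short gaps, because the difference passes through zero whenever the lip movement reverses direction; that refinement is left out to keep the sketch short.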

[0008] The integrated judgment means may comprise: means for creating an evaluation period by adding a period of predetermined length to the period determined by the utterance period detection means; means for detecting the recognition result output time at which each of the at least one voice recognized by the voice recognition means was output from the voice recognition means; and means for comparing the recognition result output time with the evaluation period and judging that, among the at least one voice, a voice whose recognition result output time falls within the evaluation period is the voice uttered by the operator.
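The evaluation-period test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names are hypothetical, times are plain numbers on a common clock, and the predetermined length is added only to the end of each utterance period, which the text at this point does not specify.

```python
from typing import List, Tuple


def select_operator_results(results: List[Tuple[str, float]],
                            utterance_periods: List[Tuple[float, float]],
                            margin: float) -> List[str]:
    """Keep only the recognized words whose output time falls inside an
    evaluation period.

    results: (word, recognition_result_output_time) pairs.
    utterance_periods: (start, end) periods found from lip movement.
    margin: predetermined length appended to each period (assumption:
            appended after the end only).
    """
    accepted: List[str] = []
    for word, output_time in results:
        for start, end in utterance_periods:
            if start <= output_time <= end + margin:
                accepted.append(word)
                break  # one matching period is enough
    return accepted
```

For example, with one utterance period from 0.5 to 1.0 and a margin of 0.5, a result output at time 1.2 is kept and a result output at time 5.0 is discarded as coming from someone other than the operator.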

[0009] Another game device of the present invention comprises: image input means for optically capturing the movement of an operator's lips, converting the captured lip movement into an electric signal, and outputting the electric signal; lip recognition means for determining the lip movement based on the electric signal, recognizing the word corresponding to the determined lip movement, and outputting a recognition result; and control means for controlling an object according to a control signal based on the recognition result. The above object is thereby achieved.

[0010] The lip recognition means may comprise storage means storing a predetermined number of words, and matching means for selecting one of the predetermined number of words according to the determined lip movement and judging the selected word to be the word corresponding to the lip movement.

[0011] The storage means may store the lip movements corresponding to the predetermined number of words as standard patterns, and the matching means may calculate, for every standard pattern, its distance from the determined lip movement and select the word corresponding to the standard pattern with the smallest distance.
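The nearest-pattern matching described here might look like the following Python sketch. The distance measure is not given at this point in the text, so Euclidean distance over equal-length vectors is assumed, and the function name and dictionary layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def match_lip_pattern(observed: Sequence[float],
                      standard_patterns: Dict[str, Sequence[float]]
                      ) -> Tuple[str, float]:
    """Return the word whose stored standard pattern is closest to the
    observed lip movement, together with that distance.

    Assumption: patterns are equal-length numeric vectors compared with
    Euclidean distance (math.dist, Python 3.8+).
    """
    best_word, best_dist = "", math.inf
    for word, pattern in standard_patterns.items():
        dist = math.dist(observed, pattern)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist
```

Because every stored pattern is compared, the cost grows linearly with the vocabulary size, which is acceptable for a small, fixed set of command words.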

[0012] The game device may further comprise: voice input means for receiving a voice, converting the voice into another electric signal, and outputting the other electric signal; voice recognition means for recognizing the voice based on the other electric signal output from the voice input means; and integrated judgment means for outputting the control signal to be given to the control means, based on both the recognition result of the voice recognition means and the recognition result of the lip recognition means.

[0013] The game device may have means for obtaining a voice recognition reliability for the recognition result of the voice recognition means and means for obtaining a lip recognition reliability for the recognition result of the lip recognition means, and the integrated judgment means may select, based on the voice recognition reliability and the lip recognition reliability, one of the recognition result of the voice recognition means and the recognition result of the lip recognition means and output it as the control signal.
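One simple way to realize this selection is to output whichever result has the higher reliability, as in the Python sketch below. The function name, the numeric reliability scale, and the tie-breaking rule (voice wins a tie) are assumptions; the text only says that one of the two results is selected based on the two reliabilities.

```python
def integrate_results(voice_result: str, voice_reliability: float,
                      lip_result: str, lip_reliability: float) -> str:
    """Return the recognition result with the higher reliability.

    Assumption: reliabilities are comparable numbers, and the voice
    result is preferred when they are equal.
    """
    if voice_reliability >= lip_reliability:
        return voice_result
    return lip_result
```

In noisy surroundings the voice reliability would be expected to drop, so a rule of this kind lets the lip-based result take over.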

[0014] The image input means may have light emitting means for emitting light, and light receiving means for receiving the light reflected by the operator's lips and converting the received light into the second electric signal.

[0015] The image input means may have light emitting means for emitting light, and light receiving means for receiving the light reflected by the operator's lips and converting the received light into the electric signal.

[0016] The image input means may have light emitting means for emitting light, and light receiving means for receiving the light reflected by the operator's lips and converting the received light into the electric signal.

[0017] The light may be directed at the lips from the side.

[0018] The light may be directed at the lips from the front.

[0019] The voice input means may have at least one microphone. The voice input means may have at least one microphone, and the at least one microphone and the light emitting means and light receiving means of the image input means may be provided on a single base.

[0020] An input device of the present invention comprises: a headphone-style headset; a support, one end of which is joined to the headset; and a base joined to the other end of the support, on which are provided at least one light emitting element that generates light directed at an operator's lips and at least one light receiving element that receives the light reflected by the lips. The above object is thereby achieved.

[0021] Voice input means for inputting voice may be provided on the base.

[0022] A voice selection device of the present invention comprises: first storage means storing a plurality of tables, each of which contains a plurality of words that can be output in response to one input; second storage means storing one of the plurality of tables; selection means for selecting, in response to an external input, one word from the plurality of words contained in the one table stored in the second storage means and outputting the selected word as voice; and transition means for updating the one table stored in the second storage means to another table, chosen from the plurality of tables stored in the first storage means according to the selected word. The above object is thereby achieved.

[0023] The voice selection device may further comprise means for generating a random number, and the selection means may use the random number to select the one word from the plurality of words.
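The table-based selection with transitions can be sketched as a small state machine in Python. The class name, the dictionary layout, and the example words are hypothetical; the sketch only assumes that each output word determines which table becomes active next and that a random number picks the word.

```python
import random
from typing import Dict, List, Optional


class VoiceSelector:
    """Sketch of a word selector with table transitions.

    tables:      all tables (first storage), table name -> candidate words.
    transitions: output word -> name of the table to activate next.
    current:     name of the active table (stands in for second storage).
    """

    def __init__(self, tables: Dict[str, List[str]],
                 transitions: Dict[str, str], initial: str,
                 rng: Optional[random.Random] = None) -> None:
        self.tables = tables
        self.transitions = transitions
        self.current = initial
        self.rng = rng or random.Random()

    def respond(self) -> str:
        """Called once per external input: pick a word at random from the
        active table, then switch tables according to the chosen word."""
        word = self.rng.choice(self.tables[self.current])
        # Stay on the same table if no transition is defined (assumption).
        self.current = self.transitions.get(word, self.current)
        return word
```

A short usage example, with made-up words:

```python
tables = {"greet": ["hello", "hi"], "ask": ["how are you?"]}
transitions = {"hello": "ask", "hi": "ask", "how are you?": "greet"}
selector = VoiceSelector(tables, transitions, "greet")
first = selector.respond()   # "hello" or "hi", chosen at random
second = selector.respond()  # "how are you?"
```

The word would be sent to a speech output stage rather than returned as text in a real device; returning a string keeps the sketch testable.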

[0024] Another voice selection device of the present invention comprises: storage means storing a table that contains a plurality of words that can be output in response to one input; selection means for receiving an external input, selecting one word, using a random number, from the plurality of words contained in the table stored in the storage means, and outputting it as voice; and means for generating the random number. The above object is thereby achieved.

[0025] A voice reaction device of the present invention comprises the voice selection device described above, and voice recognition means for receiving a voice, recognizing the voice, and giving the recognition result to the voice selection device. The above object is thereby achieved.

[0026] Another game device of the present invention comprises the voice reaction device described above, whereby the above object is achieved.

[0027] Another game device of the present invention comprises a plurality of the voice reaction devices described above, so that the voice reaction devices converse with one another, whereby the above object is achieved. Another game device of the present invention comprises: a plurality of voice input units that convert input voice into electric signals, each corresponding to a different direction; and direction detection means for obtaining the energy of the electric signal for each of the plurality of voice input units, determining the one voice input unit whose energy is largest, and judging the direction corresponding to the determined voice input unit to be the direction from which the voice came. The above object is thereby achieved.
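The direction detection by maximum energy can be illustrated with the Python sketch below. The function names and the dictionary keyed by direction label are assumptions, and energy is taken here as the sum of squared samples over the captured segment, which the text does not spell out.

```python
from typing import Dict, Sequence


def signal_energy(samples: Sequence[float]) -> float:
    """Energy of a captured segment, taken as the sum of squared samples
    (an assumed definition)."""
    return sum(s * s for s in samples)


def detect_voice_direction(signals: Dict[str, Sequence[float]]) -> str:
    """Return the direction label of the voice input unit whose signal
    has the largest energy.

    signals: direction label -> samples captured by the unit facing
             that direction (hypothetical layout).
    """
    return max(signals, key=lambda d: signal_energy(signals[d]))
```

For instance, if the unit facing left captures the strongest signal, `detect_voice_direction` returns the label of that unit, and the object can then be steered toward it.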

[0028] The game device may further comprise operating means for moving an object, and control means for controlling the operating means so that the direction in which the object moves is changed to the judged direction.

[0029] The game device may further comprise: direction selection means having measuring means for measuring the current direction of movement of an object, and means for receiving the judged direction, obtaining a target direction based on the current direction and the judged direction, and storing the target direction; and operating means for moving the object. The direction selection means may use the difference between the target direction and the current direction to control the operating means so that the current direction of movement of the object substantially coincides with the target direction.

[0030] Another game device of the present invention comprises direction selection means having input means for inputting a relative direction by voice, measuring means for measuring the current direction of an object, and means for obtaining a target direction based on the current direction and the input relative direction and storing the target direction. Using the difference between the target direction and the current direction, the direction selection means controls the object so that the current direction of the object substantially coincides with the target direction. The above object is thereby achieved.
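The heading control described here, in which a target direction is derived from the current direction and a relative direction and the object is then steered by the difference, can be sketched in Python. The conventions below are assumptions: headings are in degrees measured clockwise, a positive difference means a turn to the right, and a small tolerance stands in for "substantially coincide". All names are hypothetical.

```python
def target_from_relative(current_deg: float, relative_deg: float) -> float:
    """Target heading obtained by adding a relative direction
    (for example +90 for "right") to the current heading."""
    return (current_deg + relative_deg) % 360.0


def heading_error(target_deg: float, current_deg: float) -> float:
    """Signed difference target - current, wrapped into [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


def steer_command(target_deg: float, current_deg: float,
                  tolerance_deg: float = 5.0) -> str:
    """Return a turn command that reduces the heading difference."""
    err = heading_error(target_deg, current_deg)
    if abs(err) <= tolerance_deg:
        return "hold"  # headings substantially coincide
    return "turn_right" if err > 0 else "turn_left"
```

Wrapping the difference matters near north: a target of 10 degrees and a current heading of 350 degrees give an error of +20 degrees, so the object turns 20 degrees to the right instead of 340 degrees to the left.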

[0031] The input means may have an input unit to which the voice is input, and a recognition unit that recognizes the relative direction based on the input voice.

[0032] Another game device of the present invention comprises direction selection means having input means for inputting an absolute direction by voice, means for determining a target direction based on the absolute direction and storing the target direction, and measuring means for measuring the current direction of an object. Using the difference between the target direction and the current direction, the direction selection means controls the object so that the current direction of the object substantially coincides with the target direction. The above object is thereby achieved.

[0033] The input means may have an input unit to which the voice is input, and a recognition unit that recognizes the absolute direction based on the input voice.

[0034] A voice recognition device of the present invention comprises: first detection means for receiving an electric signal corresponding to a voice and detecting from the electric signal a voice end point, which is the time at which input of the voice ended; second detection means for determining, based on the electric signal, an utterance period, which is the part of the period in which the voice was input during which the voice was actually uttered; feature extraction means for creating a feature vector based on the portion of the electric signal within the utterance period; storage means storing feature vectors of a plurality of candidate voices created in advance; and means for recognizing the input voice by comparing the feature vector from the feature extraction means with each of the feature vectors of the plurality of candidate voices stored in the storage means. The above object is thereby achieved.

[0035] The first detection means may comprise means for dividing the electric signal into a plurality of frames each having a predetermined length, calculation means for obtaining the energy of the electric signal for each of the plurality of frames, and determination means for determining the voice end point based on the variance of the energy.

[0036] The determination means may determine the voice end point by comparing a predetermined threshold with the variance of the energy, and the voice end point may be the time at which the variance equals the threshold as the variance changes from a value larger than the threshold to a value smaller than it.

[0037] The determination means may use the variance of the energies of a predetermined number of frames among the energies of the plurality of frames.
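The end-point detection from the variance of frame energies can be sketched as follows in Python. The frame energy definition (sum of squares), the sliding window over the most recent frames, and the function names are assumptions for illustration; the frame length, window size, and threshold are free parameters that the text leaves as "predetermined".

```python
from statistics import pvariance
from typing import List, Optional, Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames of frame_len samples
    and return the energy (sum of squares) of each complete frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def find_voice_end_frame(energies: Sequence[float], window: int,
                         threshold: float) -> Optional[int]:
    """Return the index of the frame at which the variance of the last
    `window` frame energies falls from above the threshold to at or
    below it, or None if that never happens."""
    was_above = False
    for i in range(window, len(energies) + 1):
        variance = pvariance(energies[i - window:i])
        is_above = variance > threshold
        if was_above and not is_above:
            return i - 1  # last frame of the window that crossed down
        was_above = is_above
    return None
```

While speech is present the frame energies fluctuate and the variance stays high; once only steady background remains, the variance collapses, and the downward crossing marks the end point.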

[0038] The second detection means may have: means for smoothing the energy of the electric signal; first circular storage means for sequentially storing, frame by frame, the unsmoothed energy of the electric signal; second circular storage means for sequentially storing, frame by frame, the smoothed energy; threshold calculation means for calculating an utterance period detection threshold, when the voice end point is detected, using both the unsmoothed energy stored in the first circular storage means and the smoothed energy stored in the second circular storage means; and utterance period determination means for determining the utterance period by comparing the unsmoothed energy with the utterance period detection threshold.

[0039] The threshold calculation means may calculate the utterance period detection threshold using the maximum value of the unsmoothed energy stored in the first circular storage means at the time the voice end point is detected, and the minimum value of the smoothed energy stored in the second circular storage means at a time when the voice end point is not detected.

[0040] The feature extraction means may calculate, from the portion of the electric signal within the utterance period, the number of zero crossings per frame of the electric signal, the number of zero crossings per frame of the signal obtained by differentiating the electric signal, and the energy of the electric signal, and use these as elements of the feature vector.
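The three per-frame quantities named here can be computed as in the Python sketch below. The function names are hypothetical, the derivative is approximated by a first difference, energy is taken as the sum of squares, and a sample of exactly zero is treated as non-negative when counting sign changes; all of these are assumptions.

```python
from typing import List, Sequence


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples
    (zero is treated as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def frame_features(frame: Sequence[float]) -> List[float]:
    """Feature elements for one frame: zero crossings of the signal,
    zero crossings of its first difference, and the frame energy."""
    diff = [b - a for a, b in zip(frame, frame[1:])]
    energy = sum(s * s for s in frame)
    return [zero_crossings(frame), zero_crossings(diff), energy]
```

Concatenating `frame_features` over all frames of the utterance period would give one feature vector per utterance, which can then be compared with the stored candidate vectors.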

[0041] Another voice reaction device of the present invention comprises at least one voice recognition device as described above, and at least one control means for controlling an object based on the recognition result of the at least one voice recognition device. The above object is thereby achieved.

[0042] The voice reaction device may further comprise: transmitting means connected to the at least one voice recognition device for transmitting the recognition result of the at least one voice recognition device; and receiving means connected to the at least one control means for receiving the transmitted recognition result and giving it to the at least one control means. The at least one control means and the receiving means may be attached to the object, making it possible to operate the object remotely.

[0043]

[Operation] In the game device of the present invention, the voice recognition means recognizes the input voice, and the utterance period detection means detects, from the movement of the lips of the speaker (the operator), the utterance period during which the speaker is speaking. Based on the voice recognition result and the detected utterance period, the integrated judgment unit recognizes the command that the speaker input by voice, and the control unit controls the object according to that command. This makes it possible to operate the game by human voice and prevents erroneous operation caused by mistakenly recognizing the voice of someone other than the speaker. In another game device of the present invention, the command is recognized directly from the movement of the operator's lips, so the game can be operated by human speech even in noisy surroundings or in situations where it is difficult to speak aloud. This game device can also be used by persons with speech impairments. In yet another game device of the present invention, the integrated judgment unit determines the more probable recognition result from both the recognition result of the voice recognition means and the recognition result based on lip movement. Therefore, in addition to the advantages described above, the reliability of voice operation of the game can be further increased.

[0044] In the input device of the present invention, a support is attached to a lightweight headset, and an inexpensive light emitting element (for example, an LED) and an inexpensive light receiving element (for example, a photodiode) are mounted on a base attached to the support, so a very light and inexpensive input device can be provided. Furthermore, if the headset is made adjustable in length, the length of the headset can be adjusted for each operator of the input device to adjust the positional relationship between the light emitting and light receiving elements and the area around the operator's lips.

[0045] In the voice selection device of the present invention, when one input is received from outside, one of the words contained in the table stored in the second storage means is selected and output as voice. The table stored in the second storage means is then changed to a table chosen, according to this output, from the plurality of tables stored in the first storage means. When the next external input is received, the above operation is repeated. In this way, the voice selection device of the present invention does not merely return one word for one input in a single operation, but can keep returning words in response to inputs given one after another. If this voice selection device is combined with a voice recognition device, a voice reaction device can be constructed that recognizes the word corresponding to an input voice and outputs as voice a randomly selected word according to the recognition result. If at least one such voice reaction device is provided in a game device, the voice reaction device can converse with the operator, and if a plurality are provided, a game device in which the devices converse with one another can be constructed. In addition, by using a random number to select the word to be output for an input, the device can give varied output rather than always outputting the same word for the same input.

[0046] In another game device of the present invention, the direction from which a voice was input is detected using a plurality of voice input units, each corresponding to a different direction. The direction of movement of the object, or the orientation of the object itself, is then changed to the detected direction. In this way, the object can be moved by voice. In yet another game device of the present invention, the direction of movement or orientation of the object is changed while a compass detects the difference between the direction input by voice and the current direction of movement or orientation of the object.

【００４７】The voice recognition device of the present invention detects, from the electrical signal corresponding to the input voice, the point at which voice input ended. Then, from the electrical signal for the section in which voice was being input, obtained in this way, it further extracts the section in which voice is actually uttered. Because the feature vector that is actually compared with the feature vectors of the candidate voices is created from the electrical signal for this uttered section, the voice recognition device of the present invention can recognize voice accurately with a simple configuration. The threshold used to extract the uttered section is calculated from the energy of the electrical signal and a smoothed version of that energy. This allows the uttered section to be detected reliably. Furthermore, a voice reaction device obtained by combining this voice recognition device with means for controlling the motion of an object can make the object perform the motion corresponding to the input voice.

【００４８】[0048]

【Embodiments】

(First Embodiment) A first embodiment of the game device of the present invention will be described below with reference to the drawings. This embodiment is a game device in which an airship is operated by voice commands corresponding to the movements of the airship. The voice commands comprise six commands: "front", "back", "right", "left", "up", and "down".

【００４９】In this embodiment, a signal representing the movement of the speaker's lips is input together with the voice signal of the speaker (the operator of the game device), and processing is performed to determine, based on these signals, whether or not the speaker is speaking. This makes it possible to prevent malfunctions caused by ambient noise, in particular the voices of other people.

【００５０】FIG. 1 schematically shows the configuration of the game device of this embodiment. The game device includes a voice input unit 1 and a voice recognition unit 2 for processing the input voice, and an image input unit 3 and an utterance section detection unit 4 for inputting the movement of the lips and processing the signal representing that movement. The voice recognition unit 2 and the utterance section detection unit 4 are both connected to an integrated determination unit 5, which determines what command the speaker uttered based on both the input voice and the lip movement. The determination result of the integrated determination unit 5 is input to a control unit 6, and the control unit 6 controls an airship 7 based on it.

【００５１】First, voice containing a command uttered by the speaker is input to the voice input unit 1. An ordinary microphone or the like can be used for voice input. The voice input unit 1 converts the input voice into an electrical signal and outputs it to the voice recognition unit 2 as a voice signal 11. The voice recognition unit 2 analyzes the voice signal 11 and outputs the result as a voice recognition result 12. The voice signal 11 can be analyzed by a conventionally known technique such as DP matching.

【００５２】In parallel with the processing of the input voice described above, the electrical signal representing the movement of the lips is processed. When the speaker utters a command, the lip movement at that time is input to the image input unit 3. FIG. 2 shows a configuration example of the image input unit 3. The image input unit 3 of this embodiment irradiates the speaker's lip region with light emitted from an LED 21 and detects the light reflected by the lip region with a photodiode 22. It thereby outputs an electrical signal 13 corresponding to the movement of the lips. When the speaker's lips move, the level of the electrical signal 13 changes according to the change in shading near the lips. The light from the LED 21 may be directed at the speaker's lips from the front or from the side.

【００５３】The electrical signal 13 from the image input unit 3 is input to the utterance section detection unit 4. FIG. 3 shows the configuration of the utterance section detection unit 4 of this embodiment. The utterance section detection unit 4 has a differentiating circuit 31 and a section detection unit 32. The differentiating circuit 31 outputs a differential signal 33 indicating the degree of change of the input electrical signal 13. FIG. 5 shows an example of the waveform of the differential signal 33, obtained when the speaker uttered the commands "front" (mae) and "back" (ushiro) with the light from the LED 21 directed at the lips from the side. As FIG. 5 shows, the amplitude of the differential signal 33 increases while the speaker is speaking. In addition, because the LED light strikes the lips from the side, the pursing of the lips when the "u" of "ushiro" is pronounced is reflected in the waveform. When the light from the LED 21 is directed at the lips from the front, the light strikes only the speaker's face, which has the advantage that the electrical signal 13 and the differential signal 33 are not affected by noise caused by movement in the background.

【００５４】The section detection unit 32 receives the differential signal 33, evaluates the magnitude of its amplitude, and detects the speaker's utterance sections. The specific detection method is described with reference to FIG. 6.

【００５５】When the level of the differential signal 33 exceeds a predetermined amplitude threshold 51, the section detection unit 32 judges that the differential signal 33 was produced by the speaker uttering a command, and treats each section in which the level of the differential signal 33 exceeds the amplitude threshold 51 as an utterance section. In the example shown in FIG. 6, section 1 and section 2 are utterance sections. Next, the interval between adjacent utterance sections is compared with a predetermined time threshold 52. The time threshold 52 is used to determine whether a plurality of utterance sections correspond to the same utterance, that is, whether they are continuous. If the interval between utterance sections is within the time threshold 52, the two utterance sections on either side of that interval are judged to be one continuous utterance section. A signal 14 representing the continuous utterance sections determined in this way is output from the utterance section detection unit 4. Both the amplitude threshold 51 and the time threshold 52 can be set in advance to appropriate values.
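The two-threshold procedure above can be sketched in software as follows. This is a hedged illustration, not the circuit of FIG. 3: the function name, the use of a sampled list in place of an analog signal, and the use of the absolute value as the "level" are assumptions of this sketch.

```python
def detect_utterance_sections(diff_signal, dt, amp_threshold, gap_threshold):
    """Return merged (start, end) times of utterance sections.

    diff_signal   : sampled differential signal (sampling is assumed here)
    dt            : sampling interval
    amp_threshold : plays the role of amplitude threshold 51
    gap_threshold : plays the role of time threshold 52
    """
    # Step 1: find runs of samples whose level exceeds the amplitude threshold.
    sections = []
    start = None
    for i, value in enumerate(diff_signal):
        active = abs(value) > amp_threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            sections.append((start * dt, i * dt))
            start = None
    if start is not None:
        sections.append((start * dt, len(diff_signal) * dt))

    # Step 2: merge neighbours whose interval is within the time threshold.
    merged = []
    for s, e in sections:
        if merged and s - merged[-1][1] <= gap_threshold:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```

With a short gap threshold, brief pauses inside one command are bridged while well-separated bursts of lip movement remain distinct sections.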

【００５６】As described above, the utterance section detection unit 4 uses the differential signal 33 to detect the intensity and duration of the lip movement, and thereby obtains the section in which the speaker uttered a command.

【００５７】Next, the operation of the integrated determination unit 5 will be described. As shown in FIG. 4, the integrated determination unit 5 has a voice recognition time determination unit 41, an output determination unit 42, and an output gate 43. The voice recognition time determination unit 41 receives the voice recognition result 12 and notifies the output determination unit 42 of the time at which the recognized voice was input to the voice input unit 1. In addition to the output from the voice recognition time determination unit 41, the output determination unit 42 receives the utterance section detection signal 14 from the utterance section detection unit 4. The operation of the output determination unit 42 is described below with reference to FIG. 7.

【００５８】The output determination unit 42 first creates an evaluation utterance section 72 by adding an evaluation time threshold 71 before and after the utterance section, based on the received utterance section detection signal 14. Next, it determines whether the time at which the voice recognition result 12 was output from the voice recognition unit 2 falls within the evaluation utterance section 72. If it does, the voice that was input to the voice input unit 1 and recognized by the voice recognition unit 2 is judged to have been uttered by the speaker. The result of this determination is output to the control unit 6 as a signal 15.
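The gating decision above amounts to widening each utterance section by a margin and checking whether the recognition time falls inside it. The following is a minimal sketch under that reading; the function names and the choice to return `None` for a rejected result are assumptions made for illustration.

```python
def evaluation_utterance_section(section, margin):
    """Widen an utterance section by `margin` (time threshold 71) on both sides."""
    start, end = section
    return (start - margin, end + margin)


def gate_recognition(command, recognition_time, sections, margin):
    """Pass `command` through only if it was recognized inside a widened section.

    Returns the command when accepted, or None when the recognized voice is
    judged not to come from the speaker.
    """
    for section in sections:
        start, end = evaluation_utterance_section(section, margin)
        if start <= recognition_time <= end:
            return command
    return None
```

The margin would be chosen to cover the delay between the end of the utterance and the moment the recognizer emits its result.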

【００５９】The time threshold 71 used to create the evaluation utterance section 72 is set in consideration of the time the voice recognition unit 2 needs for recognition processing. This is because the time at which the voice recognition result 12 is output is used as one of the criteria for judging whether the recognized voice was uttered by the speaker.

【００６０】When the signal 15 corresponding to the command input by voice is obtained in this way, the control unit 6 controls the airship 7 by outputting a radio control signal corresponding to the input command.

【００６１】As described above, in the first embodiment, the utterance section in which the speaker is speaking is detected from the movement of the lips when the speaker utters a command, and whether the recognized voice belongs to the speaker is determined on that basis. This prevents misrecognition caused by utterances of anyone other than the speaker, and the resulting malfunction of the object.

【００６２】Accordingly, a game device operated by voice, which is a natural form of operation for humans, can be realized. In this embodiment, the movement of the speaker's lips is detected with a simple configuration and method, namely a combination of an LED and a photodiode. The device can therefore be realized at far lower cost than conventional devices that capture an image of the speaker's lips using a video camera or the like. A phototransistor may of course be used instead of the photodiode.

【００６３】The circuit configurations shown in FIGS. 2 and 3 are only examples, and the invention is not limited to them. The same functions can also be realized using computer software.

【００６４】(Second Embodiment) In the game device of the second embodiment of the present invention, commands are input not by voice but by lip movement alone, and the airship is controlled according to the input command. This enables use in noisy environments, use in situations where speaking aloud is not possible, such as in the middle of the night, and use by people with speech impairments.

【００６５】FIG. 8 schematically shows the configuration of the game device of this embodiment. Like the first embodiment, the game device includes the image input unit 3, the control unit 6, and the airship 7, and it further includes a lip recognition unit 81 that recognizes the words of the speaker (operator) from the movement of the lips.

【００６６】FIG. 9 shows a configuration example of the lip recognition unit 81. In this embodiment, the lip recognition unit 81 comprises a differentiating circuit 31, a difference calculation unit 91, a database 92, and a pattern matching unit 93. The differentiating circuit 31 is the same as the one used in the utterance section detection unit 4 of the game device of the first embodiment. The difference calculation unit 91 samples the differential signal 33 from the differentiating circuit 31 at a predetermined time interval and calculates the differences between the sampled data. The results of the difference calculation are sent from the difference calculation unit 91 to both the database 92 and the pattern matching unit 93. The database 92 holds the difference calculation results of the standard patterns used for recognition. The pattern matching unit 93 obtains the distance between the difference calculation result of each stored standard pattern and that of the input pattern to be recognized, and recognizes the word input as lip movement based on this distance. Naturally, the smaller the distance, the more reliable the recognition result.

【００６７】The operation of the game device of this embodiment is described in detail below. Because the lip recognition unit 81 recognizes input words by comparing the input pattern with standard patterns as described above, the standard patterns must be registered in the lip recognition unit 81 before any recognition operation is performed.

【００６８】(Registration operation) First, the image input unit 3 receives the LED light reflected by the speaker's lip region and outputs the electrical signal 13 corresponding to the lip movement to the lip recognition unit 81. The electrical signal 13 is input to the differentiating circuit 31 of the lip recognition unit 81. The differentiating circuit 31 passes the differential signal 33, which indicates the degree of change of the electrical signal 13, to the difference calculation unit 91. Up to this point, the operation is the same as in the first embodiment.

【００６９】The operation of the difference calculation unit 91 is described with reference to FIG. 10. First, the differential signal 33 is sampled at a time interval (Δt), and the difference between each pair of adjacent samples is calculated. The calculated differences, that is, a series of difference data, are output to the database 92, which holds this difference data string. This operation is repeated once for each word (category) to be recognized, so that a difference data string is stored for every category. The stored difference data strings are held as the standard patterns used for recognition. In this embodiment, six commands are used to control the object: "front", "back", "right", "left", "up", and "down". The storage of a difference data string is therefore repeated six times, and six standard patterns are ultimately held in the database 92.
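The registration step can be illustrated with a short sketch. It assumes the differential signal has already been sampled into a list; the function names, the dictionary used as the database, and the sample values are all made up for illustration.

```python
def difference_string(samples):
    """Differences between adjacent samples of the differential signal."""
    return [b - a for a, b in zip(samples, samples[1:])]


def register_patterns(recordings):
    """Build the standard-pattern database: one difference data string per command."""
    return {command: difference_string(samples)
            for command, samples in recordings.items()}


# Made-up sample values, one recording per command (only two of the six shown).
recordings = {"front": [0, 2, 5, 5, 1], "back": [0, 1, 1, 4, 0]}
database = register_patterns(recordings)
```

In the embodiment the same procedure would simply be run once for each of the six commands.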

【００７０】Once all the standard patterns have been registered in the database 92 in this way, the database 92 examines each difference data string and extracts, for each one, the length of the section in which data corresponding to lip movement continues. Specifically, for example, if values close to zero continue in a difference data string for longer than a predetermined time, that section is judged to correspond to a period in which the lips are not moving. When the length of the section corresponding to lip movement has been extracted for all the standard patterns, the standard pattern with the longest such section is selected, and that length is set as the difference data string length (N) of the standard patterns. This completes the registration operation, and the difference data strings of the standard patterns are held in the database 92.
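One possible reading of this length extraction is sketched below. The text leaves the details open, so the following are assumptions of the sketch: "close to zero" means an absolute value of at most `eps`, a run of more than `max_still` such values ends a moving section, and the length is counted in samples from the first to the last moving sample of a section.

```python
def moving_length(diffs, eps, max_still):
    """Length (in samples) of the longest section in which the lips are moving."""
    best = 0
    start = None      # index where the current moving section began
    last_move = None  # index of the most recent sample that is not near zero
    for i, d in enumerate(diffs):
        if abs(d) > eps:
            if start is None or i - last_move - 1 > max_still:
                # A long still run closes the previous section, if there was one.
                if start is not None:
                    best = max(best, last_move - start + 1)
                start = i
            last_move = i
    if start is not None:
        best = max(best, last_move - start + 1)
    return best


def pattern_length(database, eps, max_still):
    """Difference data string length N: the longest moving section over all patterns."""
    return max(moving_length(diffs, eps, max_still) for diffs in database.values())
```

Short pauses inside a word are kept as part of the moving section, while longer stretches of near-zero values are treated as the lips being at rest.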

【００７１】(Recognition operation) The operation from inputting the movement of the lip region to obtaining the differential signal 33 is exactly the same as in the registration operation. The operation after the differential signal 33 is input to the difference calculation unit 91 is described here with reference to FIG. 11.

【００７２】The differential signal 33 input to the difference calculation unit 91 is sampled at the time interval (Δt), as in the registration operation. Next, for the sampled data within a section whose length equals the difference data string length (N) of the standard patterns, the differences between adjacent samples are calculated, and the resulting series of difference data is taken as the difference data string of that section. The section over which the differences are calculated is shifted later in time by Δt at each step. FIG. 11 illustrates only two of these difference data strings: that of section 111, which has length N and begins at the first sample, and that of section 112, which is shifted later in time by N/2 from section 111.
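The sliding sections described above can be sketched as follows. The sketch assumes that N counts difference values rather than samples and that shifting by Δt corresponds to advancing one sample; the function name is hypothetical.

```python
def recognition_difference_strings(samples, n):
    """Return every length-n window of adjacent-sample differences.

    Consecutive windows are shifted by one sample, which corresponds to a
    shift of one sampling interval in time.
    """
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    return [diffs[i:i + n] for i in range(len(diffs) - n + 1)]
```

If fewer than n differences are available, the function returns an empty list, so no comparison is attempted until enough samples have arrived.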

【００７３】When the difference data strings of the plurality of sections of length N (hereinafter referred to as recognition difference data strings) have been obtained, they are sent to the pattern matching unit 93. The pattern matching unit 93 reads the standard patterns from the database 92 and obtains, for each of the recognition difference data strings, its distance from each of the standard patterns. Since six standard patterns are registered in the database 92 in this embodiment, as described above, the pattern matching unit 93 calculates one distance per standard pattern for each recognition difference data string.

【００７４】The distance between a recognition difference data string and a standard pattern is calculated using the following formula (the formula itself is missing from this text). Here, r_i is the i-th recognition difference data string, p_ij is the j-th standard pattern (corresponding to the j-th category), and d_j is the distance between the recognition difference data string and the j-th standard pattern. When the distance d_j falls to or below a certain value, the pattern matching unit 93 judges that the recognition difference data string matches the j-th standard pattern and outputs a signal 82 corresponding to the j-th category (word) as the determination result.
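The matching step can be sketched as below. Because the distance formula is not available in this text, the sketch substitutes a sum of squared element-wise differences; that choice is an assumption, not the formula of the specification. The function names and the rule of taking the closest pattern when several fall under the threshold are also assumptions.

```python
def squared_distance(r, p):
    """Sum of squared differences between two equal-length data strings (assumed metric)."""
    return sum((ri - pi) ** 2 for ri, pi in zip(r, p))


def match(recognition_strings, standard_patterns, max_distance):
    """Return (category, distance) for the first window that matches, else None.

    A window matches when its closest standard pattern lies within max_distance.
    """
    for window in recognition_strings:
        category, distance = min(
            ((name, squared_distance(window, pattern))
             for name, pattern in standard_patterns.items()),
            key=lambda item: item[1],
        )
        if distance <= max_distance:
            return category, distance
    return None
```

Returning the distance together with the category makes it available to any later step that needs to grade how close the match was.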

【００７５】This determination result is input to the control unit 6, which outputs a radio control signal corresponding to the j-th category to control the airship 7.

【００７６】As described above, in this embodiment, the input word (command) is recognized from lip movement alone, and the airship is controlled according to the recognized word. This enables use in noisy environments, use in situations where it is difficult to speak aloud, and use by people with speech impairments.

【００７７】In addition, as in the first embodiment, the image input unit 3 for inputting lip movement can be realized by a combination of the LED 21 and the photodiode 22, so a far less expensive game device can be provided than with the conventional method of capturing an image of the lips themselves using a video camera or the like.

【００７８】In this embodiment, the user of the game registers the standard patterns used for command recognition before inputting commands. However, standard patterns capable of handling the lip movements of unspecified users may instead be registered in the database 92 in advance, for example when the game device is manufactured or shipped, so that registration by the user can be omitted.

【００７９】(Third Embodiment) Next, a game device according to a third embodiment of the present invention will be described. In this embodiment, commands are input through both voice and the lip movement of the speaker (operator), and the airship is operated by making an integrated determination from both recognition results. This makes it possible to reliably recognize the command uttered by the speaker even in noisy environments.

【００８０】FIG. 12 schematically shows the configuration of the game device of this embodiment. The game device includes a voice input unit 1, an image input unit 3, a control unit 6, and an airship 7, each configured as in the game device of the first embodiment. It further includes a voice processing unit 121 and a lip processing unit 122. The voice processing unit 121 recognizes the input voice in the same way as the voice recognition unit 2 of the first embodiment and then calculates the reliability of the recognition result. The lip processing unit 122 recognizes the word (command) input as lip movement in the same way as the lip recognition unit 81 of the second embodiment and also calculates the reliability of the recognition result. The outputs of the voice processing unit 121 and the lip processing unit 122 are both input to an integrated determination unit 123. The integrated determination unit 123 determines the command input by the speaker comprehensively from the recognition results and reliabilities supplied by the processing units 121 and 122, and outputs the determination result.

【００８１】The operation of the game device of this embodiment is described in detail below.

【００８２】As in the first embodiment, the voice input unit 1 receives the voice uttered by the speaker (the operator of the game device) and passes the electrical signal 11 corresponding to the input voice to the voice processing unit 121. The voice processing unit 121 receives the electrical signal 11 and recognizes the input voice based on it. Any conventionally known method may be used for voice recognition. Here, for example, in a manner similar to the method described for the lip recognition unit in the above embodiment, a data string obtained by processing the electrical signal 11 produced when each command that may be input is uttered is registered in advance as a standard pattern for that command. The command (voice) input from the voice input unit is then recognized by calculating the distances between the recognition target data string, obtained by processing the electrical signal 11 produced when the operator of the game device actually utters a command, and all the registered standard patterns. Once the voice has been recognized in this way, the voice processing unit 121 obtains a reliability indicating how trustworthy the recognition result is, and supplies both the voice recognition result and the reliability to the integrated determination unit 123 as an output 124. The method of obtaining the reliability is described later.

【００８３】In parallel with the processing of the input voice, the signal representing the movement of the lips is processed. First, the image input unit 3 receives the movement of the speaker's lips in the same way as in the first embodiment and outputs the electrical signal 13, whose level changes according to the lip movement. The lip processing unit 122 receives the electrical signal 13 and performs the same processing as in the second embodiment. However, when pattern matching between the recognition difference data strings and the standard patterns shows that a recognition difference data string matches the j-th standard pattern, the lip processing unit 122 of this embodiment calculates the reliability of the recognition result based on the distance d_j between that recognition difference data string and the j-th standard pattern. The recognition result and the reliability obtained in this way are both output to the integrated determination unit 123.

【００８４】Next, the method of calculating the reliability is briefly described. In this embodiment, the reliability of the voice recognition result and the reliability of the recognition result based on lip movement are obtained by the same process, so only the calculation for the voice recognition result is described below. Consider a case where the reliability of the voice recognition result is evaluated in three levels: “large”, “medium”, and “small”. Note that the recognition result is most trustworthy when the reliability level is “small” and least trustworthy when it is “large”. In this case, a threshold α_L that separates “small” from “medium” and a threshold α_H that separates “medium” from “large” (where α_L < α_H) are used, and the distance d between the recognition target and the standard pattern judged to match it is compared with these thresholds. If d < α_L, the reliability is determined to be “small”. Similarly, if α_L ≤ d < α_H the reliability is determined to be “medium”, and if d ≥ α_H it is determined to be “large”. For the recognition result based on lip movement, the reliability level is determined in the same way by comparison with thresholds. The thresholds used here can be set to any appropriate values. The method of calculating the reliability is not limited to the one described here; any known method may be used.
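The three-level classification described above can be sketched as follows. This is only an illustrative sketch: the names `Reliability` and `reliability_level` and the numeric values in the usage note are assumptions and do not appear in the specification.

```python
from enum import Enum


class Reliability(Enum):
    SMALL = "small"    # smallest distance: most trustworthy result
    MEDIUM = "medium"
    LARGE = "large"    # largest distance: least trustworthy result


def reliability_level(d: float, alpha_l: float, alpha_h: float) -> Reliability:
    """Classify the distance d to the matched standard pattern.

    alpha_l separates "small" from "medium"; alpha_h separates
    "medium" from "large". The text requires alpha_l < alpha_h.
    """
    if not alpha_l < alpha_h:
        raise ValueError("alpha_l must be smaller than alpha_h")
    if d < alpha_l:
        return Reliability.SMALL
    if d < alpha_h:
        return Reliability.MEDIUM
    return Reliability.LARGE
```

For example, with the assumed thresholds α_L = 1.0 and α_H = 2.0, a distance of 0.5 is classified as “small”, 1.0 as “medium”, and 2.0 as “large”. The same function could serve for both the voice and the lip recognition results, each with its own thresholds.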

【００８５】Next, the operation of the integrated determination unit 123 is described with reference to FIG. 13.

【００８６】FIG. 13 is a diagram showing the concept of the integrated determination method. First, the integrated determination unit 123 detects the time at which the voice recognition result was output from the voice processing unit 121 (that is, the time at which output 124 was generated) and the time at which the recognition result based on lip movement was output from the lip processing unit 122 (that is, the time at which output 125 was generated). It then creates evaluation sections 132a and 132b by adding a section corresponding to a predetermined threshold 131 before and after each detected output time. Next, it determines whether the evaluation section 132a for the lip recognition result and the evaluation section 132b for the voice recognition result overlap. If they overlap, the integrated determination unit 123 judges that the recognized voice was uttered by the operator whose lip movement was input. If they do not overlap, the recognized voice is judged to be due to ambient noise or to an utterance by someone other than the operator. This prevents erroneous recognition of voices other than the operator's.
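The overlap test on the two evaluation sections can be sketched as follows. The function names, the use of milliseconds, and the decision to treat sections that merely touch at an end point as overlapping are assumptions made for illustration; the specification does not state them.

```python
def evaluation_section(output_time, margin):
    """Section extending `margin` before and after a recognition output time."""
    return (output_time - margin, output_time + margin)


def sections_overlap(a, b):
    """True if closed intervals a and b share at least one point (assumption)."""
    return a[0] <= b[1] and b[0] <= a[1]


def uttered_by_operator(voice_time, lip_time, margin):
    """Accept the voice result only if its section overlaps the lip section.

    `margin` plays the role of the predetermined threshold 131.
    """
    voice_section = evaluation_section(voice_time, margin)  # corresponds to 132b
    lip_section = evaluation_section(lip_time, margin)      # corresponds to 132a
    return sections_overlap(voice_section, lip_section)
```

With an assumed margin of 200 ms, a voice result at 1000 ms and a lip result at 1300 ms would be attributed to the operator, whereas a lip result at 2000 ms would not.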

【００８７】Next, the integrated determination unit 123 determines whether the recognition result based on lip movement and the recognition result based on voice agree. If they agree, that recognition result is taken as the integrated determination result (the integrated determination result “前” (“forward”) in FIG. 13). If they do not agree, the integrated determination result is decided according to the reliability obtained for each recognition result. FIG. 14 shows an example of the correspondence between combinations of reliability levels and the integrated determination result decided for each combination. In this example, as described above, the reliability of each recognition result is evaluated in three levels: “large” (least trustworthy), “small” (most trustworthy), and “medium” (between the two). FIG. 14(a) shows the correspondence used when the voice recognition result is given priority if the reliability levels are equal, and FIG. 14(b) shows the correspondence used when the lip recognition result is given priority. Which recognition result to adopt is decided according to factors such as the environment in which the game device is operated; the choice may be registered in the game device in advance, or the game device may be configured so that the operator enters it. For example, (a), which gives priority to the voice recognition result, is suited to an operator with no speech impairment in relatively quiet surroundings, whereas (b) is adopted when the speaker has a speech impairment or the ambient noise is very loud.
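One possible reading of this decision rule is sketched below. The actual table of FIG. 14 is not reproduced in the text, so the rule used here (adopt the result with the more trustworthy level, and fall back on the preferred modality only when the levels are equal) is an assumption, as are the names `integrate` and `prefer`.

```python
# Lower rank means a smaller distance and therefore a more trustworthy result.
RANK = {"small": 0, "medium": 1, "large": 2}


def integrate(voice_result, voice_level, lip_result, lip_level, prefer="voice"):
    """Combine a voice result and a lip result into one determination.

    prefer="voice" is meant to mimic FIG. 14(a) and prefer="lip" FIG. 14(b);
    the exact contents of those tables are assumed, not quoted.
    """
    if voice_result == lip_result:
        return voice_result
    if RANK[voice_level] < RANK[lip_level]:
        return voice_result
    if RANK[lip_level] < RANK[voice_level]:
        return lip_result
    # Equal reliability levels: the configured priority decides.
    return voice_result if prefer == "voice" else lip_result
```

Under this sketch, conflicting results “forward” (voice, “medium”) and “back” (lips, “medium”) yield “forward” with voice priority and “back” with lip priority.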

【００８８】The integrated determination unit 123 outputs the integrated determination result decided as described above as signal 15. Finally, the control unit 6 outputs a radio control signal corresponding to the determination result and thereby controls the airship 7.

【００８９】As described above, according to this embodiment, lip movement is recognized together with the voice signal and the two results are used in an integrated manner, so the words (commands) uttered by the speaker can be recognized reliably even in noisy conditions. This also makes it possible for people with speech impairments to use a voice-operated game. In addition, as in the first and second embodiments, lip movement is detected by the combination of the LED 21 and the photodiode 22, so the device can be realized far more cheaply than with a method that captures lip images using a video camera or the like.

【００９０】Although a detailed description has been omitted, in this embodiment, as in the second embodiment, the user of the game registers the standard patterns used for lip recognition. Alternatively, standard patterns that can handle unspecified speakers may be prepared in advance so that registration by the user can be omitted.

【００９１】In the first to third embodiments, a game device that controls the airship 7 by radio control signals has been described as an example, but the game devices to which the present invention can be applied are of course not limited to this. For example, by providing as many of the configurations described in any of the above embodiments as there are operators, a game device that several operators can play at the same time can be realized.

【００９２】The input device of the present invention is described below. FIG. 15 is a diagram schematically showing the configuration of the input device. The input device has a headset 154, a support 155 attached to the headset, and a base 153 on which a photodiode 151 and an LED 152 are provided; the base 153 is joined to the support 155 at a predetermined angle (see FIG. 15(a)). By adjusting the angle between the base 153 and the support 155, the direction in which the light emitted by the LED 152 strikes the operator's lips can be changed. This input device inputs lip movement by directing the light emitted by the LED 152 onto the operator's lips and detecting the reflected light with the photodiode 151. Such an input device can be used, for example, as the image input unit in the first to third embodiments. If a microphone 156 is added to the base 153 (see FIG. 15(b)), the input device can also be used as a voice input device.

【００９３】An input device without a microphone, as shown in FIG. 15(a), can be used as the image input unit of the second embodiment. An input device with a microphone, as shown in FIG. 15(b), can be used as a device that serves as both the voice input unit and the image input unit of the first and third embodiments.

【００９４】As described above, the input device of the present invention uses the photodiode 151, the LED 152, and the microphone 156, all of which are very small and very light, so the size and weight of the input device as a whole are very small. Because all of the components are inexpensive, the device can be realized at low cost. Furthermore, because the input device is fixed to the operator's head by the headset 154, the positions of the photodiode 151 and the LED 152 relative to the lips can be kept substantially constant, so lip movement can be input stably. In addition, the input device captures lip movement optically, converts it into an electric signal, and outputs it. It can therefore be made simpler than conventional input devices, such as devices that input an image rather than lip movement or devices that use ultrasonic waves, which are inevitably large and complicated.

【００９５】Although only one photodiode and one LED are mounted here, several of each may be mounted. For example, if two LED and photodiode pairs are prepared and arranged in a cross shape, the direction of movement in the plane can be detected.

【００９６】As described above, the present invention provides a game device that can be operated by voice, which is natural for humans, and that requires no practice to operate. In addition, because lip movement is used rather than recognizing the input words (commands) from voice alone, stable operation is possible even in noisy conditions. Furthermore, because lip movement is captured by a combination of an LED and a photodiode (phototransistor), the device can be realized at lower cost than one using a video camera, ultrasonic waves, or the like.

【００９７】Further, as described in the first embodiment, the speaker's utterance section is detected from lip movement and used in judging the voice recognition result, so erroneous recognition caused by utterances from anyone other than the speaker can be prevented. As described in the second and third embodiments, if the input words (commands) are recognized from lip movement and used to control the airship, the device can be used in noisy conditions, in situations where it is difficult to speak aloud, and by people with speech impairments.

【００９８】The input device of the present invention consists of a light headset, a support, and a base to which an inexpensive light-emitting element (such as an LED) and an inexpensive light-receiving element (such as a photodiode) are attached. A very light and inexpensive input device can therefore be realized.

【００９９】In the first to third embodiments, examples were described in which the movement of an object is controlled according to the recognized voice or lip movement. However, the action of the object controlled on the basis of voice or lip movement is not limited to movement; it may be, for example, an action such as replying with some words. Described below are various devices for causing an object to perform some action (including movement) according to the recognized voice.

【０１００】Devices for causing an object to perform some action according to the recognized voice are described in the following embodiments.

【０１０１】(Fourth Embodiment) This embodiment describes a device that, according to a recognized voice, selects one output voice from a set of output voices prepared for that voice and outputs it.

【０１０２】FIG. 16 schematically shows the configuration of the voice selection device 100 of this embodiment. The voice selection device 100 has a random number generation unit 101, a selection unit 102, an input/output state memory 103, a state transition unit 104, and an input/output state database 105. The input/output state database 105 stores a plurality of input/output state tables in advance. Each input/output state table contains, for a state s, the inputs x (x is a non-negative integer) and, for each input x, a set sp(x, i) (0 ≤ i < n(s)) of n(s) output voices. FIG. 17 shows examples of input/output state tables. Initially, the input/output state memory 103 holds the table 201 for the initial state shown in FIG. 17(a). The random number generation unit 101 determines the index i used to select the one voice to be output from the set of output voices.

【０１０３】The operation of the voice selection device 100 is described below. When an input x is supplied to the selection unit 102 from outside, the selection unit 102 refers to the input/output state table stored in the input/output state memory 103 and selects the output voice set sp(x, i) corresponding to the input x. The selection unit 102 then has the random number generation unit 101 generate a random number r(n(s)) (where 0 ≤ r(n(s)) < n(s)), sets i = r(n(s)), and picks one voice from the output voice set sp(x, i). This voice is output to the outside.

【０１０４】The output from the selection unit 102 is supplied not only to the outside but also to the state transition unit 104. On receiving the output from the selection unit 102, the state transition unit 104 refers to the input/output state database 105 and overwrites the contents of the input/output state memory 103 with the input/output state table that corresponds to that output. For example, if “元気？” (“How are you?”) is output in the initial state 201, the state transition unit 104 refers to the input/output state database 105, retrieves the table for the input/output state 202 that corresponds to the output “元気？”, and stores the retrieved table for state 202 in the input/output state memory 103.
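The selection and state transition steps can be sketched together as follows. The table contents, the state names, and the class name `VoiceSelector` are hypothetical; FIG. 17 is not reproduced in the text, and only the input 1 with the outputs “Good morning” and “How are you?” and the transition triggered by “How are you?” are taken from the description.

```python
import random

# Hypothetical input/output state tables: state -> {input x -> outputs sp(x, i)}.
TABLES = {
    "state_201": {1: ["Good morning", "How are you?"]},
    "state_202": {2: ["Glad to hear it", "Take care"]},  # invented for illustration
}

# Hypothetical transitions: (current state, output voice) -> next state.
TRANSITIONS = {("state_201", "How are you?"): "state_202"}


class VoiceSelector:
    def __init__(self, rng=None):
        self.state = "state_201"           # plays the role of memory 103
        self.rng = rng or random.Random()  # plays the role of unit 101

    def respond(self, x):
        """Pick one output voice for input x, then update the state."""
        candidates = TABLES[self.state].get(x)
        if not candidates:
            return None                    # input not registered in this state
        i = self.rng.randrange(len(candidates))  # 0 <= i < number of candidates
        output = candidates[i]
        # State transition driven by the chosen output (role of unit 104).
        self.state = TRANSITIONS.get((self.state, output), self.state)
        return output
```

Passing a seeded `random.Random` (or any object with a compatible `randrange` method) makes the behaviour reproducible, which is convenient when checking the tables.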

【０１０５】In this way, the voice selection device 100 of this embodiment responds to an input voice by outputting a voice selected using a random number. A simple dialogue system can therefore be built using the voice selection device 100. Alternatively, as shown in FIG. 18, a voice selection device 100a with a simpler configuration, in which the state transition unit 104 and the input/output state database 105 are omitted, can be used to give just a single response to an input voice.

【０１０６】The voice selection devices 100 and 100a can be used, as shown in FIG. 27, as the voice selection device 1202 of a voice reaction device 1203 in combination with a voice recognition device 1201. Specifically, when a voice is recognized by the voice recognition device 1201, the recognition result is input to the voice selection device 1202, for example as an identification number assigned to that voice. The voice selection device 1202 takes the identification number as the input x, randomly selects one voice from the output voice set, and outputs it. This makes it possible to realize a voice reaction device 1203 that outputs a voice in response to an input voice and can respond in various ways even to the same input voice. For example, if the voice recognition device 1201 outputs the voice “おはよう” (“Good morning”) as the recognition result while the voice selection device 1202 is in the initial state, the identification number 1 assigned to “おはよう” is supplied to the voice selection device 1202 as the input x (see FIG. 2(a)). In response, the voice selection device 1202 randomly selects and outputs one voice from the set sp(1, i), which contains the two output voices “おはよう” (“Good morning”) and “元気？” (“How are you?”).

【０１０７】In the voice reaction device 1203, the voices that can be accepted as input must be registered in the voice selection device 1202 before actual operation. When a voice that is not in the registered voice set is input to the voice selection device 1202, the voice selection device 1202 may, for example, output the voice “何？” (“What?”). When the device of the third embodiment is used as the voice recognition device 1201, the voice selection device 1202 can also be made to output a voice asking the user to speak again when the reliability of the recognized voice is low.

【０１０８】As described above, the voice selection device of the present invention has a plurality of tables representing input/output states and changes the input/output state according to the history of past inputs and outputs. A device that carries on a simple dialogue can therefore be realized using the voice selection device of the present invention. The voice selection device also holds several candidate output voices for each input and randomly selects one of them for output. This yields a voice reaction device that gives varied responses instead of always giving the same response to a given input.

【０１０９】(Fifth Embodiment) Next, the direction detection device and the direction selection device of the present invention are described.

【０１１０】First, the direction detection device 400 is described with reference to FIG. 19. The direction detection device 400 has a direction detection unit 401 and a plurality of microphones 402 connected to it; the microphones 402 are attached to the object to be controlled. The operation of the direction detection device 400 is described here for the case of four microphones. When voice is input from the four microphones m(i) (i = 0, 1, 2, 3), the direction detection unit 401 divides the input voice sp(m(i), t) into frames f(m(i), j) 501 (0 ≤ j), as shown in FIG. 20. The length of one frame is, for example, 16 ms. Next, for each frame the direction detection unit 401 obtains the energy e(m(i), j) of the voice in the frame and stores the obtained energies e(m(i), j) one after another in a circular memory (not shown) of length l (for example, 100). Each time the energy of one frame is stored, the direction detection unit 401 obtains, for each microphone, the sum of the energies of the past l frames and determines the microphone for which this sum is largest. The direction detection unit 401 then compares the largest energy sum with a threshold Ｔｈｅ determined experimentally in advance. If the largest energy sum exceeds the threshold Ｔｈｅ, the direction from the direction detection unit 401 toward that microphone is judged to be the direction from which the voice is coming. The number i of the microphone determined in this way is output from the direction detection unit 401 as the direction from which the voice was input.
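The energy-based direction detection can be sketched as follows. The class name `DirectionDetector`, the definition of frame energy as the sum of squared samples, and the numeric values in the usage note are assumptions; the text specifies only the per-frame energy, the circular memory of past frames, the maximum of the per-microphone sums, and the threshold comparison.

```python
from collections import deque


class DirectionDetector:
    def __init__(self, num_mics=4, history=100, threshold=1.0):
        self.threshold = threshold
        # One circular memory of recent frame energies per microphone.
        self.energies = [deque(maxlen=history) for _ in range(num_mics)]

    def push_frames(self, frames):
        """Store one frame per microphone and return the detected direction.

        `frames` holds one list of samples per microphone for the same frame
        index. Returns the microphone number with the largest summed energy,
        or None if that sum does not exceed the threshold.
        """
        for buf, samples in zip(self.energies, frames):
            buf.append(sum(s * s for s in samples))  # assumed energy definition
        sums = [sum(buf) for buf in self.energies]
        best = max(range(len(sums)), key=sums.__getitem__)
        return best if sums[best] > self.threshold else None
```

For instance, with a threshold of 1.0, a frame in which only microphone 2 carries samples of amplitude 1.0 gives a detected direction of 2, while a frame that is quiet on all microphones gives no direction at all.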

【０１１１】If the direction detection device 400 operating in this way is used in combination with an operating device 1302, for example as shown in FIG. 28, a voice reaction device 1303 that performs a predetermined action according to the direction from which a voice is heard can be constructed. Specifically, if an operating device 1302 for moving an object (for example, a balloon or a stuffed toy) and a direction detection device 1301 (400 in FIG. 19) are attached to the object, a device can be made that responds to a voice by performing a predetermined action toward the direction from which the voice is heard, for example an object that moves toward a human voice.

【０１１２】One example of the operating device 1302 described above is a device that has three propeller-equipped motors attached to the object and a drive unit for these motors and that, when the direction in which to move next is input, controls the three motors so that the object moves in that direction.

【０１１３】Next, the direction selection device is described with reference to FIG. 21. The direction selection device 600 has an offset calculation unit 601, a compass 602, and a target direction memory 603, and can be used as a device for controlling the direction in which an object moves or the direction in which it faces. When an input x (x is a non-negative integer) indicating the direction in which the object should next move or face is supplied, the offset calculation unit 601 outputs an offset corresponding to the input x on the basis of a table stored in the offset calculation unit 601 in advance. The output offset is added to the actual direction of the object at that moment, as measured by the compass 602, and the sum is sent to the target direction memory 603. The target direction memory 603 stores the actual direction from the compass 602 plus the offset as the direction in which the object should next move or face.

【０１１４】Thus, the direction selection device of FIG. 21 is used to change the direction of an object according to the input x, relative to the direction in which the object is currently moving or facing.

【０１１５】If the direction selection device 700 of FIG. 22 is used instead of the direction selection device 600 of FIG. 21, the direction of the object can be changed to an absolute direction rather than to a direction relative to the current one. In the direction selection device 700 of FIG. 22, when the direction calculation unit 701 receives from outside an input x (x is a non-negative integer) indicating an absolute direction (for example, north), it outputs a value corresponding to the input x. The output value is stored as it is in the target direction memory 603 as the target direction. Like the offset calculation unit 601 described above, the direction calculation unit 701 can be realized by holding the absolute direction values for the inputs x as a table. After the target direction has been stored in the memory 603 in this way, the direction selection device 700 uses the compass 602 to measure the current direction successively while the object is moving or turning, and outputs the difference between the measured direction and the direction stored in the target direction memory 603. If feedback control is applied to the object on the basis of this output, the object can be moved in, or turned to face, the desired absolute direction.

【０１１６】If a direction selection device as described above is combined with a voice recognition device and an operating device, a voice reaction device 1402 can be realized, as shown in FIG. 29, in which the direction the object faces or moves in changes in response to a direction given by voice. In the voice reaction device 1402, the recognition result of the voice recognition device 1201 is input to the direction selection device 1401, and the output of the direction selection device 1401 is input to the operating device 1302. This makes it possible to control the action of the object while comparing its current facing or moving direction with the target direction.

【０１１７】例えば、北を０度とし、東回りを正の方向
としたときに、対象物が現在０度の方向を向いている場
合を考える。このとき、方向選択装置１４０１として上
述した方向選択装置６００（図２１参照）を用いている
ものとする。目的とする方向を示す音声が音声認識装置
１２０１により「右」という言葉であると認識される
と、方向選択装置６００のオフセット算出部６０１に
「右」という言葉に＋９０度が対応づけられているテー
ブルを格納しておけば、方向選択装置６００は、動作装
置１３０２に対して、対象物の向きあるいは移動方向を
現在の向きから東回りに９０度ほど変えるようにという
出力を送る。このとき、方向選択装置６００によって、
対象物の向きあるいは移動方向の変化中に現在の方向と
目的とする方向とは常に比較される。動作装置１３０２
は、方向選択装置６００の出力によって目的とする方向
に対象物の向きあるいは移動方向が変わるように制御さ
れる。あるいは方向選択装置１４０１として用いられて
いるのが図２２の方向選択装置７００である場合には、
目的とする方向を表す言葉として、「右」や「左」では
なく「北」や「南西」というような絶対的な方向を表す
言葉が入力されることになる。このとき、方向選択装置
７００は、入力された言葉が「北」であれば０度を、
「南西」であれば−１３５度を目的とする絶対的な方向
として目的方向メモリに格納し、上述したような動作を
行う。なお、ここで目的とする方向は−１８０度〜＋１
８０度とする。Consider, for example, a case where north is 0 degrees, the eastward (clockwise) direction is positive, and the object is currently facing 0 degrees. Assume that the direction selection device 600 described above (see FIG. 21) is used as the direction selection device 1401. If the offset calculation unit 601 of the direction selection device 600 stores a table that associates the word “right” with +90 degrees, then when the voice recognition device 1201 recognizes the voice indicating the target direction as the word “right”, the direction selection device 600 sends the operation device 1302 an output instructing it to change the orientation or moving direction of the object by 90 degrees eastward from the current orientation. While the orientation or moving direction of the object is changing, the direction selection device 600 constantly compares the current direction with the target direction. The operation device 1302 is controlled by the output of the direction selection device 600 so that the orientation or moving direction of the object changes to the target direction. Alternatively, when the direction selection device 700 of FIG. 22 is used as the direction selection device 1401, words expressing absolute directions, such as “north” or “southwest”, are input instead of “right” or “left”. In this case, the direction selection device 700 stores 0 degrees in the target direction memory as the absolute target direction if the input word is “north”, or -135 degrees if it is “southwest”, and then performs the operation described above. The target direction here ranges from -180 degrees to +180 degrees.
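The two ways of choosing a target direction described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the table layout, and the "left" entry are assumptions made for illustration, and only the values for "right", "north", and "southwest" come from the text.

```python
# Hypothetical tables; only "right" -> +90, "north" -> 0 and
# "southwest" -> -135 are taken from the description.
RELATIVE_OFFSETS = {"right": 90.0, "left": -90.0}        # "left" is assumed
ABSOLUTE_HEADINGS = {"north": 0.0, "southwest": -135.0}


def wrap_degrees(angle):
    """Map any angle to the range [-180, 180) with north = 0, east positive."""
    return (angle + 180.0) % 360.0 - 180.0


def target_from_relative(current, word):
    """Device 600 style: add a word-dependent offset to the current heading."""
    return wrap_degrees(current + RELATIVE_OFFSETS[word])


def target_from_absolute(word):
    """Device 700 style: the word itself names the absolute target heading."""
    return ABSOLUTE_HEADINGS[word]


def remaining_turn(current, target):
    """Signed shortest turn still needed; compared repeatedly while turning."""
    return wrap_degrees(target - current)
```

Note that this sketch maps +180 degrees to -180 degrees so that every heading has a single representation; the text itself only states the range as -180 to +180 degrees.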

【０１１８】また、本実施例の方向検出装置および方向
選択装置を動作装置と組み合わせてもよい。この場合、
図３０に示すように、方向検出装置１３０１の検出結果
を方向選択装置１４０１の入力とし、方向選択装置１４
０１の出力を動作装置１３０２の入力とする。これによ
り、対象物の向きあるいは移動している方向を、現在の
対象物の向きあるいは移動している方向と目的とする方
向とを比較しながら音声が聞こえてくる方向に変えると
いう音声反応装置１５０１を実現することができる。The direction detection device and the direction selection device of this embodiment may also be combined with the operation device. In this case, as shown in FIG. 30, the detection result of the direction detection device 1301 is input to the direction selection device 1401, and the output of the direction selection device 1401 is input to the operation device 1302. This realizes a voice reaction device 1501 that turns the orientation or moving direction of the object toward the direction from which the voice is heard, while comparing the current orientation or moving direction of the object with the target direction.

【０１１９】（第６の実施例）本実施例では、音声認識
に関する装置を説明する。この装置は、図２６に示すよ
うに、音声終了点検出装置１１０１、音声検出装置１１
０２、特徴量抽出装置１１０３、距離計算装置１１０４
および辞書１１０５を有している。(Sixth Embodiment) This embodiment describes a device for voice recognition. As shown in FIG. 26, this device includes a voice end point detection device 1101, a voice detection device 1102, a feature quantity extraction device 1103, a distance calculation device 1104, and a dictionary 1105.

【０１２０】まず、入力された音声に対応する信号を受
け取り、その信号に基づいて音声終了点を検出する音声
終了点検出装置１１０１を説明する。本明細書では「音
声終了点」は音声入力が終了した時間を意味するものと
する。First, a voice end point detection device 1101 that receives a signal corresponding to an input voice and detects a voice end point based on the signal will be described. In the present specification, the “voice end point” means the time when voice input ends.

【０１２１】本実施例の音声終了点検出装置１１０１
は、マイクなどの音声入力装置に接続されている。音声
入力装置から音声ｓ（ｔ）が入力されると、音声終了点
検出装置１１０１は、図２３に示すように入力された音
声ｓ（ｔ）をフレームｆ（ｉ）（ｉは負でない整数）に
分割し、各フレーム内のエネルギーｅ（ｉ）を求める。
図２３では、音声ｓ（ｔ）を曲線８０１で、エネルギー
ｅ（ｉ）を曲線８０２で表している。続いて音声終了点
検出装置１１０１は、１フレーム分の音声が入力される
度にそのフレームから所定個数前のフレームまでのエネ
ルギーの分散を求め、予め実験的に定められている閾値
Ｔｈｖと比較する。比較の結果、エネルギーの分散が閾
値Ｔｈｖと大きい方から小さい方に交差していれば、交
差した時点を音声終了点と判定する。The voice end point detection device 1101 of this embodiment is connected to a voice input device such as a microphone. When a voice s(t) is input from the voice input device, the voice end point detection device 1101 divides the input voice s(t) into frames f(i) (i is a non-negative integer), as shown in FIG. 23, and obtains the energy e(i) in each frame. In FIG. 23, the voice s(t) is represented by curve 801 and the energy e(i) by curve 802. Each time one frame of voice is input, the voice end point detection device 1101 computes the variance of the energy over the span from that frame back to the frame a predetermined number of frames earlier, and compares it with a threshold Thv determined experimentally in advance. If the variance of the energy crosses the threshold Thv from above to below, the time of the crossing is determined to be the voice end point.
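The end point test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the text does not define the frame energy, so the sum of squared samples is assumed, and the function names, the non-overlapping framing, and the window length are hypothetical.

```python
from collections import deque
from statistics import pvariance


def frame_energies(samples, frame_len):
    """Split samples into non-overlapping frames; energy = sum of squares (assumed)."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_end_point(energies, window, thv):
    """Return the index of the first frame at which the variance of the last
    `window` frame energies crosses `thv` from above to below, else None."""
    recent = deque(maxlen=window)   # plays the role of a circular memory
    prev_var = None
    for i, e in enumerate(energies):
        recent.append(float(e))
        if len(recent) < window:
            continue                # not enough history yet
        var = pvariance(recent)
        if prev_var is not None and prev_var > thv and var <= thv:
            return i                # downward crossing: voice end point
        prev_var = var
    return None
```

The threshold `thv` corresponds to Thv in the text and would have to be tuned experimentally, as the description says.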

【０１２２】ここで一定期間のフレーム毎のエネルギー
から分散を求める方法を述べる。まず、循環メモリを使
う方法であるが、フレーム毎に求まるエネルギーを順
次、長さｌの循環メモリ８０３に格納していく。そし
て、１フレームのエネルギーが求まる度に、そこから一
定期間ほど遡ったフレームのエネルギーを循環メモリ８
０３から参照することにより、分散を求める。Here, a method of obtaining the variance from the per-frame energy over a certain period is described. The first method uses a circular memory: the energy obtained for each frame is stored sequentially in a circular memory 803 of length l. Each time the energy of one frame is obtained, the energies of the frames going back a certain period from that frame are read from the circular memory 803, and the variance is computed from them.

【０１２３】また、循環メモリを用いずにエネルギーの
分散を求める方法もある。この方法では、音声終了点検
出装置１１０１に過去の所定数個のフレームについての
平均ｍ（ｉ−１）と分散ｖ（ｉ−１）を保持させてお
き、新しいフレームに対してエネルギーｅ（ｉ）が求め
られる度に、新しく求められたエネルギーｅ（ｉ）と過
去のエネルギーの平均ｍ（ｉ−１）との重みづけした和
を新しいエネルギーの平均ｍ（ｉ）とし、同じく過去の
分散ｖ（ｉ−１）と｜ｅ（ｉ）−ｍ（ｉ）｜との重みづ
け和を新しい分散ｖ（ｉ）とする。このようにすれば擬
似的なエネルギーの分散を求めることができる。ここ
で、重みづけには減衰定数αを用い、次式を用いて新し
い平均と分散とを求める。αとしては１．０２を用いて
いる。There is also a method of obtaining the variance of the energy without using a circular memory. In this method, the voice end point detection device 1101 holds the mean m(i-1) and the variance v(i-1) for a predetermined number of past frames. Each time the energy e(i) is obtained for a new frame, the weighted sum of the newly obtained energy e(i) and the past mean energy m(i-1) becomes the new mean energy m(i), and likewise the weighted sum of the past variance v(i-1) and |e(i) - m(i)| becomes the new variance v(i). In this way, a pseudo-variance of the energy can be obtained. An attenuation constant α is used for the weighting, and the new mean and variance are obtained using the following equation. A value of 1.02 is used for α.

【０１２４】[0124]

【数１】 [Equation 1]

【０１２５】このようにすることにより、循環メモリを
必要とせず、メモリの節約につながり、新しいエネルギ
ーが求まる度に一定期間内のエネルギーの総和を求める
等の手間が省け、処理時間の短縮にもつながる。This approach requires no circular memory, which saves memory. It also removes the need to recompute the sum of the energies over the period each time a new energy is obtained, which shortens the processing time.
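The recursive update described above can be sketched as follows. Equation 1 is not reproduced in this text, so the exact weighting is unknown; the form below, which divides the old value by α and gives the new term the remaining weight, is only one plausible assumption chosen so that α = 1.02 yields a slowly decaying average. The function name is hypothetical.

```python
def update_stats(m_prev, v_prev, e, alpha=1.02):
    """One step of a running mean and pseudo-variance of the frame energy.

    ASSUMED form (Equation 1 is missing from the text):
        m(i) = (m(i-1) + (alpha - 1) * e(i)) / alpha
        v(i) = (v(i-1) + (alpha - 1) * |e(i) - m(i)|) / alpha
    """
    m = (m_prev + (alpha - 1.0) * e) / alpha
    v = (v_prev + (alpha - 1.0) * abs(e - m)) / alpha
    return m, v
```

Only two numbers are kept between frames, which is the memory and time saving the text refers to. As the text notes, v(i) built from |e(i) - m(i)| is a pseudo-variance rather than a true variance.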

【０１２６】次に、実際に音声が発音された区間を抽出
する音声検出装置１１０２を説明する。この区間の抽出
のために、エネルギーを格納するための循環メモリ８０
３とは別に、平滑化エネルギーを格納するための循環メ
モリ９０２を用意しておき、図２４に示すように、１フ
レームのエネルギーが求まる度にメモリ８０３にはエネ
ルギー８０２を、メモリ９０２には平滑化エネルギー９
０１を蓄えてゆく。上述したようにして音声終了点９０
３が求まった時点では、これらの循環メモリ８０３およ
び９０２にはエネルギーおよび平滑化エネルギーの履歴
が残っており、これらの循環メモリの長さｌを十分な長
さ（例えば２秒に相当する長さ）にしておけば、一単語
分のエネルギーを残しておくことができる。そこで、音
声検出装置１１０２は、これらのメモリに格納されてい
るエネルギーおよび平滑化エネルギーを用いて音声が発
音された区間を抽出する。Next, the voice detection device 1102, which extracts the section in which the voice was actually uttered, is described. To extract this section, a circular memory 902 for storing smoothed energy is provided in addition to the circular memory 803 for storing energy. As shown in FIG. 24, each time the energy of one frame is obtained, the energy 802 is stored in the memory 803 and the smoothed energy 901 is stored in the memory 902. When the voice end point 903 has been obtained as described above, the histories of the energy and the smoothed energy remain in the circular memories 803 and 902. If the length l of these circular memories is made sufficiently long (for example, a length corresponding to 2 seconds), the energy for one word can be retained. The voice detection device 1102 then uses the energy and smoothed energy stored in these memories to extract the section in which the voice was uttered.

【０１２７】区間の抽出は次のような手順で行われる。
まず、後で説明するようにして閾値Ｔｈを決定する。こ
の閾値Ｔｈと循環メモリ８０３内に格納されているエネ
ルギーとを過去のものから順に比較していき、エネルギ
ーが初めてその閾値を超える点を音声が発音された区間
の始点とする。また、逆に音声終了点から過去に遡って
いくときにエネルギーが初めて閾値と交差する点を音声
が発音された区間の終点とする。このようにして、音声
が発音された区間を抽出する。The section is extracted by the following procedure. First, a threshold Th is determined as described later. This threshold Th is compared with the energies stored in the circular memory 803 in order from the oldest, and the point at which the energy first exceeds the threshold is taken as the start point of the section in which the voice was uttered. Conversely, going back in time from the voice end point, the point at which the energy first crosses the threshold is taken as the end point of the section. In this way, the section in which the voice was uttered is extracted.
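The two scans described above can be sketched in Python. The function name and the list-based representation of the circular memory (oldest frame first) are assumptions for illustration; the threshold is simply passed in as `th`.

```python
def extract_segment(energies, end_point, th):
    """Return (start, end) frame indices of the uttered section, or None.

    energies  -- frame energies, oldest first (contents of the energy memory)
    end_point -- index of the detected voice end point
    th        -- threshold Th
    """
    # Forward scan from the oldest frame: first frame whose energy exceeds Th.
    start = next((i for i in range(end_point + 1) if energies[i] > th), None)
    if start is None:
        return None  # the energy never exceeded the threshold
    # Backward scan from the end point: first frame whose energy exceeds Th.
    end = next(i for i in range(end_point, -1, -1) if energies[i] > th)
    return start, end
```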

【０１２８】ここで閾値Ｔｈの決定の仕方を説明する。
まず、音声終了点が検出された時点でのメモリ８０３内
のエネルギーの最大値ｍａｘ１００１と、メモリ９０２
ないの平滑化エネルギーの最小値ｍｉｎ１００２とを求
める。これらの値を用いて、次式から閾値Ｔｈを算出す
る。Here, how the threshold Th is determined is described. First, the maximum value max 1001 of the energy in the memory 803 and the minimum value min 1002 of the smoothed energy in the memory 902, both at the time the voice end point is detected, are obtained. Using these values, the threshold Th is calculated from the following equation.

【０１２９】[0129]

【数２】 [Equation 2]

【０１３０】ただし、βとしては０．０７程度の値を採
用した。Here, a value of about 0.07 is used for β.
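A threshold of this kind could be computed as in the sketch below. Equation 2 is not reproduced in this text, so the formula is an assumption: it places Th a fraction β of the way from the minimum smoothed energy up to the maximum raw energy, which is one common way to combine a maximum, a minimum, and a small coefficient. The function name is hypothetical.

```python
def threshold(energies, smoothed, beta=0.07):
    """Compute Th from the raw and smoothed energy histories.

    ASSUMED form (Equation 2 is missing from the text):
        Th = min + beta * (max - min)
    where max is the maximum raw energy and min the minimum smoothed energy.
    """
    e_max = max(energies)
    s_min = min(smoothed)
    return s_min + beta * (e_max - s_min)
```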

【０１３１】またここでは、エネルギーを平滑化する方
法としては一定ウインドウ内のメディアン値を採る方法
を用いている。しかし、平滑化の方法はこれに限定され
るものではなく、例えば平均値を採ってもかまわない。
なお、閾値Ｔｈを求める際に平滑化エネルギーの最大値
ではなくエネルギーの最大値を用いたのは、閾値Ｔｈを
求めるのに平滑化エネルギーの最大値を用いると、単語
の長さが変動した場合に最大値が大幅に変動し、それに
伴なって閾値Ｔｈも変動してしまい、結果的に良好な音
声検出ができなくなるからである。また、平滑化エネル
ギーの最小値を閾値Ｔｈの算出に用いているので、音声
ではないノイズが検出されるのを防ぐこともできる。Here, the energy is smoothed by taking the median value within a fixed window. However, the smoothing method is not limited to this; for example, the mean value may be taken instead. The maximum value of the raw energy, rather than the maximum value of the smoothed energy, is used to obtain the threshold Th because the maximum of the smoothed energy varies greatly when the length of the word varies, so the threshold Th would vary with it and good voice detection would no longer be possible. In addition, because the minimum value of the smoothed energy is used in calculating the threshold Th, non-voice noise can be prevented from being detected.
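Median smoothing over a fixed window can be sketched as below. The window length, its centering on the current frame, and the shortened window at the edges are assumptions; the text only says that the median within a fixed window is taken.

```python
from statistics import median


def median_smooth(energies, window=5):
    """Replace each frame energy by the median of a centered window.

    The window is truncated at both ends of the sequence (an assumption).
    """
    half = window // 2
    return [median(energies[max(0, i - half): i + half + 1])
            for i in range(len(energies))]
```

A short, isolated spike is removed by the median, which illustrates why the minimum of the smoothed energy is less sensitive to brief noise than the minimum of the raw energy.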

【０１３２】上述したようにして、音声が発音されてい
る区間の抽出、すなわち入力された信号のうちの音声に
相当する部分の検出が音声検出装置１１０２によって行
われる。As described above, the voice detecting device 1102 extracts the section in which the voice is sounded, that is, detects the portion of the input signal corresponding to the voice.

【０１３３】次に、検出された音声から、特徴量抽出装
置１１０３によって、認識のための特徴量を抽出する。
特徴量もエネルギー同様、フレーム毎に求めるものと
し、循環メモリに蓄えていくものとする。ここで特徴量
とは、原信号のゼロ交差数と原信号の微分信号のゼロ交
差数と原信号のエネルギーの対数をとったもののフレー
ム間差分の３つの要素を含む特徴量ベクトルとする。Next, the feature quantity extraction device 1103 extracts feature quantities for recognition from the detected voice. Like the energy, the feature quantities are obtained for each frame and stored in a circular memory. Here, the feature quantity is a feature vector with three elements: the number of zero crossings of the original signal, the number of zero crossings of the differentiated original signal, and the inter-frame difference of the logarithm of the energy of the original signal.
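The three-element feature vector can be sketched per frame as follows. The names are hypothetical, and several details are assumptions: the differentiated signal is approximated by the first difference of the samples, the energy is taken as the sum of squares, samples equal to zero are treated as positive, and a small constant guards the logarithm against silent frames.

```python
import math


def zero_crossings(x):
    """Count sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))


def frame_features(frame, prev_log_energy):
    """Return ((zc, zc_of_difference, delta_log_energy), log_energy)."""
    diff = [b - a for a, b in zip(frame, frame[1:])]   # discrete derivative
    energy = sum(s * s for s in frame)
    log_energy = math.log(energy + 1e-10)              # guard against log(0)
    features = (zero_crossings(frame),
                zero_crossings(diff),
                log_energy - prev_log_energy)
    return features, log_energy
```

The returned `log_energy` is fed back as `prev_log_energy` for the next frame, so the third element is the inter-frame difference described in the text.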

【０１３４】このように音声終了点検出装置１１０１、
音声検出装置１１０２、および特徴量抽出装置１１０３
を経て得られた音声の特徴量ベクトルは、距離計算装置
１１０４に入力される。距離計算装置１１０４は、辞書
１１０５に予め登録されている複数の音声の特徴量ベク
トルのそれぞれと入力された特徴量ベクトルとを照合
し、最もスコアがよかったものを認識結果として出力す
る。照合の方法は単純にベクトル間のユークリッド距離
を取ってもよいし、ＤＰマッチング法を用いてもよい。The voice feature vector obtained in this way through the voice end point detection device 1101, the voice detection device 1102, and the feature quantity extraction device 1103 is input to the distance calculation device 1104. The distance calculation device 1104 compares the input feature vector with each of the feature vectors of a plurality of voices registered in advance in the dictionary 1105, and outputs the one with the best score as the recognition result. The comparison may simply use the Euclidean distance between the vectors, or the DP matching method may be used.
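Dictionary matching with DP matching can be sketched as below, using a basic dynamic time warping recurrence with Euclidean distance between per-frame vectors. This is a generic textbook form, not necessarily the variant the patent uses; the function names and the dictionary layout (word mapped to a sequence of frame vectors) are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment of two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sequence, dictionary, distance=dtw_distance):
    """Return (word, distance) of the dictionary entry closest to the input."""
    return min(((w, distance(sequence, t)) for w, t in dictionary.items()),
               key=lambda pair: pair[1])
```

Passing `distance=euclidean` instead would give the simpler comparison mentioned in the text, provided the input and the templates are flat vectors of the same length.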

【０１３５】以上説明したようにして、本実施例の装置
は音声認識を行う。この音声認識装置は、図２７に示す
ように実施例４で述べた音声選択装置１２０２と組み合
わせて用いることもできるし、図２９に示すように実施
例５で述べた方向選択装置１４０１、および動作装置１
３０２に組み合わせることもできる。また、単に動作装
置１３０２と組み合わせて、音声認識装置１２０１の結
果を動作装置１３０２の入力として目的の方向へ装置全
体を移動させる音声反応装置１６０１を構成することも
できる。The device of this embodiment performs voice recognition as described above. This voice recognition device can be used in combination with the voice selection device 1202 described in the fourth embodiment, as shown in FIG. 27, or combined with the direction selection device 1401 described in the fifth embodiment and the operation device 1302, as shown in FIG. 29. It can also simply be combined with the operation device 1302 to form a voice reaction device 1601 that moves the entire device in the target direction by using the result of the voice recognition device 1201 as the input of the operation device 1302.

【０１３６】さらに、実施例４〜６で述べた音声反応装
置のうち音声認識装置１２０１を含むものでは、音声認
識装置側に信号送信装置１７０１を付加し、それぞれの
構成の中で音声認識装置の後段に来る音声選択装置１２
０２や方向選択装置１４０１や動作装置１３０２に信号
受信装置１７０２を付加すれば、音声認識装置のみを手
元のリモコンとして対象物を遠隔操作することが可能と
なる。ここで信号送受信に赤外線や無線を用いることが
可能である。Furthermore, in those voice reaction devices described in Embodiments 4 to 6 that include the voice recognition device 1201, a signal transmission device 1701 may be added on the voice recognition device side, and a signal reception device 1702 may be added to the voice selection device 1202, the direction selection device 1401, or the operation device 1302 that follows the voice recognition device in each configuration. The object can then be operated remotely using only the voice recognition device as a hand-held remote controller. Infrared or radio can be used for signal transmission and reception.

【０１３７】また、上述した音声反応装置を風船に取り
つけることによって、風船と対話したり、風船をコント
ロールすることが可能になり、風船独特のあたたかみを
生かした玩具を作ることが可能となる。By attaching the above-described voice reaction device to a balloon, it becomes possible to interact with the balloon and control the balloon, and it becomes possible to make a toy that makes the best use of the warmth unique to the balloon.

【０１３８】また、図３３に示すように、上述した音声
認識装置と音声選択装置とを備えた音声反応装置１２０
３を風船１８０１に取り付けた物を２つ用意し、人がこ
の音声反応装置に話しかけるのではなく、２つの音声反
応装置同士がお互いに対話するように構成すれば、勝手
に対話するような玩具を作ることが可能となる。さら
に、この音声反応装置付き風船１８０１を複数用意し、
対話させることも可能である。このときに、それぞれの
音声反応装置付き風船に音声認識過程でリジェクト機能
を持たせれば、特定の言葉に対してのみ反応することが
可能となり、ある発声に対し一つの風船だけが反応する
ように構成することも可能となる。例えば、それぞれの
風船１８０１に名前を付け、その名前を呼んだ時だけ反
応させることが可能となる。ここでリジェクトの方法は
音声認識を行う時に内部の辞書と距離を計算するが、実
験的に閾値を決めておき、その閾値を越えたものをリジ
ェクトするというものがある。さらに、音声反応装置に
時計を組み込んで、所定の時間が経過したら、登録され
ている出力音声集合の中から１つの音声をランダムに選
んで出力させることにより、音声反応装置側から対話を
始めることのできる玩具を構成することも可能である。Further, as shown in FIG. 33, two units may be prepared, each consisting of a voice reaction device 1203, which includes the voice recognition device and voice selection device described above, attached to a balloon 1801. If the two voice reaction devices are configured to talk to each other rather than having a person speak to them, a toy that carries on a conversation by itself can be made. A larger number of these balloons 1801 with voice reaction devices can also be prepared and made to converse. In this case, if each balloon with a voice reaction device is given a reject function in the voice recognition process, it can respond only to specific words, and the system can be configured so that only one balloon responds to a given utterance. For example, each balloon 1801 can be given a name and made to respond only when its name is called. One way to reject is as follows: voice recognition computes the distance to the entries in the internal dictionary, so a threshold is determined experimentally and any input whose distance exceeds the threshold is rejected. Furthermore, by incorporating a clock into the voice reaction device and, once a predetermined time has elapsed, randomly selecting one voice from the set of registered output voices and outputting it, a toy in which the voice reaction device itself starts the conversation can be constructed.
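The reject function and the timer-driven utterance described above can be sketched as follows. All names, the example words, and the numeric values are hypothetical; the text only specifies that inputs whose dictionary distance exceeds an experimentally chosen threshold are rejected, and that one registered output is picked at random after a predetermined time.

```python
import random


def accept_or_reject(distances, reject_threshold):
    """Return the closest dictionary word, or None if it is too far away.

    distances -- mapping of dictionary word to distance from the input
    """
    word = min(distances, key=distances.get)
    return None if distances[word] > reject_threshold else word


def timed_utterance(elapsed, interval, outputs, rng=random):
    """After `interval` has elapsed, pick one registered output at random."""
    return rng.choice(outputs) if elapsed >= interval else None
```

A balloon whose dictionary holds only its own name would, under this sketch, stay silent for any other utterance because the distance to its single entry exceeds the threshold.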

【０１３９】なお、上記対象物は風船に限定されるもの
ではなく、ぬいぐるみや人形、あるいは写真や絵であっ
てもかまわない。また、ディスプレイ中の動画であって
もよい。また、対象物として、風船以外の反重力装置
（例えば、ヘリコプターのようにプロペラによって浮上
するものや、リニアモーターカーのように磁力によって
浮上するもの）を用いてもよい。The object is not limited to a balloon, and may be a stuffed animal, a doll, a photograph or a picture. It may also be a moving image on the display. Further, as the object, an anti-gravity device other than a balloon (for example, a device that is levitated by a propeller like a helicopter, or a device that is levitated by magnetic force like a linear motor car) may be used.

【０１４０】[0140]

【発明の効果】以上説明したように、本発明によれば、
人間にとって自然な音声による操作が可能であり、かつ
操作習熟を必要としないゲーム装置を得ることができ
る。また、音声のみから入力された言葉（コマンド）を
認識するのではなく、口唇の動きを利用しているので、
騒音下においても安定な操作が可能である。さらに、口
唇の動きをＬＥＤとフォトダイオード（フォトトランジ
スタ）の組み合わせでとらえるため、ビデオカメラ、あ
るいは超音波等を利用する場合と比較して、低コストで
実現することができる。As described above, the present invention provides a game device that can be operated by voice in a way that is natural for humans and that requires no practice to operate. In addition, because the movement of the lips is used instead of recognizing the input word (command) from the voice alone, stable operation is possible even in a noisy environment. Furthermore, because the movement of the lips is captured by a combination of an LED and a photodiode (phototransistor), the device can be realized at lower cost than with a video camera or ultrasonic waves.

【０１４１】さらに、本発明の音声認識装置では、口唇
の動きから話者の発声区間を検出し、これを音声認識結
果の判断材料とするため、話者以外の発声による誤認識
を防止することができる。また、本発明の他の音声認識
装置では、口唇の動きから入力された言葉（コマンド）
を認識して飛行船の制御を行うために、騒音下において
も、また声が出しにくい状況や、発声に障害を持つ者の
利用も可能となる。Further, in the voice recognition device of the present invention, the speaker's utterance section is detected from the movement of the lips and used in judging the voice recognition result, so erroneous recognition caused by utterances from anyone other than the speaker can be prevented. In another voice recognition device of the present invention, the input word (command) is recognized from the movement of the lips and used to control the airship, so the device can be used in noisy environments, in situations where it is difficult to speak aloud, and by people with speech impairments.

【０１４２】また、本発明の入力装置は、軽いヘッドセ
ットと支柱および台に安価な発光素子（ＬＥＤ等）と安
価な受光素子（フォトダイオード等）を取り付けてい
る。このため、非常に軽く、しかも安価な入力装置を実
現することができる。In the input device of the present invention, an inexpensive light emitting element (LED or the like) and an inexpensive light receiving element (photodiode or the like) are attached to a light headset, a support and a stand. Therefore, an extremely light and inexpensive input device can be realized.

【０１４３】以上説明したように、本発明の音声選択装
置は、入出力の状態を複数用意し過去の入出力の履歴に
より入出力の状態を遷移させる。このため、この音声選
択装置を用いることにより簡単な対話をする装置を提供
することが可能となる。また、本発明の音声選択装置は
１つの入力に対し複数の出力を用意しており、この中か
らランダムに選択した１つを出力するので、１つの入力
に対し常に同じ応答ではなく、変化のある応答をするこ
とができる。As described above, the voice selection device of the present invention provides a plurality of input/output states and transitions between them according to the history of past inputs and outputs. Using this voice selection device therefore makes it possible to provide a device that carries on a simple dialogue. In addition, the voice selection device of the present invention provides a plurality of outputs for one input and outputs one selected at random from among them, so it can give varied responses rather than always giving the same response to a given input.

【０１４４】また、本発明の方向検出装置は、複数のマ
イクによって音声を入力し、エネルギーが最大となるマ
イクを検出する。これにより、音声が発声された方向を
検出することができる。さらに、本発明の方向選択装置
を用いれば、方位計によって現在の位置を検出しなが
ら、対象物を入力された方向に正確に移動させたり、あ
るいは入力された方向に対象物の向きを変えたりするこ
とができる。Further, the direction detection device of the present invention receives voice through a plurality of microphones and detects the microphone with the maximum energy, which allows the direction from which the voice was uttered to be detected. Furthermore, with the direction selection device of the present invention, the object can be moved accurately in the input direction, or turned to face the input direction, while its current position is detected by an azimuth meter.

【０１４５】また、本発明の音声認識装置は、音声終了
点検出装置によりまず大まかな音声の終了点を求めてか
ら、音声検出装置で自動的に閾値を求める。ここで、入
力された音声のエネルギーの最大値と、エネルギーを平
滑化したものの最小値とから閾値を決定しているので、
音声の発声区間の長短に関係なく、良好な音声区間抽出
を行うことができる。音声検出装置が閾値を用いて音声
を検出すると、この音声から特徴量を求め、これに基づ
いて音声認識を行う。Further, in the voice recognition device of the present invention, the voice end point detection device first finds a rough end point of the voice, and the voice detection device then determines a threshold automatically. Because the threshold is determined from the maximum value of the energy of the input voice and the minimum value of the smoothed energy, good voice section extraction is possible regardless of the length of the utterance. Once the voice detection device has detected the voice using the threshold, feature quantities are obtained from this voice and voice recognition is performed based on them.

【０１４６】また、上述した装置を適宜組み合わせるこ
とにより、様々な音声反応装置を得ることができる。例
えば、音声認識装置と音声選択装置を組み合わせれば、
人が声で話しかけると返答する音声反応装置が得られ、
これによりマン・マシンインターフェースを構築するこ
とが可能となる。また、方向検出装置と動作装置を組み
合わせれば、音声に反応して対象物を動作させることが
可能となるし、音声認識装置と方向選択装置と動作装置
を組み合わせれば、音声の内容が示す方向に対象物を正
確に移動させたり、音声の内容が示す方向に対象物の向
きを変えたりすることが可能となる。さらに、音声反応
装置のうちの音声認識装置に信号送信装置を接続し、音
声認識装置の後段にくる装置に信号受信装置を接続して
対象物に取り付ければ、遠隔からの操作が可能である音
声反応装置を実現することができる。Various voice reaction devices can be obtained by appropriately combining the devices described above. For example, combining the voice recognition device with the voice selection device yields a voice reaction device that replies when a person speaks to it, which makes it possible to build a man-machine interface. Combining the direction detection device with the operation device makes it possible to move the object in response to a voice. Combining the voice recognition device, the direction selection device, and the operation device makes it possible to move the object accurately in the direction indicated by the content of the voice, or to turn the object to face that direction. Furthermore, if a signal transmission device is connected to the voice recognition device of a voice reaction device, and a signal reception device is connected to the device that follows the voice recognition device and is attached to the object, a voice reaction device that can be operated remotely can be realized.

【０１４７】さらに、上述したような音声反応装置を複
数個用意すれば、音声反応装置間で自動的に対話をする
玩具を構成することも可能である。また、音声反応装置
をそれぞれ風船に付ければ、風船独特の暖かみを持ち、
しかも話しかけることが可能な玩具を作ることができ
る。また、時計を組み込み、ある時間がくれば適当な音
声を出力することによって人間から話かけるのではな
く、自分から話しかける音声反応装置を作ることも可能
である。Furthermore, by preparing a plurality of the voice reaction devices described above, a toy in which the voice reaction devices converse with each other automatically can be constructed. If each voice reaction device is attached to a balloon, a toy can be made that has the warmth peculiar to a balloon and that can also be spoken to. It is also possible to make a voice reaction device that speaks on its own initiative, rather than waiting for a person to speak to it, by incorporating a clock and outputting a suitable voice when a certain time arrives.

[Brief description of drawings]

【図１】本発明の第１の実施例のゲーム装置の構成を示
すブロック図である。FIG. 1 is a block diagram showing a configuration of a game device according to a first embodiment of the present invention.

【図２】本発明の第１〜第３の実施例の画像入力部の詳
細な構成を示す図である。FIG. 2 is a diagram showing a detailed configuration of an image input unit according to first to third embodiments of the present invention.

【図３】本発明の第１の実施例における発声区間検出部
の詳細な構成を示す図である。FIG. 3 is a diagram showing a detailed configuration of a vocalization section detection unit in the first exemplary embodiment of the present invention.

【図４】本発明の第１の実施例における統合判断部の詳
細な構成を示すブロック図である。FIG. 4 is a block diagram showing a detailed configuration of an integrated determination unit in the first exemplary embodiment of the present invention.

【図５】本発明の第１〜第３の実施例における微分信号
の出力例を示すグラフである。FIG. 5 is a graph showing an output example of differential signals in the first to third embodiments of the present invention.

【図６】図３の発声区間検出部の処理動作を説明するた
めの図である。FIG. 6 is a diagram for explaining the processing operation of the utterance section detection unit of FIG. 3.

【図７】図４の統合判断部の処理動作を説明するための
図である。FIG. 7 is a diagram for explaining the processing operation of the integrated determination unit of FIG. 4.

【図８】本発明の第２の実施例のゲーム装置の構成を示
すブロック図である。FIG. 8 is a block diagram showing a configuration of a game device according to a second embodiment of the present invention.

【図９】本発明の第２、第３の実施例における口唇認識
部の詳細な構成を示すブロック図である。FIG. 9 is a block diagram showing a detailed configuration of a lip recognition unit in second and third embodiments of the present invention.

【図１０】本発明の第２、第３の実施例における微分回
路の処理動作を示す図である。FIG. 10 is a diagram showing the processing operation of the differentiating circuit in the second and third embodiments of the present invention.

【図１１】本発明の第２、第３の実施例のパターンマッ
チング部の処理動作を示す図である。FIG. 11 is a diagram showing the processing operation of the pattern matching unit according to the second and third embodiments of the present invention.

【図１２】本発明の第３の実施例のゲーム装置の構成を
示すブロック図である。FIG. 12 is a block diagram showing a configuration of a game device according to a third embodiment of the present invention.

【図１３】本発明の第３の実施例における統合判断部の
処理動作を示す図である。FIG. 13 is a diagram showing a processing operation of an integrated determination unit in the third embodiment of the present invention.

【図１４】本発明の第３の実施例における統合判断部の
処理動作を示す図である。FIG. 14 is a diagram showing a processing operation of an integrated determination unit in the third exemplary embodiment of the present invention.

【図１５】本発明の入力装置の具体的構成例を示す図で
ある。FIG. 15 is a diagram showing a specific configuration example of the input device of the present invention.

【図１６】本発明の第４の実施例の音声選択装置の構成
を示す図である。FIG. 16 is a diagram showing a configuration of a voice selection device according to a fourth exemplary embodiment of the present invention.

【図１７】図１６の音声選択装置における入出力状態を
示す図である。FIG. 17 is a diagram showing input/output states in the voice selection device of FIG. 16.

【図１８】本発明の変形例の音声選択装置の構成を示す
図である。FIG. 18 is a diagram showing a configuration of a voice selection device of a modified example of the invention.

【図１９】本発明の第５の実施例の方向検出装置の構成
を示す図である。FIG. 19 is a diagram showing a configuration of a direction detection device according to a fifth exemplary embodiment of the present invention.

【図２０】入力された音声の波形とフレームとを説明す
る図である。FIG. 20 is a diagram illustrating a waveform and a frame of input voice.

【図２１】本発明の第５の実施例の方向選択装置の構成
を示す図である。FIG. 21 is a diagram showing a configuration of a direction selection device according to a fifth exemplary embodiment of the present invention.

【図２２】本発明の第５の実施例の他の方向選択装置の
構成を示す図である。FIG. 22 is a diagram showing the configuration of another direction selection device of the fifth exemplary embodiment of the present invention.

【図２３】音声波形、エネルギー、および循環メモリを
説明する図である。FIG. 23 is a diagram illustrating a speech waveform, energy, and a circular memory.

【図２４】本発明の第６の実施例における音声終了点の
検出方法を説明する図である。FIG. 24 is a diagram illustrating a method of detecting a voice end point according to the sixth embodiment of the present invention.

【図２５】本発明の第６の実施例における音声検出方法
を説明する図である。FIG. 25 is a diagram illustrating a voice detection method according to a sixth embodiment of the present invention.

【図２６】本発明の第６の実施例の音声認識装置の構成
を示すブロック図である。FIG. 26 is a block diagram showing a configuration of a voice recognition device according to a sixth embodiment of the present invention.

【図２７】本発明の音声認識装置、および音声選択装置
を用いた音声反応装置の構成を示す図である。FIG. 27 is a diagram showing a configuration of a voice reaction device using a voice recognition device and a voice selection device of the present invention.

【図２８】本発明の方向検出装置、および動作装置を用
いた音声反応装置の構成を示す図である。FIG. 28 is a diagram showing a configuration of a voice reaction device using the direction detection device and the operation device of the present invention.

【図２９】本発明の音声認識装置、方向選択装置、およ
び動作装置を用いた音声反応装置の構成を示す図であ
る。FIG. 29 is a diagram showing a configuration of a voice reaction device using the voice recognition device, the direction selection device, and the operation device of the present invention.

【図３０】本発明の方向検出装置、方向選択装置、およ
び動作装置を用いた音声反応装置の構成を示す図であ
る。FIG. 30 is a diagram showing a configuration of a voice reaction device using a direction detection device, a direction selection device, and an operation device of the present invention.

【図３１】本発明の音声認識装置、および動作装置を用
いた音声反応装置の構成を示す図である。FIG. 31 is a diagram showing a configuration of a voice reaction device using the voice recognition device and the operation device of the present invention.

【図３２】本発明の遠隔操作が可能な音声反応装置の構
成を示す図である。FIG. 32 is a diagram showing the configuration of a remotely operable voice reaction device of the present invention.

【図３３】本発明の音声反応装置を用いた玩具の一例を
示す図である。FIG. 33 is a diagram showing an example of a toy using the voice reaction device of the present invention.

【図３４】従来のゲーム装置の構成を示す図である。FIG. 34 is a diagram showing a configuration of a conventional game device.

[Explanation of symbols]

1 音声入力部 voice input unit
3 画像入力部 image input unit
2 音声認識部 voice recognition unit
4 発声区間検出部 vocalization section detection unit
5, 123 統合判断部 integrated judgment unit
6 制御部 control unit
7 飛行船 airship
21 ＬＥＤ LED
22 フォトダイオード photodiode
81 口唇認識部 lip recognition unit
100, 100a 音声選択装置 voice selection device
101 乱数発生部 random number generation unit
102 音声選択部 voice selection unit
103 入出力状態メモリ input/output state memory
104 状態遷移部 state transition unit
105 入出力状態データベース input/output state database
400, 1301 方向検出装置 direction detection device
401 方向検出部 direction detection unit
600, 700, 1401 方向選択装置 direction selection device
601 オフセット算出装置 offset calculation device
602 方位計 azimuth meter
603 目的方向メモリ target direction memory
701 方向算出装置 direction calculation device
1101 音声終了点検出装置 voice end point detection device
1102 音声検出装置 voice detection device
1103 特徴量抽出装置 feature quantity extraction device
1104 距離計算装置 distance calculation device
1105 辞書 dictionary
1201 音声認識装置 voice recognition device
1202 音声選択装置 voice selection device
1302 動作装置 operation device
1701 信号送信装置 signal transmission device
1702 信号受信装置 signal reception device

フロントページの続き (51)Int.Cl.⁶ 識別記号 庁内整理番号 ＦＩ 技術表示箇所 Ｇ１０Ｌ 9/12 ３０１Ｂ Ｈ０４Ｑ 9/00 ３０１Ｂ (72)発明者 萱嶋一弘 大阪府門真市大字門真1006番地 松下電器産業株式会社内 (72)発明者 松井謙二 大阪府門真市大字門真1006番地 松下電器産業株式会社内 (72)発明者 松川善彦 大阪府門真市大字門真1006番地 松下電器産業株式会社内 Continuation of the front page (51) Int.Cl.⁶ Identification symbol, internal reference number, FI, technical indication: G10L 9/12 301B; H04Q 9/00 301B (72) Inventor Kazuhiro Kayashima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture (72) Inventor Kenji Matsui, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture (72) Inventor Yoshihiko Matsukawa, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture

Claims

[Claims]

1. A game device comprising: voice input means for inputting at least one voice including a voice uttered by an operator, converting the input voice into a first electric signal, and outputting the first electric signal; voice recognition means for recognizing the at least one voice based on the first electric signal output from the voice input means; image input means for optically detecting the movement of the operator's lips, converting the detected movement of the lips into a second electric signal, and outputting the second electric signal; utterance section detecting means for receiving the second electric signal and obtaining, based on the received second electric signal, a section in which the voice is uttered by the operator; integrated judgment means for extracting the voice uttered by the operator from the at least one voice, based on the at least one voice recognized by the voice recognition means and the section obtained by the utterance section detecting means; and control means for controlling an object based on the voice extracted by the integrated judgment means.

2. The game device according to claim 1, wherein the utterance section detecting means includes: differentiating means for detecting a degree of change of the second electric signal output from the image input means; and means for determining that the corresponding voice is uttered by the operator when the degree of change detected by the differentiating means exceeds a predetermined value.

3. The game device according to claim 1, wherein the integrated determination means includes: means for creating an evaluation section by adding a section of a predetermined length to the section obtained by the vocalization section detecting means; means for detecting the recognition result output time at which the voice recognition means outputs the recognition result for the at least one voice; and means for comparing the recognition result output time with the evaluation section and determining that a voice whose recognition result output time falls within the evaluation section is the voice uttered by the operator.
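
As a non-limiting illustration, the comparison recited in claim 3 could be sketched as follows. The function name, the data layout, and the choice to append the predetermined length to the end of each section (on the assumption that a recognition result is output after the utterance) are assumptions for illustration only.

```python
def extract_operator_results(results, sections, margin):
    """Keep only recognition results whose output time lies in an evaluation section.

    results:  list of (word, output_time) pairs from the voice recognition means.
    sections: list of (start, end) times from the lip-based section detection.
    margin:   predetermined length added to each section (assumed: added at the end).
    """
    accepted = []
    for word, output_time in results:
        for start, end in sections:
            # Evaluation section = detected section extended by the margin.
            if start <= output_time <= end + margin:
                accepted.append(word)
                break
    return accepted
```

With the hypothetical words below, a result that arrives while the operator's lips are still is discarded as surrounding noise:

```python
results = [("forward", 1.2), ("back", 3.0), ("right", 5.4)]
sections = [(1.0, 1.1), (5.0, 5.2)]
print(extract_operator_results(results, sections, margin=0.3))
```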

4. A game device comprising: image input means for optically inputting the movement of an operator's lips, converting the input movement of the lips into an electric signal, and outputting the electric signal; lip recognition means for obtaining the movement of the lips based on the electric signal, recognizing a word corresponding to the obtained movement of the lips, and outputting a recognition result; and control means for controlling an object according to a control signal based on the recognition result.

5. The game device according to claim 4, wherein the lip recognition means includes: storage means for storing a predetermined number of words; and matching means for selecting one of the predetermined number of words according to the obtained movement of the lips and determining that the selected word is the word corresponding to the movement of the lips.

6. The game device according to claim 5, wherein the storage means stores the movements of the lips corresponding to the predetermined number of words as standard patterns, and the matching means obtains the distance between the obtained movement of the lips and each of the standard patterns and selects the word corresponding to the standard pattern having the smallest distance.
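
As a non-limiting illustration, the matching recited in claim 6 could be sketched as follows. The Euclidean distance, the equal-length pattern representation, and all names are assumptions for illustration; the claim only requires that the standard pattern with the smallest distance be selected.

```python
import math

def match_word(observed, standard_patterns):
    """Return (word, distance) for the standard pattern closest to the observation.

    observed:          sequence of lip-movement samples.
    standard_patterns: dict mapping each word to a pattern of the same length.
    """
    best_word, best_distance = None, math.inf
    for word, pattern in standard_patterns.items():
        # Euclidean distance is an assumed metric, not one named in the claim.
        distance = math.dist(observed, pattern)
        if distance < best_distance:
            best_word, best_distance = word, distance
    return best_word, best_distance
```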

7. The game device according to claim 4, further comprising: voice input means for inputting a voice, converting the voice into another electric signal, and outputting the other electric signal; voice recognition means for recognizing the voice based on the other electric signal output from the voice input means; and integrated determination means for outputting the control signal to be given to the control means based on both the recognition result of the voice recognition means and the recognition result of the lip recognition means.

8. The game device according to claim 7, further comprising: means for obtaining a voice recognition reliability for the recognition result of the voice recognition means; and means for obtaining a lip recognition reliability for the recognition result of the lip recognition means, wherein the integrated determination means selects one of the recognition result of the voice recognition means and the recognition result of the lip recognition means based on the voice recognition reliability and the lip recognition reliability, and outputs the selected recognition result as the control signal.

9. The game device according to claim 1, wherein the image input means includes: light emitting means for emitting light; and light receiving means for receiving the light reflected by the lips of the operator and converting the received light into the second electric signal.

10. The game device according to claim 4, wherein the image input means includes: light emitting means for emitting light; and light receiving means for receiving the light reflected by the lips of the operator and converting the received light into the electric signal.

11. The game device according to claim 7, wherein the image input means includes: light emitting means for emitting light; and light receiving means for receiving the light reflected by the lips of the operator and converting the received light into the electric signal.

12. The game device according to claim 9, wherein the light is applied to the lips from the side.

13. The game device according to claim 9, wherein the light is applied to the lips from the front.

14. The game device according to claim 1, wherein the voice input means has at least one microphone.

15. The game device according to claim 11, wherein the voice input means has at least one microphone, and the at least one microphone and the light emitting means and the light receiving means of the image input means are provided on a single stand.

16. An input device comprising: a headphone-like headset; a support having one end joined to the headset; and a base joined to the other end of the support, the base being provided with at least one light emitting element that generates light applied to the lips of an operator and at least one light receiving element that receives the light reflected by the lips.

17. The input device according to claim 16, wherein voice input means for inputting a voice is provided on the base.

18. A voice selection device comprising: first storage means for storing a plurality of tables, each of the plurality of tables including a plurality of words that can be output in response to one input; second storage means for storing one of the plurality of tables; selection means for selecting, in response to an external input, one word from the plurality of words included in the one table stored in the second storage means, and outputting the selected word as a voice; and transition means for updating the one table stored in the second storage means to another table that is determined, from among the plurality of tables stored in the first storage means, according to the selected word.

19. The voice selection device according to claim 18, further comprising means for generating a random number, wherein the selection means selects the one word from the plurality of words by using the random number.
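
As a non-limiting illustration, the selection and transition recited in claims 18 and 19 could be sketched as follows. The class name, the dictionary layout of the tables, and the rule that a word without a registered transition leaves the current table unchanged are assumptions for illustration.

```python
import random

class VoiceSelector:
    """Hypothetical sketch of the voice selection device."""

    def __init__(self, tables, transitions, initial, rng=None):
        self.tables = tables            # first storage: table name -> {input: [words]}
        self.transitions = transitions  # selected word -> name of the next table
        self.current = initial          # second storage: the one active table (by name)
        self.rng = rng or random.Random()  # means for generating a random number

    def respond(self, external_input):
        words = self.tables[self.current][external_input]
        word = self.rng.choice(words)   # selection means: random pick among candidates
        # Transition means: the next table depends on the word just selected.
        self.current = self.transitions.get(word, self.current)
        return word
```

Because the reply is drawn at random and the active table changes with each reply, the same input need not produce the same response twice.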

20. A voice selection device comprising: storage means for storing a table, the table including a plurality of words that can be output in response to one input; selection means for selecting, in response to an external input, one word from the plurality of words included in the stored table by using a random number, and outputting the selected word as a voice; and means for generating the random number.

21. A voice reaction device comprising: the voice selection device according to any one of claims 18, 19 and 20; and voice recognition means for inputting a voice, recognizing the voice, and giving a recognition result to the voice selection device.

22. A game device comprising the voice reaction device according to claim 21.

23. A game device comprising a plurality of voice reaction devices according to claim 21, whereby the voice reaction devices interact with each other.

24. A game device comprising: a plurality of voice input units each converting an input voice into an electric signal, the plurality of voice input units corresponding to different directions; and direction detecting means for obtaining the energy of the electric signal for each of the plurality of voice input units, determining the one of the plurality of voice input units having the largest energy, and determining that the direction corresponding to the determined voice input unit is the direction in which the voice was uttered.
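
As a non-limiting illustration, the direction detection recited in claim 24 could be sketched as follows. The direction labels, the sum-of-squares energy, and the function name are assumptions for illustration.

```python
def detect_direction(signals_by_direction):
    """Return the direction whose voice input unit received the largest energy.

    signals_by_direction: dict mapping a direction label to that unit's samples.
    """
    def energy(samples):
        # Sum of squared samples as an assumed energy measure.
        return sum(s * s for s in samples)

    return max(signals_by_direction, key=lambda d: energy(signals_by_direction[d]))
```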

25. The game device according to 1, further comprising: operating means for moving an object; and control means for controlling the operating means so as to change the direction of movement of the object to the determined direction.

26. The game device according to claim 24, further comprising: measuring means for measuring a current direction of movement of an object; direction selecting means that receives the determined direction and includes means for determining a target direction based on the current direction and the determined direction and means for storing the target direction; and operating means for moving the object, wherein the direction selecting means controls the operating means, using the difference between the target direction and the current direction, so that the current direction of movement of the object and the target direction substantially coincide.

27. A game device comprising: input means for inputting a relative direction by voice; measuring means for measuring a current direction of an object; and direction selecting means including means for determining a target direction based on the current direction and the input relative direction and means for storing the target direction, wherein the direction selecting means controls the object, using the difference between the target direction and the current direction, so that the current direction and the target direction substantially coincide.
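
As a non-limiting illustration, the direction selection recited in claim 27 could be sketched as follows. Angles in degrees measured clockwise, the tolerance value, and the command names are assumptions for illustration; the claim only requires control based on the difference between the target direction and the current direction.

```python
def target_direction(current_deg, relative_deg):
    """Target = current direction plus the spoken relative direction, in [0, 360)."""
    return (current_deg + relative_deg) % 360

def heading_error(target_deg, current_deg):
    """Signed smallest difference (target - current), in [-180, 180)."""
    return (target_deg - current_deg + 180) % 360 - 180

def steer(target_deg, current_deg, tolerance_deg=5):
    """Choose a hypothetical command that reduces the difference."""
    error = heading_error(target_deg, current_deg)
    if abs(error) <= tolerance_deg:
        return "hold"  # directions substantially coincide
    return "turn_right" if error > 0 else "turn_left"
```

The target is computed once from the direction at the time of the command and then stored, so repeated control steps converge on a fixed direction rather than chasing a moving one.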

28. The game device according to claim 27, wherein the input means includes: an input unit for inputting the voice; and a recognition unit for recognizing the relative direction based on the input voice.

29. A game device comprising: input means for inputting an absolute direction by voice; measuring means for measuring a current direction of an object; and direction selecting means including means for determining a target direction based on the absolute direction and storing the target direction, wherein the direction selecting means controls the object, using the difference between the target direction and the current direction, so that the current direction of the object and the target direction substantially coincide.

30. The game device, wherein the input means includes: an input unit for inputting the voice; and a recognition unit for recognizing the absolute direction based on the input voice.

31. A voice recognition device comprising: first detecting means for receiving an electric signal corresponding to a voice and detecting, from the electric signal, a voice end point, which is the time at which input of the voice ends; second detecting means for determining, based on the electric signal, a vocalization section, which is the section in which the voice is uttered within the section in which the voice is input; feature quantity extraction means for creating a feature quantity vector based on the portion of the electric signal in the vocalization section; storage means for storing feature quantity vectors of a plurality of candidate voices created in advance; and means for recognizing the input voice by comparing the feature quantity vector from the feature quantity extraction means with each of the feature quantity vectors of the plurality of candidate voices stored in the storage means.

32. The voice recognition device according to claim 31, wherein the first detecting means includes: means for dividing the electric signal into a plurality of frames each having a predetermined length; calculating means for obtaining the energy of the electric signal for each of the plurality of frames; and determining means for determining the voice end point based on the variance of the energy.

33. The voice recognition device according to claim 32, wherein the determining means determines the voice end point by comparing a predetermined threshold value with the variance of the energy, the voice end point being the time at which the variance of the energy coincides with the threshold value while changing from a value larger than the threshold value to a value smaller than the threshold value.

34. The voice recognition device according to claim 32 or 33, wherein the determining means uses the variance of the energy over a predetermined number of frames among the plurality of frames.
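
As a non-limiting illustration, the end point detection recited in claims 32 to 34 could be sketched as follows. The non-overlapping frames, the population variance over a sliding window of frames, and the treatment of a downward crossing between two consecutive windows as the moment the variance "coincides" with the threshold are assumptions for illustration.

```python
from statistics import pvariance

def frame_energies(signal, frame_length):
    """Split the signal into non-overlapping frames and return each frame's energy."""
    return [
        sum(s * s for s in signal[i:i + frame_length])
        for i in range(0, len(signal) - frame_length + 1, frame_length)
    ]

def find_voice_end_point(energies, window, threshold):
    """Return the index of the frame at which the energy variance over the last
    `window` frames falls from above the threshold to the threshold or below.
    Returns None if no such downward crossing occurs."""
    previous = None
    for i in range(window, len(energies) + 1):
        variance = pvariance(energies[i - window:i])
        if previous is not None and previous > threshold >= variance:
            return i - 1  # last frame of the window where the crossing was seen
        previous = variance
    return None
```

Using the variance rather than the energy itself means that steady background noise, which has roughly constant energy, does not by itself look like speech.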

35. The voice recognition device according to claim 32, wherein the second detecting means includes: means for smoothing the energy of the electric signal; first circular storage means for sequentially storing, for each frame, the energy of the electric signal without smoothing; second circular storage means for sequentially storing, for each frame, the smoothed energy; threshold value calculating means for calculating a vocalization section detection threshold value using both the unsmoothed energy stored in the first circular storage means and the smoothed energy stored in the second circular storage means at the time the voice end point is detected; and vocalization section determining means for determining the vocalization section by comparing the unsmoothed energy with the vocalization section detection threshold value.

36. The voice recognition device according to claim 35, wherein the threshold value calculating means calculates the vocalization section detection threshold value using the maximum value of the unsmoothed energy stored in the first circular storage means at the time the voice end point is detected and the minimum value of the smoothed energy stored in the second circular storage means at a time when the voice end point is not detected.

37. The voice recognition device according to claim 35 or 36, wherein the feature quantity extraction means calculates, from the portion of the electric signal in the vocalization section, the number of zero crossings of the electric signal for each frame, the number of zero crossings for each frame of a signal obtained by differentiating the electric signal, and the energy of the electric signal, and uses these as elements of the feature quantity vector.
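
As a non-limiting illustration, the per-frame quantities recited in claim 37 could be sketched as follows. The first difference as the differentiated signal, the treatment of a zero sample as non-negative, and the function names are assumptions for illustration.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def frame_features(frame):
    """Return [zero crossings, zero crossings of the differenced frame, energy]."""
    # First difference as a discrete stand-in for differentiating the signal.
    differenced = [b - a for a, b in zip(frame, frame[1:])]
    energy = sum(s * s for s in frame)
    return [zero_crossings(frame), zero_crossings(differenced), energy]
```

Concatenating these three values over all frames of the vocalization section would give one possible form of the feature quantity vector.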

38. A voice reaction device comprising: at least one voice recognition device according to claim 32; and at least one control means for controlling an object based on a recognition result of the at least one voice recognition device.

39. The voice reaction device according to claim 38, further comprising: transmitting means connected to the at least one voice recognition device, for transmitting the recognition result of the at least one voice recognition device; and receiving means connected to the at least one control means, for receiving the transmitted recognition result and giving it to the at least one control means, wherein the at least one control means and the receiving means are attached to the object, thereby enabling remote operation of the object.