JPH1124694A

JPH1124694A - Instruction recognition device

Info

Publication number: JPH1124694A
Application number: JP9194745A
Authority: JP
Inventors: Hitoshi Hongo; 仁志本郷; Masahiro Ishiba; 正大石場
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1997-07-04
Filing date: 1997-07-04
Publication date: 1999-01-29

Abstract

PROBLEM TO BE SOLVED: To provide an instruction recognition device in which only the device that the operator actually intends to command recognizes the voice instruction. SOLUTION: In addition to a microphone 10 and a voice recognition part 12 for performing voice recognition, the device comprises a camera 16 and a line-of-sight detecting part 18. When voice is input, execution processing is performed only if a line of sight directed toward the equipment on which this instruction recognition execution device A is mounted is detected. Further, it is judged whether the mouth of the person whose line of sight was detected is moving, and execution processing is performed only when the line of sight is detected at the time of voice input and the mouth of that person is also moving.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a command recognition device, and more particularly to a command recognition device that recognizes commands by voice recognition.

[0002]

[Description of the Related Art] Voice recognition technology has long been under development. In the field of personal computers, for example, it is used to recognize and execute an operator's commands. In addition, Japanese Patent Application Laid-Open No. H8-83093 discloses controlling the target of voice recognition based on the movement of the user's line of sight on a display screen.

[0003]

[Problem to be Solved by the Invention] It is expected that many devices operable by voice recognition, such as home appliances, will come onto the market. When many products that use voice recognition as an interface are present together, however, a device that was not intended to operate may operate erroneously. For example, when a television and an air conditioner that both have a voice recognition function are located near each other, saying "power on" with the intention of operating the television may also cause the air conditioner to operate. Accordingly, an object of the present invention is to provide a command recognition device in which only the device to which the operator actually intends to issue a command recognizes the voice command.

[0004]

[Means for Solving the Problem] The present invention was made to solve the above problem. In a first configuration, a command recognition device for recognizing a command input by voice comprises: voice input means for inputting voice; photographing means for photographing an arbitrary subject; line-of-sight detection means for detecting a line of sight from an image captured by the photographing means; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when a predetermined line of sight is detected by the line-of-sight detection means. In the command recognition device of this first configuration, voice is input through the voice input means. Meanwhile, the photographing means photographs a predetermined subject, and the line-of-sight detection means detects a line of sight based on the captured image. The determination means then determines that a command based on the input voice is to be transmitted when the line-of-sight detection means detects the predetermined line of sight. This prevents malfunction caused by unrelated voices uttered by someone who is not looking toward the command recognition device.

[0005] In a second configuration, a command recognition device for recognizing a command input by voice comprises: voice input means for inputting voice; photographing means for photographing an arbitrary subject; line-of-sight detection means for detecting a line of sight from an image captured by the photographing means; mouth movement determination means for determining whether the mouth of the person whose line of sight was detected has moved; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when the predetermined line of sight is detected by the line-of-sight detection means and the mouth movement determination means determines that the mouth of that person has moved. In the command recognition device of this second configuration, voice is input through the voice input means. Meanwhile, the photographing means photographs a predetermined subject, and the line-of-sight detection means detects a line of sight based on the captured image. The mouth movement determination means determines whether the mouth of the person whose line of sight was detected has moved. The determination means then determines that a command based on the input voice is to be transmitted when the predetermined line of sight is detected and the mouth is determined to have moved. Thus, even when a line of sight is detected, no command is transmitted if no mouth movement is detected. A line of sight unrelated to the voice is thereby excluded, which prevents erroneous operation when the person who spoke and the person looking at the device are different people.

[0006] In a third configuration, a command recognition device for recognizing a command input by voice comprises: voice input means for inputting voice; voice determination means for determining whether the voice input by the voice input means matches a voice registered in advance; line-of-sight detection means for detecting a line of sight; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when the predetermined line of sight is detected by the line-of-sight detection means and the input voice matches the registered voice. In the command recognition device of this third configuration, voice is input through the voice input means. The voice determination means then determines whether the input voice matches the registered voice. Meanwhile, the line-of-sight detection means detects a line of sight. The determination means then determines that a command based on the input voice is to be transmitted when the predetermined line of sight is detected and the input voice matches the registered voice. Thus, even when there is voice input and a line of sight is detected, no command is transmitted if the voice does not match. Voice input from anyone other than a registered person is thereby excluded, which prevents erroneous operation.

[0007] In a fourth configuration, in any of the first to third configurations, the line-of-sight detection means has photographing means for photographing an arbitrary subject and detects the line of sight from an image captured by the photographing means. In a fifth configuration, in any of the first to fourth configurations, the predetermined line of sight is a line of sight passing through a predetermined surface or space. In a sixth configuration, in any of the first to fifth configurations, the determination means transmits a command based on the voice input by the voice input means when the line of sight detected by the line-of-sight detection means has remained stationary for a predetermined time. Cases in which the line of sight merely passes across the device can thereby be excluded, which prevents malfunction when the user has no intention of operating it.
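Read together, the configurations above amount to progressively stricter conditions on when a recognized voice command may be transmitted. The following is a minimal sketch under that reading. The `Observation` record, the `should_transmit` function, and all parameter names are hypothetical and introduced only for illustration; they do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of what the device has sensed at voice input."""
    gaze_on_target: bool     # the predetermined line of sight was detected
    mouth_moved: bool        # mouth of the person with that line of sight moved
    voice_registered: bool   # input voice matches a pre-registered voice
    dwell_seconds: float     # how long the line of sight has stayed on target


def should_transmit(obs, require_mouth=False,
                    require_registered_voice=False, min_dwell=0.0):
    """Return True if a command based on the input voice may be transmitted."""
    if not obs.gaze_on_target:                 # first configuration
        return False
    if require_mouth and not obs.mouth_moved:  # second configuration
        return False
    if require_registered_voice and not obs.voice_registered:  # third
        return False
    return obs.dwell_seconds >= min_dwell      # sixth configuration
```

Keeping every condition optional except the line of sight mirrors the way the later configurations are layered on top of the first.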

[0008]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings. As shown in FIG. 1, a command recognition execution device A according to the present invention has a microphone 10, a voice recognition unit 12, a command recognition unit 14, a camera 16, a line-of-sight detection unit 18, a determination unit 20, a command transmission unit 22, and an execution unit 24.

[0009] The microphone 10, serving as the voice input means, converts input voice into an electrical voice signal. The voice recognition unit 12 recognizes the input voice by analyzing the wavelength and other characteristics of the voice signal input from the microphone 10, and outputs predetermined voice data. Recognition accuracy can be improved by registering the command words in advance. Specifically, the voice recognition unit 12 is composed of a storage unit that stores a voice recognition program, a CPU that performs processing according to that program, and the like. The command recognition unit 14 recognizes the content of a command based on the voice data output from the voice recognition unit 12. For example, when the voice "power on" is input, it recognizes a command to turn the power on. The command data for the recognized command is then sent to the determination unit 20.
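The registered command vocabulary described above can be pictured as a lookup table from recognized words to command data. The sketch below assumes the voice recognition step has already produced text. The table entries other than "power on", the command identifiers, and the function name are illustrative assumptions, not part of the specification.

```python
# Hypothetical table of registered command words and the command data they map to.
REGISTERED_COMMANDS = {
    "power on": "POWER_ON",
    "power off": "POWER_OFF",
}


def recognize_command(voice_text):
    """Return command data for a registered command word, or None otherwise."""
    return REGISTERED_COMMANDS.get(voice_text.strip().lower())
```

Restricting recognition to a small registered vocabulary is one plausible reason the text says pre-registration improves accuracy: anything outside the table is simply rejected.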

[0010] The camera 16, serving as the photographing means, photographs a predetermined subject and is constituted by, for example, a CCD camera. The line-of-sight detection unit 18 detects a person's line-of-sight direction when the image data obtained by the camera 16 contains a human face that is turned toward the camera 16. Specifically, it performs edge extraction on the captured image data and, when a human face appears in the image, detects the direction of the iris in the eyes. Details are described later. In practice, the line-of-sight detection unit 18 is composed of a storage unit that stores a line-of-sight detection program, a CPU that performs predetermined processing according to that program, and the like. The camera 16 and the line-of-sight detection unit 18 together function as the line-of-sight detection means.

[0011] The determination unit 20, serving as the determination means, sends the command data received from the command recognition unit 14 to the command transmission unit 22 when command data has been sent from the command recognition unit 14 and the line of sight detected by the line-of-sight detection unit 18 is directed toward a predetermined object. In other words, when a predetermined line of sight directed at the predetermined object is detected, it determines that a command based on the input voice is to be transmitted. The determination unit 20 also determines whether the person is facing the camera 16. The command transmission unit 22 transmits a predetermined command to the execution unit 24 according to the command data sent from the determination unit 20. The execution unit 24 performs predetermined execution processing according to the command transmitted from the command transmission unit 22.

[0012] Of the components of the command recognition execution device A, the microphone 10, the voice recognition unit 12, the command recognition unit 14, the camera 16, the line-of-sight detection unit 18, and the determination unit 20 function as the command recognition device. Although this embodiment describes the line-of-sight detection means as being composed of the camera 16 and the line-of-sight detection unit 18, it is not limited to this; any other configuration capable of detecting a person's line of sight may be used.

[0013] The operation of the command recognition execution device A configured as above will be described with reference to FIG. 2. In this description, the command recognition execution device A is assumed to be mounted on a television receiver P, as shown in FIG. 3. First, it is determined whether voice has been input to the microphone 10 (S10). If there is voice input, the process proceeds to step S11; if not, it returns to step S10. For example, when the person labeled 甲 in FIG. 3 says "power on", it is determined that there is voice input.

[0014] In step S11, it is determined whether the line of sight is directed toward the predetermined object (S11). Here, the predetermined object is the television receiver P. First, the camera 16 photographs a predetermined subject, and the captured image is sent to the line-of-sight detection unit 18. The line-of-sight detection unit 18 performs edge extraction on the captured image and then first detects whether a human face is present, by comparing the edge-extracted image with a standard pattern image. Skin color information may also be taken into account in the face determination. Alternatively, a moving object may be detected from the difference image between frames, and a human face determined by comparing that object with the standard pattern image.
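The frame-difference alternative mentioned above can be sketched as follows. This is only an illustration of locating a moving object between two frames, assuming grayscale frames given as equally sized lists of rows of integer intensities. The function name and the threshold value are assumptions, and the subsequent comparison with a standard face pattern is not shown.

```python
def moving_region(prev_frame, curr_frame, threshold=30):
    """Return (top, left, bottom, right) of pixels that changed, or None.

    A pixel counts as changed when its intensity differs between the two
    frames by more than `threshold` (an assumed tuning value).
    """
    changed = [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(p - q) > threshold
    ]
    if not changed:
        return None
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return (min(rows), min(cols), max(rows), max(cols))
```

The returned bounding box would be the candidate region to compare against the standard pattern image.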

[0015] When a human face is detected, the line-of-sight detection unit 18 detects the line of sight, that is, the line-of-sight direction. From the edge information obtained by edge extraction, it estimates the vertical and horizontal positions of the eyes in the detected face and cuts out an image of the eye region. From this cut-out image and the grayscale image, it extracts the position of the iris within the eye in the horizontal and vertical directions. It then compares these with a horizontal standard pattern and a vertical standard pattern of the iris, respectively, to detect the line-of-sight direction. The line of sight of person 甲 in FIG. 3 is detected in this way, and the line-of-sight direction data is sent to the determination unit 20.
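As a rough illustration of extracting the iris position within a cut-out eye image, the sketch below computes the centroid of dark pixels and expresses it as an offset from the patch center. Note that this substitutes a simple centroid for the standard-pattern comparison the text describes. The function name, the darkness threshold, and the normalization to the range -1 to 1 are all assumptions.

```python
def iris_offset(eye_patch, dark_threshold=60):
    """Return (horizontal, vertical) offset of the dark iris region.

    `eye_patch` is a grayscale image as a list of rows. Each offset is
    normalized so the patch center is 0 and the patch edges are -1 and +1.
    Returns None if the patch is too small or contains no dark pixels.
    """
    height, width = len(eye_patch), len(eye_patch[0])
    if height < 2 or width < 2:
        return None
    dark = [(r, c) for r in range(height) for c in range(width)
            if eye_patch[r][c] < dark_threshold]
    if not dark:
        return None
    center_row = sum(r for r, _ in dark) / len(dark)
    center_col = sum(c for _, c in dark) / len(dark)
    half_w, half_h = (width - 1) / 2, (height - 1) / 2
    return ((center_col - half_w) / half_w, (center_row - half_h) / half_h)
```

A positive horizontal offset means the iris sits toward the right edge of the patch; mapping that to an actual viewing angle would need calibration that the text leaves to the standard patterns.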

[0016] The determination unit 20 then determines whether the detected line of sight is directed toward the television receiver P. That is, it determines the position of the line of sight in space from the spatial coordinates of the person's eyes in the captured image and the line-of-sight direction detected by the line-of-sight detection unit 18, and from this determines whether the person is looking within the extent of the television receiver P. The spatial coordinates of the eyes are detected as follows. The X and Y coordinates shown in FIG. 3 can be obtained from the coordinates in the captured two-dimensional image. The Z coordinate can be obtained, for example, by estimating it from the size of the face or by measuring it with a distance sensor. The Z coordinate may also be calculated from the time taken for emitted light to be reflected by the face, eyes, or the like and return.
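The two ways of obtaining the Z coordinate mentioned above can be sketched with standard formulas: a pinhole-camera estimate from the apparent face width, and a round-trip time-of-flight distance. The text does not specify either formula, so the focal length in pixels, the assumed real face width of 0.15 m, and the function names are illustrative assumptions.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def depth_from_face_width(face_width_px, focal_length_px,
                          real_face_width_m=0.15):
    """Pinhole-model distance: Z = f * W_real / w_image."""
    return focal_length_px * real_face_width_m / face_width_px


def depth_from_round_trip(round_trip_seconds):
    """Distance from the time light takes to reach the face and return."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2
```

For instance, under these assumptions a face 100 pixels wide seen through a lens with a 1000-pixel focal length would be about 1.5 m away.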

[0017] Since the position of the line of sight in space is determined in this way from the spatial position of the eyes and the line-of-sight direction, it is then determined whether that line of sight passes through the front face of the television receiver P. Specifically, whether the line of sight is directed toward the television receiver P is determined by whether it passes within the range bounded by the corner coordinates (X1, Y1), (X2, Y2), (X3, Y3), and (X4, Y4) of the television receiver P. These coordinates are set in advance. The determination may instead be based on whether the line of sight passes through any part of the space occupied by the television receiver P. In short, the determination is made according to whether the line of sight passes through a predetermined surface or space.
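The test of whether the line of sight passes through the front face can be sketched as a ray-plane intersection. The sketch assumes, for simplicity, that the front face lies in the plane Z = 0, that the four corners form an axis-aligned rectangle, and that the viewer is at positive Z; none of these conventions is stated in the text, and the function name is hypothetical.

```python
def gaze_hits_front(eye, direction, corners):
    """Return True if the gaze ray crosses the rectangle in the plane Z = 0.

    eye:       (x, y, z) position of the eye, with z > 0 in front of the device
    direction: (dx, dy, dz) line-of-sight direction
    corners:   four (X, Y) corner coordinates of the front face
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz == 0:
        return False              # gaze is parallel to the front face
    t = -ez / dz                  # ray parameter where the ray reaches Z = 0
    if t <= 0:
        return False              # the person is looking away from the device
    px, py = ex + t * dx, ey + t * dy
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs) <= px <= max(xs) and min(ys) <= py <= max(ys)
```

Checking against the space occupied by the device instead of its front face would replace the rectangle test with a ray-box test.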

[0018] When determining whether the line of sight is directed toward the television receiver P, it is preferable to require that a point of gaze be detected, that is, that a line of sight directed at some position on the television receiver P has remained stationary for a certain time. Even if the line of sight is directed toward the television receiver P, the person may have merely glanced at it, or the line of sight may have merely passed across it on the way to some other location. In such cases the person is regarded as having no intention of operating the television receiver P, so it is desirable to require that the line of sight remain on the television receiver P for a predetermined time. If step S11 finds that the line of sight is directed toward the television receiver P, the process proceeds to step S12; otherwise, it returns to step S10.
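The dwell condition can be sketched as a check for an unbroken run of on-target gaze samples lasting at least the predetermined time. The sample format, the function name, and the choice to reset the timer on any off-target sample are assumptions made for illustration.

```python
def gaze_dwelled(samples, min_seconds):
    """Return True if the gaze stayed on target for at least `min_seconds`.

    `samples` is a time-ordered list of (timestamp_seconds, on_target) pairs.
    Any off-target sample resets the dwell timer.
    """
    run_start = None
    for timestamp, on_target in samples:
        if not on_target:
            run_start = None
            continue
        if run_start is None:
            run_start = timestamp
        if timestamp - run_start >= min_seconds:
            return True
    return False
```

A glance that crosses the device produces only a short run of on-target samples and is therefore rejected.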

[0019] In step S12, voice recognition and command recognition are performed (S12). The voice recognition unit 12 recognizes the input voice and outputs voice data; for example, when the voice "power on" is input, "power on" voice data is output. The command recognition unit 14 then recognizes the command content from this voice data and outputs command data; in this example, command data meaning "turn the power on" is output. When the command data is sent to it, the determination unit 20 sends the command data to the command transmission unit 22.

[0020] The command transmission unit 22 then transmits a predetermined command to the execution unit 24 (S13); in this example, a command to turn on the power of the television receiver P. On receiving the command, the execution unit 24 executes it; in this example, it actually turns the power on.
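Steps S10 to S13 can be summarized as a single pass through a small control flow. In the sketch below the four callables stand in for the microphone, the line-of-sight check, the recognition units, and the execution unit; they and the returned status strings are hypothetical. The "unrecognized" branch is an added assumption, since the text does not say what happens when no command is recognized.

```python
def run_once(hear, gaze_on_target, recognize, execute):
    """Perform one pass through steps S10 to S13 and report what happened."""
    audio = hear()                 # S10: is there voice input?
    if audio is None:
        return "no_voice"
    if not gaze_on_target():       # S11: is the line of sight on the device?
        return "no_gaze"
    command = recognize(audio)     # S12: voice recognition and command recognition
    if command is None:
        return "unrecognized"      # assumption: ignore unrecognized speech
    execute(command)               # S13: transmit the command and execute it
    return "executed"
```

In the flowchart every outcome other than execution returns to step S10, which corresponds to calling `run_once` again in a loop.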

[0021] The method of detecting the line-of-sight direction is not limited to the one described above. For example, the presence of a human face may be detected in the same way as above, while the line of sight is calculated by emitting near-infrared light and using its reflection angle. This near-infrared method may also be combined with the method described above. The Z coordinate may be detected from the time taken for the near-infrared light to be reflected and return. Methods of line-of-sight detection are disclosed in Japanese Patent Application Laid-Open Nos. H8-322796 and H5-205030.

[0022] With the above arrangement, execution processing is performed only when a line of sight is detected at the time of voice input. A user who looks at the device to be operated and speaks causes that device to perform the intended operation, and if other devices are equipped with the same function, there is no risk of operating them by mistake. In other words, the device does not operate erroneously in response to unrelated voices. In particular, because looking at the device one wants to operate while speaking is an entirely natural action, only the desired device can be operated reliably.

[0023] In the above example, it is preferable also to determine whether the mouth of the person whose line of sight was detected in the image from the camera 16 is moving. In this case, step S11 of the flowchart in FIG. 2 determines both whether the line of sight is directed toward the device and whether the mouth is moving. That is, even if there is voice input and a line of sight is detected, no command is transmitted unless mouth movement is also detected.

[0024] Mouth movement is detected as follows. The position of the mouth is detected in the face found from the edge image obtained during line-of-sight detection, for example by estimating the vertical and horizontal positions of the mouth and cutting out the mouth region. If the image of the mouth region changes between frames, the mouth is determined to be moving. This determination is made by the determination unit 20, which thus functions as the mouth movement determination means.
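The change test on the mouth region can be sketched as a mean absolute difference between the cut-out mouth images of consecutive frames. The text only says that a change between frames indicates movement, so the difference measure, the threshold, and the function name are assumptions.

```python
def mouth_moved(mouth_patches, threshold=10.0):
    """Return True if the mouth region changes between any consecutive frames.

    `mouth_patches` is a time-ordered list of equally sized grayscale patches
    (lists of rows). `threshold` is an assumed mean-absolute-difference limit.
    """
    for prev, curr in zip(mouth_patches, mouth_patches[1:]):
        diffs = [abs(p - q)
                 for prev_row, curr_row in zip(prev, curr)
                 for p, q in zip(prev_row, curr_row)]
        if diffs and sum(diffs) / len(diffs) > threshold:
            return True
    return False
```

A nonzero threshold is used so that sensor noise alone does not count as mouth movement.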

[0025] With this arrangement, even when someone other than the person looking at the device speaks, a line of sight unrelated to the voice is excluded and erroneous operation is avoided. If voice commands were executed on the basis of voice input and line-of-sight detection alone, the device might operate even when the person looking at it and the person speaking are different people; detecting mouth movement as well avoids this malfunction.

[0026] Instead of detecting mouth movement, the user's voice may be registered so that a command is executed when there is voice input in the user's voice and a line of sight directed at the predetermined object is detected. In this case, the voice is registered in advance, and the determination is made according to whether the input voice matches the registered voice. This determination is made by the voice recognition unit 12, which thus functions as the voice determination means. The number of voices that can be registered is preferably set according to the number of users of the equipment on which the command recognition execution device is mounted. Mouth movement detection and registered voice detection may also be used together.
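The text does not say how an input voice is matched against a registered one. One common approach, shown here purely as an assumption, is to compare fixed-length voice feature vectors by cosine similarity. How the feature vectors are extracted, the threshold of 0.9, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_registered_voice(features, enrolled, threshold=0.9):
    """Return True if `features` is close enough to any enrolled voice."""
    return any(cosine_similarity(features, voice) >= threshold
               for voice in enrolled)
```

The `enrolled` list would hold one entry per registered user, in line with the suggestion that the number of registered voices follow the number of users.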

[0027] Although the above example describes the equipment on which the command recognition execution device A is mounted as a television receiver, it is not limited to this; the device may be mounted on home appliances such as air conditioners and room lights, or on any other equipment. Also, although the above description performs voice recognition and command recognition when the line of sight is directed toward the device, voice recognition and command recognition may instead be performed at all times, with the command issued when a line of sight directed toward the device is detected.

[0028]

[Effects of the Invention] According to the command recognition device of claim 1, malfunction caused by unrelated voices uttered by someone who is not looking toward the command recognition device can be prevented. According to the command recognition device of claim 2, even when a line of sight is detected, no command is transmitted if the mouth movement determination means detects no mouth movement. A line of sight unrelated to the voice is thereby excluded, which prevents erroneous operation when the person who spoke and the person looking at the device are different people. According to the command recognition device of claim 3, even when there is voice input and a line of sight is detected, no command is transmitted if the voice does not match. Voice input from anyone other than a registered person is thereby excluded, which prevents erroneous operation. In particular, according to the command recognition device of claim 6, cases in which the line of sight merely passes across the device can be excluded, which prevents malfunction when the user has no intention of operating it.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a command recognition execution device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the command recognition execution device according to the embodiment of the present invention.

FIG. 3 is an explanatory diagram showing the command recognition execution device according to the embodiment of the present invention in use.

[Explanation of symbols]

A: command recognition execution device; 10: microphone; 12: voice recognition unit; 14: command recognition unit; 16: camera; 18: line-of-sight detection unit; 20: determination unit; 22: command transmission unit; 24: execution unit

Claims

[Claims]

1. A command recognition device for recognizing a command input by voice, comprising: voice input means for inputting voice; line-of-sight detection means for detecting a line of sight; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when a predetermined line of sight is detected by the line-of-sight detection means.

2. A command recognition device for recognizing a command input by voice, comprising: voice input means for inputting voice; line-of-sight detection means for detecting a line of sight; mouth movement determination means for determining whether the mouth of the person whose line of sight was detected by the line-of-sight detection means has moved; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when a predetermined line of sight is detected by the line-of-sight detection means and the mouth movement determination means determines that the mouth of that person has moved.

3. A command recognition device for recognizing a command input by voice, comprising: voice input means for inputting voice; voice determination means for determining whether the voice input by the voice input means matches a voice registered in advance; line-of-sight detection means for detecting a line of sight; and determination means for determining that a command based on the voice input to the voice input means is to be transmitted when a predetermined line of sight is detected by the line-of-sight detection means and the input voice matches the registered voice.

4. The command recognition device according to claim 1, 2, or 3, wherein the line-of-sight detection means has photographing means for photographing an arbitrary subject and detects the line of sight from an image captured by the photographing means.

5. The command recognition device according to claim 1, wherein the predetermined line of sight is a line of sight passing through a predetermined surface or space.

6. The command recognition device according to claim 1, 2, 3, 4, or 5, wherein the determination means transmits a command based on the voice input by the voice input means when the line of sight detected by the line-of-sight detection means has remained stationary for a predetermined time.