JP2012128523A

JP2012128523A - Information processing device and operating method thereof

Info

Publication number: JP2012128523A
Application number: JP2010277321A
Authority: JP
Inventors: Rei Ishikawa; 零石川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-12-13
Filing date: 2010-12-13
Publication date: 2012-07-05

Abstract

PROBLEM TO BE SOLVED: To improve operability in recognizing an instruction from a user. SOLUTION: An information processing device measures the blink frequency of each user, identifies as the operator a user whose blink frequency satisfies a predetermined criterion, recognizes an instruction input other than blinking from the user identified as the operator, and executes processing according to that instruction input.

Description

The present invention relates to a technique for recognizing an instruction from a user.

Conventionally, devices are known that recognize instructions given by a user's gestures (body movements) or voice and operate according to the recognized instructions. While using such a device, however, a user generally also moves and speaks for purposes other than instructing it. This causes the following problem for a device, such as a display, that operates according to recognized instructions: if the user, while looking at the display, moves a hand to eat or speaks to converse with another person, the device cannot determine whether the gesture or voice was intended as an instruction to it, and an operation the user did not intend may be executed.

One conventional solution to this problem is to have the user press a button or select a menu item when gesture recognition or voice recognition is to be performed. However, a key advantage of gesture recognition is that the device can be operated by hand movements alone, and of voice recognition that it can be operated by speech alone. Adding other operations such as pressing a button or selecting a menu item makes operation cumbersome for the user, so the advantages of gesture and voice recognition may not be fully realized.

As another solution, a technique is known that treats the user's gaze at the device as a signal to start giving instructions and enables gesture recognition or voice recognition according to the gaze target (Patent Document 1). This technique uses a camera or the like that observes the user to detect the location, region, direction, or object the user is facing, and enables recognition accordingly. It also presents an anthropomorphic image in part of the display area and starts recognizing the user's speech when the user gazes at that image.

Separately, findings relating the object being viewed to the number of blinks have long been known. For example, blink characteristics are known to change with the user's psychological state, the distance to the object being viewed, and other factors. In particular, the number of blinks is known to drop sharply when the user is watching a display terminal such as a television or monitor (Non-Patent Document 1). It is also known that blinking is suppressed while a stimulus that must be processed is being presented or processed, or while such a stimulus is anticipated, and that once the processing is completed the suppression is released and blinks occur frequently (Non-Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. 10-301675

Non-Patent Document 1: Tetsuko Namba, Sakiko Hotta, Akio Tabuchi, “Relationship between changes in blink count and tear volume during Visual Display Terminal (VDT) work and the method of refractive correction”, Kawasaki Medical Welfare Society, 2006
Non-Patent Document 2: Takashi Matsuo, “Cognitive load during hierarchical menu search: an analysis using blinks as an index”, FY2000 Japan Directory Society research grant report, 2002

However, with the method of Patent Document 1, it becomes more difficult to determine what the user is gazing at as the distance between the user and the device increases. The first reason is that the farther the user is from the device, the more precisely the direction of the user's line of sight must be calculated: as the distance grows, the angle between the line of sight when looking at one object and the line of sight when looking at another becomes smaller, and the corresponding difference in the appearance of the user's eyes in the image shrinks as well. The second reason is that as the distance grows, the user's eyes appear smaller in the image captured by the camera attached to the device, which makes a fine-grained calculation of the line-of-sight direction difficult. If the gaze target is misidentified, operability suffers: recognition may fail to be enabled even though the user is gazing at the anthropomorphic image, or it may be enabled while the user is merely watching content displayed on the device.
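To make the first reason concrete, the short sketch below computes the angle between the lines of sight to two targets a fixed distance apart, as seen from increasing viewing distances. The 30 cm spacing and the distances are hypothetical values chosen for illustration, and placing the two targets symmetrically about the viewing axis is a simplifying assumption.

```python
import math


def angular_separation_deg(separation_cm: float, distance_cm: float) -> float:
    """Angle in degrees between the lines of sight to two points that are
    `separation_cm` apart and placed symmetrically about the viewing axis
    at `distance_cm` from the eye (simplified geometry, for illustration)."""
    return math.degrees(2.0 * math.atan(separation_cm / (2.0 * distance_cm)))


# Hypothetical layout: camera and on-screen image 30 cm apart.
for distance_cm in (100, 200, 400):
    angle = angular_separation_deg(30.0, distance_cm)
    print(f"{distance_cm:4d} cm -> {angle:.1f} deg")
```

The angle roughly halves each time the distance doubles, so the line-of-sight estimate has to become correspondingly more precise to tell the two targets apart.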

The present invention has been made to solve the above problem, and an object thereof is to improve operability when recognizing an instruction from a user.

To achieve the above object, an information processing apparatus according to the present invention executes processing according to a user's instruction input and comprises: measurement means for measuring the blink frequency of a user; identification means for identifying, as an operator, a user whose blink frequency satisfies a predetermined criterion; and recognition means for recognizing an instruction input other than blinking that is made by the user identified as the operator, and executing processing according to that instruction input.

According to the present invention, operability when recognizing an instruction from a user can be improved.

FIG. 1 is a functional block diagram showing the functional configuration of an information processing apparatus according to the present invention.
FIG. 2 is a flowchart showing the flow of processing for recognizing an instruction from a user.
FIG. 3 shows flowcharts of the processing for determining whether a blink occurred and the processing for determining the blink frequency.
FIG. 4 shows the blink history for each frame of a moving image, together with the line-of-sight history and the distance history.
FIG. 5 is a functional block diagram showing the functional configuration of an information processing apparatus according to the present invention.
FIG. 6 is a flowchart showing the flow of processing for recognizing an instruction from a user.

Embodiments for carrying out the present invention are described below with reference to the drawings. In this description, a blink means closing the eyelids and then opening them again.

(First embodiment)
FIG. 1A is a functional block diagram showing the functional configuration of an information processing apparatus 101 according to the present invention. An output unit 102, a device that displays moving images, is connected to the information processing apparatus 101; the output unit 102 may also include a speaker and output sound. An input unit 103 consists of a camera or the like; it captures a moving image and stores each frame in motion data 104. A blink measurement unit 105 consists of a CPU, a ROM, and a RAM (hereinafter "a CPU and so on"), and its function is realized by the CPU loading a program stored in the ROM into the RAM and executing it. The blink measurement unit 105 estimates which users appear in one frame of the moving image stored in the motion data 104, determines whether each user is blinking, and outputs the result to a blink history 106. This processing is shown in FIG. 3A and is described in detail later. A recognition target specifying unit 107, which consists of a CPU and so on, identifies a user as the recognition target from the blink history 106, a line-of-sight history 110, and a distance history 112, and notifies a recognition unit 108 of that user. The recognition unit 108, which consists of a CPU and so on, recognizes instructions given to the apparatus by user gestures in one frame of the moving image stored in the motion data 104 and operates the output unit 102 according to those instructions. The recognition unit 108 enables gesture recognition for the specific user designated by the recognition target specifying unit 107. A line-of-sight measurement unit 109, which consists of a CPU and so on, estimates which users appear in one frame of the moving image stored in the motion data 104, measures the direction of each user's line of sight, and outputs the result to the line-of-sight history 110. A distance measurement unit 111, which consists of a CPU and so on, likewise estimates which users appear in one frame, measures the distance between each user and the apparatus, and outputs the result to the distance history 112.

FIG. 2 is a flowchart showing the flow of processing for recognizing an instruction from a user. This processing is executed each time the input unit 103 captures one frame of the moving image. First, in step S201, the input unit 103 stores the captured frame in the motion data 104. Next, in step S202, the line-of-sight measurement unit 109 estimates the names of the users appearing in the image of the latest frame stored in the motion data 104. It then determines whether each user's line of sight is directed at the input unit 103 and stores the result, the user's name, and the current time in the line-of-sight history 110. In step S203, the distance measurement unit 111 likewise estimates the names of the users appearing in the image of the latest frame. Instead of estimating user names, this processing may identify each user in any way that distinguishes that user from others. The distance measurement unit 111 then measures the distance between each user and the apparatus and stores the result, the user's name, and the current time in the distance history 112. In step S204, the blink measurement unit 105 likewise estimates the names of the users appearing in the image of the latest frame, determines whether each user is blinking, and stores the result, the user's name, and the current time in the blink history 106. In step S205, the recognition target specifying unit 107 determines the blink frequency from the blink history 106.
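Steps S202 to S204 each append an entry holding the measured result, the user's name, and the current time. The following is a minimal sketch of such a history in Python, assuming times are stored as seconds; the class, method, and field names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class HistoryEntry:
    """One row of a history: the result, the user's name, the frame time."""
    user: str
    time_s: float
    value: Union[bool, float]  # bool for blink / line of sight, float (cm) for distance


class History:
    """Append-only history such as blink history 106, line-of-sight
    history 110 or distance history 112 (hypothetical structure)."""

    def __init__(self) -> None:
        self._entries: List[HistoryEntry] = []

    def append(self, user: str, time_s: float, value: Union[bool, float]) -> None:
        self._entries.append(HistoryEntry(user, time_s, value))

    def window(self, user: str, now_s: float, period_s: float) -> List[HistoryEntry]:
        """Entries for `user` with now_s - period_s < time_s <= now_s."""
        return [e for e in self._entries
                if e.user == user and now_s - period_s < e.time_s <= now_s]
```

Keeping one entry per user and per frame lets the later steps read any user's data over a "predetermined period" with a single windowed query.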

In step S206, the recognition target specifying unit 107 determines, from the distance history 112, whether the average distance between the user and the apparatus over a predetermined period is smaller than a predetermined threshold. If it is, the unit judges that the user's gaze can be determined accurately from the line-of-sight direction alone, and the process proceeds to step S210. Otherwise, it judges that gaze cannot be determined accurately without using both the line-of-sight direction and the blink frequency, and the process proceeds to step S207. In step S207, the recognition target specifying unit 107 determines whether the blink frequency determined in step S205 is greater than a predetermined threshold, and, from the line-of-sight history 110, whether the time during which the line of sight was directed at the input unit 103 within a predetermined period is greater than a predetermined threshold. If both exceed their thresholds, the unit judges that the user is gazing at the input unit 103, and the process proceeds to step S208. If either is smaller than its threshold, it judges that the user is not gazing at the input unit 103, and the processing ends. In other words, in step S207 the recognition target specifying unit 107 determines whether the blink frequency satisfies a predetermined criterion. In step S208, the recognition target specifying unit 107 notifies the recognition unit 108 of the name of the user judged to be gazing at the input unit 103; that is, step S208 is the processing that identifies the operator. In step S209, the recognition unit 108 enables gesture recognition for the user notified by the recognition target specifying unit 107, and the processing ends. In step S210, the recognition target specifying unit 107 determines, from the line-of-sight history 110, whether the time during which the line of sight was directed at the input unit 103 within a predetermined period is greater than the threshold. If it is, the unit judges that the user is gazing at the input unit 103, and the process proceeds to step S208; otherwise, it judges that the user is not gazing at the input unit 103, and the processing ends.
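The branching in steps S206 to S210 can be summarized as a single predicate. This is a sketch under stated assumptions: the three inputs (average distance, blink frequency, and time spent looking at the input unit 103) are taken as already computed over the predetermined period, and the function and parameter names are hypothetical. The flow says "greater than" each threshold, but the worked example of FIG. 4B treats a value equal to the threshold as passing, so `>=` is used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    distance_cm: float  # S206: near/far boundary
    blink_per_s: float  # S207: blink frequency threshold
    gaze_s: float       # S207/S210: time looking at input unit 103


def is_operator(avg_distance_cm: float, blink_per_s: float,
                gaze_s: float, th: Thresholds) -> bool:
    """Return True if the user should be identified as the operator (S208)."""
    if avg_distance_cm < th.distance_cm:
        # S206 -> S210: user is near, line of sight alone is reliable.
        return gaze_s >= th.gaze_s
    # S206 -> S207: user is far, require both blink frequency and gaze.
    return blink_per_s >= th.blink_per_s and gaze_s >= th.gaze_s
```

Note that the blink frequency is only consulted in the far branch, which is what lets a nearby user be identified without waiting for a full blink-counting window.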

FIG. 3A is a flowchart of the processing executed in step S204 to determine whether a user is blinking. In step S204, this flowchart is executed once for each user appearing in the image of the latest frame stored in the motion data 104.

First, in step S301, the blink measurement unit 105 determines whether the user's eyes are open in the image of the latest frame. If they are open, the process proceeds to step S302; if they are closed, to step S303. In step S302, the blink measurement unit 105 determines whether the user's eyes were closed in the image of the immediately preceding frame. If they were closed, the process proceeds to step S304; if open, to step S303. In step S303, the blink measurement unit 105 records in the blink history 106 that the user did not blink at the time the latest frame was captured, and the processing ends. In step S304, the blink measurement unit 105 records in the blink history 106 that the user blinked at the time the latest frame was captured, and the processing ends.
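Steps S301 to S304 amount to detecting a closed-to-open transition of the eyes between consecutive frames, which matches the definition of a blink given earlier. A minimal sketch, assuming an upstream eye-state detector already reports for each frame whether the user's eyes are open; the function names are hypothetical, and treating the very first frame (which has no predecessor) as "no blink" is an added assumption.

```python
from typing import List, Optional


def is_blink(prev_eye_open: Optional[bool], curr_eye_open: bool) -> bool:
    """Steps S301-S304 for one user and one frame."""
    if not curr_eye_open:
        return False               # S301: eyes closed now -> S303 (no blink)
    return prev_eye_open is False  # S302: closed before -> S304, else S303


def blink_flags(eye_open_per_frame: List[bool]) -> List[bool]:
    """Apply is_blink to a sequence of per-frame eye states."""
    flags: List[bool] = []
    prev: Optional[bool] = None    # no previous frame yet
    for curr in eye_open_per_frame:
        flags.append(is_blink(prev, curr))
        prev = curr
    return flags
```

For example, `blink_flags([True, False, True, True])` returns `[False, False, True, False]`: the blink is recorded in the frame where the eyes reopen.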

FIG. 3B is a flowchart of the processing executed in step S205 to determine a user's blink frequency. In step S205, this flowchart is executed once for each user appearing in the image of the latest frame stored in the motion data 104.

First, in step S311, the recognition target specifying unit 107 sets the measurement period for the blink frequency. In step S312, it refers to the blink history 106 and obtains the number of blinks the user made between the current time and the time one measurement period earlier. In step S313, it calculates the blink frequency with the following formula, and the processing ends.
(Blink frequency) = (number of blinks obtained in step S312) / (measurement period set in step S311)
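Steps S311 to S313 compute a sliding-window rate. A sketch, assuming the blink history for one user is available as (time in seconds, blinked) pairs; the text does not say whether the window boundaries are inclusive, so the half-open interval (now - period, now] used here is an assumption.

```python
from typing import Iterable, Tuple


def blink_frequency(history: Iterable[Tuple[float, bool]],
                    now_s: float, period_s: float) -> float:
    """Blinks per second over the measurement period ending at now_s."""
    if period_s <= 0:
        raise ValueError("measurement period must be positive")
    # S312: count blinks recorded in the window (now_s - period_s, now_s].
    count = sum(1 for t, blinked in history
                if blinked and now_s - period_s < t <= now_s)
    # S313: frequency = number of blinks / measurement period.
    return count / period_s
```

With blinks recorded at 1.0 s and 2.5 s and a 5-second period ending at 5.0 s, this returns 2 / 5 = 0.4 blinks per second.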

FIG. 4A shows the relationship between the frames of the moving image of the user stored in the motion data and the data stored in the blink history. For example, at 20:59:40.2 on December 17, 2009, user Taro's eyes are closed in the latest frame, and in the preceding frame, captured at 20:59:40.1 on December 17, 2009, Taro's eyes are open. The blink history therefore records that user Taro blinked.

FIG. 4B shows the blink history 106, the line-of-sight history 110, and the distance history 112. Assume that the distance threshold used in step S206 of FIG. 2 is 200 cm, the blink frequency threshold used in step S207 is 0.4 blinks per second, the line-of-sight threshold is 4 seconds out of the past 5 seconds, and the blink measurement period used in step S311 is 5 seconds. At 20:59:45.0 on December 17, 2009, the distance history 112 shows that the average distance between the apparatus and user Taro over the preceding 5 seconds, indicated by 401, is 200 cm or more. Step S206 therefore determines that the average is not smaller than the threshold, and the process proceeds to step S207. In step S207, over the same 5 seconds indicated by 401, Taro's blink frequency is 0.4 blinks per second, which is at or above the threshold, and his line of sight was directed at the input unit for 4.2 of the 5 seconds, also at or above the threshold, so the process proceeds to step S208. In step S209, the recognition unit determines that user Taro is the target of gesture recognition (the operator) and enables recognition. Once recognition is enabled, the information processing apparatus 101 recognizes instruction inputs other than blinking made by the user identified as the operator and executes processing according to them.
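The arithmetic in this example can be checked directly. The sketch below plugs in the thresholds stated for FIG. 4B; a frequency of 0.4 blinks per second over the 5-second period corresponds to 2 blinks, and because the text only says the average distance is "200 cm or more", the value of exactly 200 cm is an assumption.

```python
# Thresholds stated for FIG. 4B
DISTANCE_TH_CM = 200.0  # step S206
BLINK_TH_PER_S = 0.4    # step S207
GAZE_TH_S = 4.0         # step S207: 4 s out of the past 5 s
PERIOD_S = 5.0          # step S311 measurement period

# User Taro in the 5-second window 401 ending at 20:59:45.0
avg_distance_cm = 200.0  # assumption: text says only "200 cm or more"
blinks_in_period = 2     # 0.4 blinks/s x 5 s
gaze_s = 4.2

blink_per_s = blinks_in_period / PERIOD_S    # step S313

if avg_distance_cm < DISTANCE_TH_CM:         # step S206
    is_operator = gaze_s >= GAZE_TH_S        # step S210
else:
    is_operator = (blink_per_s >= BLINK_TH_PER_S
                   and gaze_s >= GAZE_TH_S)  # step S207

print(blink_per_s, is_operator)  # prints "0.4 True"
```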

As described above, according to this embodiment, the blink frequency makes it possible to determine that the user is gazing at the input unit even when the user is far from the apparatus. Furthermore, this embodiment uses both the number of blinks and the line-of-sight direction to determine that the user is gazing at the input unit regardless of the distance between the user and the apparatus. This addresses the problem that, when the user gazes at a display terminal other than the input unit 103 and the output unit 102, the blink frequency does not differ from that when gazing at the input unit 103, so gesture recognition would otherwise be enabled. In addition, when the user is close to the apparatus, this embodiment determines gaze from the line-of-sight direction alone without measuring the blink frequency; when the user is far away, it measures both the line-of-sight direction and the blink frequency. Measuring the blink frequency requires counting blinks over a predetermined period and therefore takes a certain amount of time, so determining gaze from the line-of-sight direction alone improves the responsiveness of the apparatus.

In this embodiment, when a plurality of operators are detected, the information processing apparatus 101 may execute processing according to instruction inputs, such as gestures, from all of them. Alternatively, it may cause the output unit 102 to output an image or sound notifying the users that there are multiple operators, or it may end the series of processes shown in FIG. 2.

(Second embodiment)
The first embodiment does not describe how to disable gesture recognition for a user after it has been enabled. When to disable gesture recognition is as important as when to enable it. If recognition is disabled while the user still intends to give instructions to the apparatus, the user must instruct the apparatus again to re-enable recognition, which degrades the operating experience. Conversely, if recognition remains enabled when the user no longer intends to give instructions, gestures the user makes for other purposes are misrecognized as instructions, which also degrades the operating experience. This embodiment therefore decides when to disable gesture recognition from changes in the blink frequency; specifically, recognition ends when the blink frequency increases. If operating the apparatus by gesture is regarded as a single processing task, the user's blinking is suppressed during the operation and, once the operation is finished, the suppression is released and blinks occur frequently. The moment the blink frequency increases can therefore be taken as the moment the user has finished operating the apparatus. Furthermore, as the blink-frequency threshold for disabling recognition, this embodiment uses the blink frequency observed immediately after recognition is enabled. Blink frequency generally varies with the environment, such as the distance between the apparatus and the user and the ambient brightness, and with physical characteristics, such as the user's age and condition, so a single fixed value is hard to use in all cases. Immediately after recognition is enabled, on the other hand, the suppression is released and blinks are expected to occur frequently, so the blink frequency at that time can be used to judge when the user's operation has ended.

FIG. 5 is a functional block diagram showing the functional configuration of an information processing apparatus 500 according to this embodiment. Elements identical to those in FIG. 1 are given the same reference numerals, and their description is omitted. A recognition end determination unit 501 consists of a CPU and so on. While the recognition unit 108 has recognition enabled, the recognition end determination unit 501 determines from the blink history 106 and a recognition end threshold 502 whether recognition should be disabled and, if so, notifies the recognition unit 108. Specifically, when the user's blink frequency is greater than the recognition end threshold 502, it instructs the recognition unit 108 to disable recognition for that user. After a predetermined period has elapsed since the output unit 102 indicated that recognition of the user was enabled, the recognition target specifying unit 107 measures, from the blink history 106, the blink frequency to be used as the threshold and sets it as the recognition end threshold 502. The predetermined period is needed because, after the output unit 102 indicates that recognition is enabled, the user takes some time to react before blinking frequently.

FIG. 6 is a flowchart showing the flow of processing for recognizing an instruction from the user in this embodiment. Processes similar to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted. In step S601, the recognition end determination unit 501 determines whether the recognition unit 108 has recognition enabled. If so, the process proceeds to step S602; if not, it proceeds to step S206. In step S602, the recognition target specifying unit 107 determines whether exactly the predetermined period has elapsed since the recognition unit 108 enabled recognition. If so, the process proceeds to step S603. If the predetermined period has not yet elapsed, or more than the predetermined period has elapsed, the process proceeds to step S604. In step S603, the recognition target specifying unit 107 calculates the recognition end threshold 502 using the following formula, and the series of processes ends.

(Recognition end threshold 502) = (number of blinks the user made within the immediately preceding predetermined period) / (length of the predetermined period)
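The step S603 formula can be sketched in Python as below. This is an illustration only: it assumes the blink history 106 is available as an ascending list of blink timestamps in seconds, and the function names `blink_rate` and `recognition_end_threshold`, like the choice of a half-open time window, are hypothetical rather than taken from the text.

```python
from bisect import bisect_right


def blink_rate(blink_times: list[float], now: float, window: float) -> float:
    """Blinks per second within the half-open window (now - window, now].

    `blink_times` must be sorted in ascending order (seconds, any common clock).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # bisect_right counts timestamps <= x, so the difference is the number of
    # blinks strictly after (now - window) and at or before `now`.
    count = bisect_right(blink_times, now) - bisect_right(blink_times, now - window)
    return count / window


def recognition_end_threshold(blink_times: list[float], enabled_at: float,
                              period: float) -> float:
    """Step S603: blink rate over the period that ends exactly `period`
    seconds after recognition was enabled."""
    return blink_rate(blink_times, enabled_at + period, period)
```

With recognition enabled at t = 45.0 s and hypothetical blinks at 45.5, 47.0 and 49.2 s, `recognition_end_threshold([45.5, 47.0, 49.2], 45.0, 5.0)` evaluates 3 / 5, i.e. 0.6 blinks/second.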

In step S604, the recognition end determination unit 501 determines whether at least the predetermined period has elapsed since recognition was enabled. If so, the process proceeds to step S605; if not, the series of processes ends. This check ensures that the frequent blinking that occurs immediately after recognition is enabled is not mistaken for the end of the operation. The user blinks frequently at that point because the user has finished staring at the input unit 103 to get the apparatus to start recognition, so the user's suppression of blinking is released. In step S605, the recognition end determination unit 501 acquires from the recognition unit 108 the name of the user being recognized (that is, it identifies the user). In step S606, the recognition end determination unit 501 determines, from the blink history 106 associated with the user name acquired in step S605, whether the blink frequency within the immediately preceding predetermined period is greater than the recognition end threshold 502. If it is greater than the threshold, it is determined that the user has finished the gesture operation, and the process proceeds to step S607. If it is below the threshold, it is determined that a gesture operation is still in progress, and the series of processes ends. In step S607, the recognition end determination unit 501 notifies the recognition unit 108 of the name of the user determined in step S606 to have finished the gesture operation. In step S608, the recognition unit 108 disables gesture recognition for the user notified by the recognition end determination unit 501, and the series of processes ends.
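As a rough sketch, the branches of FIG. 6 can be expressed as a single per-tick function. Assumptions not found in the text: the blink history is a sorted list of timestamps in seconds; "exactly the predetermined period" in step S602 is checked with a small tolerance `eps`, because a real loop samples time discretely; and a user whose threshold was never set is simply left enabled. All names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecognizedUser:
    name: str
    blink_times: list[float] = field(default_factory=list)  # ascending, seconds
    enabled_at: Optional[float] = None      # None means recognition is disabled
    end_threshold: Optional[float] = None   # recognition end threshold 502


def _rate(times: list[float], now: float, window: float) -> float:
    # Blinks per second in (now - window, now].
    return (bisect_right(times, now) - bisect_right(times, now - window)) / window


def fig6_tick(user: RecognizedUser, now: float, period: float,
              eps: float = 1e-6) -> str:
    """Run one pass of the FIG. 6 flow for one user and return the step reached."""
    if user.enabled_at is None:                        # S601: recognition disabled
        return "S206"
    elapsed = now - user.enabled_at
    if abs(elapsed - period) < eps:                    # S602: exactly one period
        user.end_threshold = _rate(user.blink_times, now, period)  # S603
        return "S603"
    if elapsed < period or user.end_threshold is None:  # S604: too early
        return "end"
    # S605 identifies the user (here: `user`); S606 compares the recent rate.
    if _rate(user.blink_times, now, period) > user.end_threshold:
        user.enabled_at = None                         # S607 notify, S608 disable
        return "S608"
    return "end"                                       # gesture operation continues
```

Calling `fig6_tick` once per frame for each recognized user reproduces the order of checks in the flowchart; the strict `>` in step S606 follows the flow description above.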

The following describes, using the blink history 106, line-of-sight history 110, and distance history 112 shown in FIG. 4B, how the recognition target specifying unit 107 sets the recognition end threshold 502 in this embodiment. Assume that the predetermined period used in steps S602 and S603 is 5 seconds, and that recognition of the user Taro was enabled at 20:59:45.0 on December 17, 2009. At 20:59:50.0 on December 17, 2009, recognition is enabled in step S601, so the process proceeds to step S602. In step S602, exactly 5 seconds have elapsed since recognition was enabled, so the process proceeds to step S603. In step S603, the number of blinks in the preceding 5 seconds, indicated by 402, is 3, so a threshold of 3 / 5 = 0.6 blinks/second is set as the recognition end threshold 502.

Next, the following describes, using the blink history 106, line-of-sight history 110, and distance history 112 shown in FIG. 4B, how the recognition end determination unit 501 uses the recognition end threshold 502 to decide to end recognition of the user in this embodiment. Assume that the predetermined period used in steps S604 and S605 is 5 seconds, that recognition of the user Taro was enabled at 20:59:45.0 on December 17, 2009, and that Taro's recognition end threshold 502 was set to 0.6 blinks/second at 20:59:50.0 on December 17, 2009. At 21:00:25.0 on December 17, 2009, recognition is enabled in step S601, so the process proceeds to step S602. In step S602, more than 5 seconds have elapsed since recognition was enabled, so the process proceeds to step S604. In step S604, at least 5 seconds have elapsed since recognition was enabled, so the process proceeds to step S605. In step S605, the name of the user being recognized, Taro, is acquired. In step S606, the blink frequency over the preceding 5 seconds, indicated by 403, is equal to or greater than the 0.6 blinks/second stored as the recognition end threshold 502, so the process proceeds to step S607. When the user name Taro is notified to the recognition unit 108 in step S607, the recognition unit 108 disables gesture recognition in step S608.
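The timeline of the two worked examples can be replayed with Python's `datetime`, which makes the S602/S604 branching explicit. This is a sketch: the blink count for window 403 is not given in the text (only that the rate is at least 0.6 blinks/second), so the value 4 used below is hypothetical.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(seconds=5)                       # predetermined period
ENABLED_AT = datetime(2009, 12, 17, 20, 59, 45, 0)  # Taro's recognition enabled


def branch_after_s601(now: datetime) -> str:
    """Step reached after S601 for a tick at `now` (recognition enabled)."""
    elapsed = now - ENABLED_AT
    if elapsed == PERIOD:          # S602: exactly 5 s, so S603 sets threshold 502
        return "S603"
    if elapsed >= PERIOD:          # S604: at least 5 s, so S605/S606 check
        return "S605"
    return "end"                   # too soon after enabling


threshold = 3 / 5.0                # S603: 3 blinks (402) in 5 s, i.e. 0.6 blinks/s
recent_rate = 4 / 5.0              # HYPOTHETICAL count for 403; text only says >= 0.6
print(branch_after_s601(datetime(2009, 12, 17, 20, 59, 50)))  # S603
print(branch_after_s601(datetime(2009, 12, 17, 21, 0, 25)))   # S605
print(recent_rate >= threshold)                                # True, so S607, S608
```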

As described above, according to this embodiment, using the user's blink frequency makes it possible to disable recognition of the user at the moment the user finishes operating the apparatus, without degrading operability. Furthermore, by disabling recognition with a blink frequency threshold that accounts for factors such as the user's surroundings and physical characteristics, the moment at which the user finishes operating the apparatus can be determined more accurately, again without degrading operability.
The recognition target specifying unit 107 can also change, according to the distance between the user and the display device, the blink frequency threshold used in step S207 to determine whether the user is gazing at the input unit 103. For example, in step S207 of FIG. 6 the distance history 112 is referenced. If the distance between the user and the display device is less than 100 cm, the user is close to the display device and is therefore assumed to blink less often than usual, so the process proceeds to step S208 if at least one blink occurred in the preceding 5 seconds. If the distance is greater than 100 cm, the process proceeds to step S208 if at least two blinks occurred in the preceding 5 seconds. The recognition target specifying unit 107 can also change, according to the distance between the user and the apparatus, the period over which the blink frequency is calculated. Specifically, when the user is close to the apparatus and the measured line-of-sight direction is accurate, a shorter period is used so that the apparatus responds to the user more quickly. When the user is far away and the measured line-of-sight direction is less accurate, a longer period is used so that the blink frequency is measured accurately. In addition, once the recognition unit 108 has enabled recognition of a user, the recognition target specifying unit 107 can instruct the input unit 103 to keep that user at the center of the captured moving image so that the user's gestures are recognized accurately.
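The distance-dependent threshold for step S207 can be sketched as follows. The 100 cm boundary and the one-versus-two blink counts come from the text; the function names are hypothetical, and the text does not say what happens at exactly 100 cm, so this sketch treats that case as "far" (an assumption).

```python
def min_blinks_for_s208(distance_cm: float) -> int:
    """Blinks required in the preceding 5 s for step S207 to proceed to S208.

    Under 100 cm the user is assumed to blink less, so one blink suffices;
    otherwise two are needed. Exactly 100 cm is treated as far (assumption).
    """
    return 1 if distance_cm < 100 else 2


def proceeds_to_s208(blinks_in_last_5s: int, distance_cm: float) -> bool:
    return blinks_in_last_5s >= min_blinks_for_s208(distance_cm)
```

For instance, a single blink in the last 5 seconds is enough at 80 cm but not at 150 cm.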
For example, to keep the recognized user at the center of the captured moving image, a position measurement unit measures the user's position from the operation data 104, the recognition target specifying unit 107 instructs the input unit 103 on the basis of that position, and the input unit 103 turns to face the user. In addition to the recognition end threshold 502, the recognition end determination unit 501 can also take into account the distance between the display device and the user, and the brightness around the user as measured by an illuminance measurement unit, when deciding whether to disable recognition of the user. It is known that blink frequency decreases as the distance between the display device and the user decreases, and that blink frequency increases temporarily when the ambient brightness changes. Using distance and brightness in the decision makes it possible to derive an appropriate blink frequency threshold that reflects the user's surroundings, so that the moment the user finishes operating the apparatus can be determined more accurately without degrading operability. Specifically, in step S606 of FIG. 6, the recognition end determination unit 501 uses the following conditional expression to decide whether to disable recognition of the user.
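The conditional expression itself is not reproduced in this text. Assuming that the distance parameter and the illuminance parameter defined below simply multiply the recognition end threshold b, one reconstruction that agrees with those definitions and with the numerical example (which yields an adjusted threshold of 1.2 blinks/second, below the 1.25 blinks/second at which the operation is judged to have ended) is:

```latex
a \ge b \cdot \frac{c}{d} \cdot \left( 1 + \frac{e - f}{f \cdot g} \right)
```

Whether the comparison is strict cannot be recovered: the flow description for step S606 says "greater than", while the worked example for Taro says "equal to or greater than".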

Here, a is the blink frequency (blinks/second), b is the recognition end threshold 502, c is the current distance (cm), and d is the distance (cm) at the time recognition was enabled. Further, e is the current illuminance (lux), f is the illuminance (lux) at the time recognition was enabled, and g is the time (seconds) elapsed since the illuminance last changed by at least a certain amount. The factor c/d is the distance parameter: blink frequency falls as the user moves closer to the apparatus, so the threshold is lowered accordingly. The factor (1 + (e − f) / (f × g)) is the illuminance parameter: when the illuminance around the user changes, blink frequency rises as the amount of change (e − f) grows, so the threshold is raised accordingly. Because the increase in blink frequency caused by a change in illuminance is temporary and the frequency returns to its original level over time, the illuminance parameter approaches 1 as the elapsed time g grows.

For example, suppose the recognition end threshold 502 is 1 blink/second, the current distance between the display device and the user is 240 cm, and the distance at the time recognition was enabled was 300 cm. Suppose also that the current illuminance is 600 lux, the illuminance at the time recognition was enabled was 150 lux, and 6 seconds have elapsed since the illuminance last changed by at least a certain amount. Then, from the above conditional expression, the recognition end determination unit 501 determines in step S606 of FIG. 6 that the user's operation has ended when the user's blink frequency is 1.25 blinks/second. The process proceeds to step S607, and in step S608 the recognition unit 108 disables recognition of the user.
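The numerical example can be checked with a short Python sketch. It assumes the multiplicative form b × (c / d) × (1 + (e − f) / (f × g)) for the adjusted threshold, which the text implies but does not state; the function names are hypothetical, and rejecting g = 0 (an illuminance change at this very instant) is a choice of this sketch, since the text does not address that case.

```python
def adjusted_end_threshold(b: float, c: float, d: float,
                           e: float, f: float, g: float) -> float:
    """Recognition end threshold 502 scaled for distance and illuminance.

    Assumed multiplicative form: b * (c / d) * (1 + (e - f) / (f * g)).
    """
    if d <= 0 or f <= 0 or g <= 0:
        raise ValueError("d, f and g must be positive")
    distance_factor = c / d                     # < 1 when the user moved closer
    illuminance_factor = 1 + (e - f) / (f * g)  # tends to 1 as g grows
    return b * distance_factor * illuminance_factor


def operation_ended(a: float, b: float, c: float, d: float,
                    e: float, f: float, g: float) -> bool:
    """Step S606 with the adjusted threshold (a is blinks per second)."""
    return a >= adjusted_end_threshold(b, c, d, e, f, g)


# Worked example from the text: 1 * (240/300) * (1 + 450/900) = 0.8 * 1.5 = 1.2
t = adjusted_end_threshold(b=1.0, c=240, d=300, e=600, f=150, g=6)
print(round(t, 3))                                         # 1.2
print(operation_ended(1.25, 1.0, 240, 300, 600, 150, 6))   # True
```

Under this assumed form, a blink frequency of 1.25 blinks/second exceeds the adjusted threshold of 1.2, which matches the outcome stated in the example.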

In addition to the recognition end threshold 502, the recognition end determination unit 501 can also decide whether to disable recognition of the user on the basis of the user's line of sight, or of the user's voice as measured by a voice measurement unit. In particular, when the user's line of sight is not directed at the apparatus, or the user is speaking, the user can be judged to be concentrating on something else, such as a conversation, and recognition of the user can be left enabled even if the user blinks frequently. Specifically, in step S606 of FIG. 6, in addition to checking that the blink frequency is at or above the threshold, it is determined whether the user's line of sight is directed at the apparatus and whether the user is silent. Only when all of these conditions hold does the process proceed to step S607, and recognition is disabled in step S608.
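The combined condition for this variant of step S606 reduces to a single predicate. In the sketch below, the gaze and speech inputs are assumed to be booleans already produced by the line-of-sight detection and a voice measurement unit; the function name is hypothetical.

```python
def should_disable_recognition(blink_rate: float, end_threshold: float,
                               gaze_on_device: bool, is_speaking: bool) -> bool:
    """Step S606 variant: disable recognition only when the user blinks at or
    above the threshold AND is looking at the apparatus AND is not speaking."""
    frequent_blinking = blink_rate >= end_threshold
    return frequent_blinking and gaze_on_device and not is_speaking
```

A user who blinks often while looking away or talking to someone therefore keeps recognition enabled.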

To let the user know that recognition has started, the recognition target specifying unit 107 can also instruct the output unit 102, after recognition of the user has started, to inform the user of this. In particular, when recognition of the user starts, the output unit 102 can display an image of eyes and output an animation in which the gaze of those eyes gradually turns toward the user. Turning one's gaze toward a person on noticing that the person is staring is an entirely natural human reaction. A display device that responds to the user's gaze by showing eyes that turn to look at the recognized user is therefore intuitive and improves operability.
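One way to produce the "gradually turning gaze" is to interpolate the displayed eyes' horizontal angle over a few frames. This is purely illustrative: the user's direction in degrees is assumed to come from the user's measured position, and the ease-out curve and function name are hypothetical choices, not details from the text.

```python
def eye_gaze_angles(user_angle_deg: float, frames: int) -> list[float]:
    """Horizontal gaze angle of the displayed eyes for frames 1..frames.

    The eyes are assumed to start straight ahead (0 degrees) before frame 1
    and reach the recognized user's direction on the last frame, following an
    ease-out curve (fast start, gentle arrival).
    """
    if frames < 1:
        raise ValueError("frames must be at least 1")
    angles = []
    for i in range(1, frames + 1):
        t = i / frames                  # progress in (0, 1]
        eased = 1 - (1 - t) ** 2        # ease-out interpolation
        angles.append(user_angle_deg * eased)
    return angles
```

For a user 30 degrees to the side, `eye_gaze_angles(30.0, 4)` yields 13.125, 22.5, 28.125 and finally 30.0 degrees.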

When gesture recognition has been enabled for several users at the same time, the recognition target specifying unit 107 can also use a priority order to decide which of them may give instructions. The priority order may be set in advance by the users, or determined by comparing the users' blink frequencies, line-of-sight directions, distances from the apparatus, and positions. The apparatus may also inform the users, via the output unit 102, which user is currently able to give it instructions.
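A minimal sketch of such a selection follows. The text leaves the comparison open, so the ordering used here is entirely an assumption: preset priorities win whenever any are set; otherwise users looking at the device come first, then the nearest user. Blink frequency and position, which the text also lists, could be added to the sort key in the same way. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    preset_priority: Optional[int]   # smaller = higher priority; None if unset
    gaze_on_device: bool
    distance_cm: float


def choose_operator(candidates: list[Candidate]) -> Optional[str]:
    """Pick the one user allowed to give instructions (hypothetical ordering)."""
    if not candidates:
        return None
    preset = [c for c in candidates if c.preset_priority is not None]
    if preset:
        # User-defined priorities take precedence when any are set.
        return min(preset, key=lambda c: c.preset_priority).name
    # Otherwise: users looking at the device first (False sorts before True),
    # then the user nearest to the apparatus.
    return min(candidates, key=lambda c: (not c.gaze_on_device, c.distance_cm)).name
```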

The input unit 103 can also delete frame images of the moving image stored in the operation data 104, and the blink measurement unit 105 can delete the blink history stored in the blink history 106. The output unit 102 may convey various kinds of information to the user by producing sound or by operating another device. The distance measuring unit 111 may record the user's voice, measure the user's temperature, or measure the distance between the user and the apparatus using infrared light or the like. The operation data 104 may store audio data, temperature data, and distance data. In addition to gestures, the recognition unit 108 may recognize voice or the user's posture as an instruction input to the apparatus.

(Other Embodiments)
The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

Claims

An information processing apparatus that executes processing according to a user's instruction input, comprising:
measuring means for measuring a user's blink frequency;
identifying means for identifying, as an operator, a user whose blink frequency satisfies a predetermined criterion; and
recognition means for recognizing an instruction input that is different from blinking and is made by the user identified as the operator, and for executing processing according to that instruction input.

The information processing apparatus according to claim 1, wherein the instruction input is an instruction input by a gesture.

The information processing apparatus according to claim 1, wherein the instruction input is a voice instruction input.

The information processing apparatus according to any one of claims 1 to 3, further comprising detecting means for detecting the direction of the user's line of sight,
wherein the identifying means identifies, as the operator, a user whose blink frequency and line-of-sight direction satisfy predetermined criteria.

An operation method of an information processing apparatus that executes processing according to a user's instruction input, the method comprising:
a measurement step of measuring a user's blink frequency;
an identifying step of identifying, as an operator, a user whose blink frequency satisfies a predetermined criterion; and
a recognition step of recognizing an instruction input that is different from blinking and is made by the user identified as the operator, and executing processing according to that instruction input.

A program for causing a computer to execute the operation method according to claim 5.