JP2013105203A

JP2013105203A - Information processing device, control method for the same, and information processing system

Info

Publication number: JP2013105203A
Application number: JP2011246707A
Authority: JP
Inventors: Masashi Yoshida; 雅史吉田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-11-10
Filing date: 2011-11-10
Publication date: 2013-05-30
Anticipated expiration: 2031-11-10
Also published as: JP5868128B2

Abstract

PROBLEM TO BE SOLVED: To enhance the ease of specifying the information processing device that a user intends to operate in an environment containing multiple information processing devices operable by gesture.

SOLUTION: An information processing device to be used as one of multiple information processing devices, each of which can execute processing matching an action by a user, comprises: a recognizing unit 23 that recognizes actions by the user; a display control unit 26 that causes, on the basis of a first action by the user recognized by the recognizing unit 23, a display unit 13 to display an image showing at least some of the information processing devices that are candidates for executing the processing matching the actions by the user among the multiple information processing devices; and a specifying unit 25 that specifies, on the basis of a second action by the user recognized by the recognizing unit 23, the information processing device to be caused to execute the processing matching the actions by the user from among the candidate information processing devices shown in the displayed image.

Description

The present invention relates to an information processing apparatus that operates in response to instructions given by a user's gestures.

Information processing apparatuses are known that recognize an operation performed through a user's movement (gesture) and execute the corresponding process. This method of operation is called gesture operation. In an environment containing a plurality of information processing apparatuses capable of gesture operation, a malfunction may result unless it can be correctly determined which apparatus a user's gesture is directed at.
In Patent Document 1, therefore, video of the user is analyzed to determine the user's movement and line-of-sight direction, and when the information processing apparatus whose input unit the user is looking at matches the information processing apparatus the user is pointing at with a finger, that apparatus is specified as the target of the gesture operation. Further, in Patent Document 1, when the video information used to determine the line-of-sight direction and the pointing direction is similar, a selection screen for confirming whether the specified operation target is correct is presented, and the user is asked to answer Yes or No.

JP 2009-37434 A

However, in Patent Document 1, when the information processing apparatus to be controlled has been determined incorrectly, the user cannot indicate which apparatus was actually intended even by selecting No on the selection screen, so the same error can recur on subsequent occasions.
The present invention has been made in view of the above problem, and its object is to improve operability in specifying the information processing apparatus intended by the user as the operation target in an environment containing a plurality of information processing apparatuses operable by gesture.

To address the above problem, the present invention provides an information processing apparatus used as one of a plurality of information processing apparatuses capable of executing processing corresponding to a user's action, comprising: a recognition unit that recognizes the user's action; a display control unit that, based on a first action of the user recognized by the recognition unit, causes a display unit to display an image showing at least some of the information processing apparatuses, among the plurality of information processing apparatuses, that are candidates for executing the processing corresponding to the user's action; and a specifying unit that, based on a second action of the user recognized by the recognition unit, specifies, from among the candidate information processing apparatuses shown in the image displayed on the display unit, the information processing apparatus that is to execute the processing corresponding to the user's action.

According to the present invention, in an environment containing a plurality of information processing apparatuses operable by gesture, operability in specifying the information processing apparatus intended by the user as the operation target is improved.

Schematic diagram showing the system configuration and the hardware configuration of the information processing apparatus
Schematic diagram showing a functional block diagram of the information processing apparatus and an example of a list
Flowchart showing an example of the main process of the information processing apparatus
Flowchart showing an example of the information acquisition process of the information processing apparatus
Flowchart showing an example of the control target selection process of the information processing apparatus
Diagram showing an example of the apparatus selection image of the information processing apparatus
Flowchart showing an example of the main process of the information processing apparatus
Flowchart showing an example of the operation target selection process of the information processing apparatus
Flowchart showing an example of the main process of the information processing apparatus
Schematic diagram showing a functional block diagram of the information processing apparatus and an example of a list
Flowchart showing an example of the main process of the information processing apparatus
Schematic diagram of the system and flowchart showing an example of the information acquisition process

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings. The present invention is not limited to the following embodiments, which merely show specific examples advantageous for carrying out the invention.

[Embodiment 1]
FIG. 1A is a schematic diagram showing the configuration of an information processing system made up of information processing apparatuses according to the present embodiment. In the present embodiment, three information processing apparatuses 10 to 12 are connected on a network. Displays 13 to 15 and cameras 16 to 18 are connected to or mounted on the respective information processing apparatuses. Each of the information processing apparatuses 10 to 12 performs its processing independently. Although the description here is based on an environment in which three information processing apparatuses are connected, the present invention can be implemented regardless of the number of apparatuses, as long as a plurality of information processing apparatuses are connected.

FIG. 1B is a hardware configuration diagram of the information processing apparatus 10 in the present embodiment. The information processing apparatus 10 is described below as an example, but the information processing apparatuses 11 and 12 are assumed to have the same hardware configuration.

The display 13 is, for example, a liquid crystal display or a CRT display. It displays the screen information signal output from the information processing apparatus 10 and functions as the display unit of the information processing apparatus 10.

The information processing apparatus 10 includes a CPU 101, a ROM 102, a RAM 103, a display interface 104, an input interface 105, a communication interface 106, an HDD 107, and a bus 108.

The CPU (Central Processing Unit) 101 executes control programs stored in the ROM (Read Only Memory) 102 and the HDD (Hard Disk Drive) 107 to control each device. The ROM 102 holds various control programs and data. The RAM (Random Access Memory) 103 provides a work area for the CPU 101, a save area for information during error handling, a load area for control programs, and the like. The display interface 104 converts screen information from the display device driver into a signal that the display 13 can display, and outputs it. The input interface 105 functions as an interface for capturing a moving image of the user with the camera 16 and analyzing the user's line-of-sight direction, gesture input, and the like. The communication interface 106 is an interface for communicating and exchanging information with other information processing apparatuses connected via a network such as the Internet or a home network. In the present embodiment, the communication interface allows the information processing apparatuses 10 to 12 to be connected by wireless communication. However, any network, wireless or wired, can be used as long as the information processing apparatuses can exchange information with one another. The HDD 107 stores various control programs executed by the CPU 101, as well as images, text, audio, and other files reproduced by the information processing apparatus 10. The bus 108 includes an address bus, a data bus, and a control bus.

Unless otherwise noted, elements already described with reference to earlier drawings are given the same reference numerals, and their description is omitted.

FIG. 2A is a block diagram showing an example of the functional configuration of the information processing apparatus 10.
The information processing apparatus 10 according to the present embodiment includes a storage unit 20, an acquisition unit 21, an imaging unit 22, a recognition unit 23, a setting unit 24, a specifying unit 25, a display control unit 26, and an adjustment unit 27.

The storage unit 20 uses, for example, a storage area of the HDD 107 to store a recognition flag, a recognition score, a correction value, and the like, which are described below. The recognition flag is a flag for determining whether the information processing apparatus 10 recognizes operations by the user's gestures. Its initial value is “0”, and it becomes “1” when the information processing apparatus recognizes the user's gesture. It returns to “0” when the gesture operation ends or when another information processing apparatus is selected as the target of the gesture operation. The information processing apparatus 10 recognizes the user's gesture operation, and executes the process instructed by that gesture, only while the recognition flag is “1”.
The recognition score is an index indicating the priority of the information processing apparatus 10 as an operation target candidate when, as a result of analysis of the user's face image by the setting unit 24 described later, the apparatus is judged to be such a candidate. A higher recognition score indicates a higher likelihood that the user has designated the apparatus as the operation target, and therefore a higher priority. In the present embodiment, the maximum recognition score is “100” and the minimum is “0”. In the present embodiment, when analysis of the moving image of the user captured by the imaging unit 22 recognizes an action of the user gazing at the information processing apparatus 10, it is judged that the user has selected that apparatus as the target of a gesture operation. In that case, the recognition score is set to the maximum value “100” when the user is gazing at the camera 16 head-on, and is set lower as the angle of the user's line of sight relative to the camera increases.
The correction value is used to correct the recognition score for each information processing apparatus. For example, even when the accuracy of line-of-sight recognition differs because of differences in the performance or position of the camera capturing the user, or in computational capability, the correction allows the recognition score to be used as a uniform criterion. The initial correction value is “0”, and it is adjusted by the adjustment unit 27 described later. A high correction value corrects the recognition score upward; a low correction value corrects it downward.

The acquisition unit 21 is made up of the CPU 101, the ROM 102, the RAM 103, and the communication interface 106. In the information processing apparatus 10 of the present embodiment, when the recognition flag is “1”, the acquisition unit 21 acquires identification information and information indicating the recognition score from those other information processing apparatuses connected via the network whose recognition flag is “1”, and stores them in the RAM 103. When the acquisition unit 21 receives from another information processing apparatus 11 or 12 an inquiry signal asking whether its recognition flag is “1”, it returns, through the network, the identification information of the information processing apparatus 10 and information indicating the recognition score set by the setting unit 24 described later.

The imaging unit 22 is made up of the camera 16 and the input interface 105. It captures a moving image including the user's face image and stores the frames of the captured moving image in the RAM 103.

The recognition unit 23 is made up of the CPU 101, the ROM 102, and the RAM 103, and analyzes the frame images captured by the imaging unit 22 and stored in the RAM 103. From the result of analyzing the images of the user's face, it determines the direction of the user's line of sight and judges whether the user is gazing at the camera 16. Here, gazing means continuing to look at the same thing for a certain period of time. In the present embodiment, the line-of-sight direction is estimated from the orientation of the user's face: the moving image of the user is analyzed, and when the orientation of the user's face does not change over at least a certain number of consecutive frames, the user is judged to be gazing at an object within the field of view. When analysis of the user's face image shows that at least a certain time has elapsed with the user's face oriented at an angle at which the camera 16 is within the field of view, the recognition flag is set to “1”, held in the RAM 103, and stored in the storage unit 20. When the recognition flag held in the RAM 103 is “1”, the recognition unit 23 further analyzes the moving image capturing the user's movement and recognizes gestures the user performs to operate the information processing apparatus 10.
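The gaze judgment described above can be illustrated with a minimal Python sketch. It assumes that each frame has already been reduced to an estimated face angle (or `None` when no face is found); the function name `detect_gaze`, the `view_half_angle` parameter, and its default of 30 degrees are hypothetical and do not come from the specification, which states only that the face orientation must stay unchanged, with the camera in view, for a certain number of consecutive frames.

```python
from typing import Iterable, Optional


def detect_gaze(face_angles: Iterable[Optional[int]],
                min_frames: int,
                view_half_angle: int = 30) -> bool:
    """Return True once the face angle stays unchanged, with the camera in
    view, for min_frames consecutive frames.

    face_angles: per-frame face angle in degrees (90 = straight at the
    camera), or None when no face was detected in that frame.
    view_half_angle: hypothetical limit on how far the face may be turned
    away from 90 degrees while the camera is still treated as in view.
    """
    run = 0          # length of the current run of unchanged, in-view frames
    previous = None  # face angle seen in the previous frame
    for angle in face_angles:
        in_view = angle is not None and abs(angle - 90) <= view_half_angle
        if in_view and angle == previous:
            run += 1
        elif in_view:
            run = 1  # orientation changed: start a new run
        else:
            run = 0  # no face, or camera outside the assumed field of view
        previous = angle
        if run >= min_frames:
            return True
    return False
```

With a hypothetical frame rate, a time threshold converts to a frame count as `min_frames = seconds * frames_per_second`.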

The setting unit 24 is made up of the CPU 101, the ROM 102, and the RAM 103, and determines the recognition score when it is judged that the user is gazing at the camera 16. In the present embodiment, the orientation of the user's face is obtained by computing the degree of matching between the user's face image and a plurality of face templates prepared in advance, and the recognition score is determined for the line-of-sight direction estimated from that face orientation. Specifically, the face angle at which the user looks straight at the camera 16 of the information processing apparatus 10 is first defined as 90 degrees. The recognition score does not depend on the direction in which body parts other than the user's face are facing, or on the viewing direction of the display 13 serving as the display unit. In the present embodiment, regardless of whether the user is standing in front of the display 13 or the camera 16, the user is presumed to be looking at the camera 16 when the camera 16 lies directly in front of the user's face.
Face angles turned toward the user's own right (left as seen from the camera 16) from the straight-on state are defined as 100 to 180 degrees in 10-degree steps, and face angles turned toward the user's own left (right as seen from the camera 16) are defined as 80 to 0 degrees in 10-degree steps. The information processing apparatus 10 holds in advance, in the storage unit 20, face templates showing the user's face oriented in each of these 19 directions. The degree of matching between the face image captured by the camera and each face template is computed, and the recognition score for the line-of-sight direction estimated from the face orientation is calculated from the template with the highest degree of matching. For example, when the best-matching template is the one for a face orientation of 90 degrees, the user's line of sight is presumed to be directed straight at the camera 16, and the recognition score is set to 100. Similarly, when the best-matching template is the one for 80 degrees or 100 degrees, the user's face orientation is 90 degrees ± 10 degrees, so the recognition score for the estimated line-of-sight direction is 100 − 10 = 90.
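The scoring rule in these examples amounts to subtracting the angular deviation from 90 degrees from the maximum score of 100. The following Python sketch shows that arithmetic; the function name `recognition_score` is hypothetical, and treating the correction value as an additive offset clamped to the 0 to 100 range is an assumption, since the specification says only that a higher correction value corrects the score upward.

```python
def recognition_score(template_angle: int, correction: int = 0) -> int:
    """Score the best-matching face template.

    template_angle: angle of the best-matching template, one of the 19
    values 0, 10, ..., 180 (90 = looking straight at the camera).
    correction: per-apparatus correction value; applying it as an additive
    offset is an assumption made for this sketch.
    """
    if template_angle not in range(0, 181, 10):
        raise ValueError("template_angle must be one of 0, 10, ..., 180")
    raw = 100 - abs(template_angle - 90)   # e.g. 80 or 100 degrees -> 90
    return max(0, min(100, raw + correction))  # keep within 0..100
```

For instance, `recognition_score(80)` and `recognition_score(100)` both give 90, matching the 100 − 10 = 90 example in the text.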

The display control unit 26 is made up of the CPU 101, the ROM 102, the RAM 103, and the display interface 104. It generates a display image to be shown on the display 13 and outputs it to the display 13. In the present embodiment, based mainly on the identification information (apparatus names) and recognition scores of the other information processing apparatuses acquired by the acquisition unit 21 and on the recognition score of the information processing apparatus 10 set by the setting unit 24, it displays the names of the candidate information processing apparatuses for the user's gesture operation in such a way that their priority order is apparent. In the present embodiment, they are displayed in descending order of recognition score.
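As a rough illustration of the ordering step, the Python sketch below merges the apparatus's own entry with the entries acquired from other apparatuses and sorts them by recognition score, highest first. The function name `order_candidates` and the `(name, score)` tuple representation are hypothetical choices for this sketch, not part of the specification.

```python
from typing import List, Tuple

Candidate = Tuple[str, int]  # (apparatus name, recognition score)


def order_candidates(own: Candidate, others: List[Candidate]) -> List[Candidate]:
    """Return all candidates in descending order of recognition score.

    sorted() is stable, so candidates with equal scores keep their input
    order (own entry first, then the others as received).
    """
    return sorted([own, *others], key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    # Example values chosen for illustration only.
    for name, score in order_candidates(("Photo Frame A", 80),
                                        [("Photo Frame B", 70), ("Television", 100)]):
        print(f"{name}: {score}")
```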

The specifying unit 25 acquires the result of the recognition unit 23 recognizing the user's gesture operation when the recognition flag of the information processing apparatus 10 held in the RAM 103 is “1”. It then identifies the instruction given to the information processing apparatus 10 by the recognized gesture operation and conveys the result of the user's operation to the display control unit 26 and the adjustment unit 27. In the present embodiment, it mainly determines the gesture operation that the user performs to select one information processing apparatus from among the plurality of information processing apparatuses shown in the list displayed on the display 13, and specifies the one information processing apparatus the user has selected.

Using the result specified by the specifying unit 25, the adjustment unit 27 increases the correction value of the information processing apparatus 10 stored in the storage unit 20 when the information processing apparatus 10 has been selected, and decreases the correction value when another information processing apparatus has been selected.
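The adjustment rule can be sketched in a few lines of Python. The specification gives the direction of the adjustment but not its size, so the `step` parameter and its default of 1, like the function name `adjust_correction`, are assumptions made purely for illustration.

```python
def adjust_correction(correction: int, own_id: str, selected_id: str,
                      step: int = 1) -> int:
    """Return the updated correction value for this apparatus.

    The value rises when this apparatus (own_id) was the one the user
    selected and falls when another apparatus was selected. The step size
    is hypothetical; the source does not state it.
    """
    if selected_id == own_id:
        return correction + step
    return correction - step
```

Under this sketch, an apparatus that is repeatedly listed but never chosen gradually lowers its own score, so it ranks lower in later candidate lists.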

FIG. 3 is a flowchart of the main process, showing the flow of processing by the information processing apparatus 10 in the present embodiment. A program corresponding to this flowchart is stored in, for example, the HDD 107, is loaded into the RAM 103 in response to an instruction to start the program, and is executed by the CPU 101. The flow of processing continues to be described using the information processing apparatus 10 as an example, but the same processing is assumed to be executed in the information processing apparatuses 11 and 12. In the following description, the information processing apparatus 10 in FIG. 1A is a device identified as Photo Frame A, the information processing apparatus 11 is identified as Television, and the information processing apparatus 12 is identified as Photo Frame B.

First, calibration processing is performed (step S301). In the calibration processing, the correction value for the recognition score stored in advance in the storage unit 20 is read out and held in the RAM 103. The correction value is used so that the recognition score can serve as a consistent criterion even when the accuracy of recognizing the user's line-of-sight direction varies among information processing apparatuses because of differences in camera performance or position, computational capability, and the like.

Next, the imaging unit 22 captures a moving image of the user, the recognition unit 23 analyzes each frame of the captured moving image, and it is determined whether the user has performed a first action for designating the information processing apparatus 10 as the target of gesture operation. In the present embodiment, as the first action, the recognition unit 23 determines whether the user is gazing at the camera 16 (step S302). To do so, the recognition unit 23 analyzes the image of the user's face portion in the frames of the moving image captured by the imaging unit 22, and recognizes the first action, namely the user's gaze, when the orientation of the user's face remains aligned with the direction of the camera 16 for at least a certain time. If the user's gaze is not recognized (NO in step S302), the process returns to step S302 and repeats. If the user's gaze is recognized (YES in step S302), the recognition unit 23 updates the recognition flag to “1” and holds it in the RAM 103 (step S303).

Here, in the present embodiment, as shown in FIG. 1A, suppose that the user 19 is standing in front of the information processing apparatus 11, one of the three information processing apparatuses 10 to 12, and intends to perform a gesture operation on the information processing apparatus 11 (the television). Also, in the present embodiment, the user is judged to be gazing when the user continues to look at an object for 5 seconds or longer. First, the user gazes at the camera 17 mounted on the information processing apparatus 11 for 5 seconds or longer. However, because the three information processing apparatuses are installed side by side, the information processing apparatuses 10 and 12 are also within the user's field of view. Accordingly, the recognition units 23 of the information processing apparatuses 10 and 12 also recognize that they are being gazed at by the user, and each apparatus continues its main process independently.

In the information processing apparatus 10, the setting unit 24 then determines the line-of-sight direction from the image of the user's face portion (face image) and calculates the recognition score from the result. If a correction value was set by the calibration in step S301, the recognition score is corrected using the correction value and held in the RAM 103 (step S304). In the present embodiment, as described above, the face angle at which the user looks straight at the camera 16 of the information processing apparatus 10 is defined as 90 degrees and serves as the reference for the recognition score assigned to the line-of-sight direction. The degree of matching between the face image captured by the camera and the face templates showing the user's face oriented in the 19 directions is computed, and the recognition score is calculated from the template with the highest degree of matching.

In the example described in the present embodiment, the user 19 is looking straight at the information processing apparatus 11. Suppose that the setting unit 24 of the information processing apparatus 10 matches the user's face image captured by the camera 16 against the face templates and finds that the template for a face orientation of 110 degrees has the highest degree of matching. In this case, the recognition score, which is the evaluation value for the user's line-of-sight direction at the information processing apparatus 10, is 100 − 20 = 80. Similarly, if matching at the information processing apparatus 12 finds that the template for a face orientation of 60 degrees has the highest degree of matching, the recognition score at the information processing apparatus 12 is 100 − 30 = 70.

The main process in the information processing apparatus 10 then proceeds to the information acquisition process of step S305, which is described here with reference to the flowchart of FIG. 4. When the information acquisition process (step S305) starts, the acquisition unit 21 first transmits, to the other information processing apparatuses on the network, a signal inquiring whether any of the information processing apparatuses connected via the network holds a recognition flag of "1". The inquiry signal includes the identification information and the recognition score of the information processing apparatus 10. The identification information is not limited to the name of the apparatus; address information on the network may be used instead. After transmitting the inquiry signal, the acquisition unit 21 accepts signals from the other information processing apparatuses connected to the network for a fixed period of time. Next, it is determined whether the acquisition unit 21 has received a response to the inquiry (step S402). In the present embodiment, this determination is based on whether the acquisition unit 21 has received an inquiry signal transmitted from another information processing apparatus connected via the network. When the recognition flag is "0", the information acquisition process (step S305) is not executed, so the fact that an apparatus transmits an inquiry signal indicates that its recognition flag is "1". If there is an information processing apparatus whose recognition flag is "1" (Yes in step S402), a list that associates the identification information of each such apparatus with its recognition score is generated from the identification information contained in the inquiry signals (step S403). FIG. 2(b) shows an example of the list generated by the information processing apparatus 10 of Embodiment 1; it stores the information that the information processing apparatus 10 acquired from the information processing apparatus 11 (television) and the information processing apparatus 12 (photo frame B). The list generated by the acquisition unit 21 is held in the RAM 103, the information acquisition process (step S305) ends, and control returns to the main process. If, on the other hand, there is no information processing apparatus whose recognition flag is "1" (No in step S402), an empty list is generated and held in the RAM 103 (step S404), and control returns to the main process.
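The list construction of steps S402 to S404 can be sketched as follows. The `Inquiry` record, the device identifiers, and the television's score of 100 are hypothetical; the specification only states that each inquiry signal carries identification information and a recognition score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inquiry:
    """One inquiry signal received from a peer (hypothetical message shape)."""
    device_id: str
    score: int


def build_candidate_list(received: list[Inquiry]) -> dict[str, int]:
    """Map each inquiring device to its score; the result is empty if none arrived."""
    return {msg.device_id: msg.score for msg in received}


# Illustrative values only: the television's score is an assumption.
peers = [Inquiry("television", 100), Inquiry("photo_frame_B", 70)]
print(build_candidate_list(peers))
print(build_candidate_list([]))  # no peer saw the first action, so the list is empty
```

Because a device only sends an inquiry while its recognition flag is "1", the sketch treats receipt of an inquiry as sufficient evidence of that flag and does not carry the flag in the message.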

On returning to the main process from step S305, the display control unit 26 determines whether the list held in the RAM 103 is empty (step S306). If the list is empty (Yes in step S306), the process proceeds to step S312, and gesture operations by the user become recognizable (step S311). If the list is not empty (No in step S306), the display control unit 26 uses the list and the information of the information processing apparatus 10 itself to generate an image that shows the identification information of every information processing apparatus whose recognition flag is "1", ordered so that apparatuses with higher recognition scores appear as higher-ranked candidates. It then causes the display 13 to display this operation target selection image (step S307). Although the present embodiment displays the identification information of all apparatuses whose recognition flag is "1", it is also possible to list only some of the apparatuses with the highest recognition scores, for example three or five at a time. FIG. 6 shows the operation target selection image displayed on each of the three information processing apparatuses of the present embodiment. In FIG. 6, the information processing apparatus 11 is a television 66, the information processing apparatus 10 is a photo frame A 67, and the information processing apparatus 12 is a photo frame B 68, and all of them display the same image. Cameras 69 to 71 are the cameras mounted on the respective apparatuses; by the first action of staring at a camera, the user indicates an intention to designate that apparatus as the target of gesture operation. A status display area 61 indicates that a plurality of devices capable of recognizing the user's gestures have been recognized. Icon display areas 62 to 64 are the areas in which the icons of the television, photo frame A, and photo frame B are displayed, respectively. A thick frame 65 represents the focus, which marks the candidate to be selected as the operation target; in FIG. 6 the focus is on the icon display area 62 of the television 66, which has the highest recognition score. In the operation target selection image of the present embodiment, the names of the information processing apparatuses are displayed as identification information, but this is not a limitation. For example, instead of names, images of the housings may be displayed in association with the respective identification information.
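The ordering used for the selection image in step S307 can be sketched as a descending sort over the device's own score and the scores in the received list. The function name, the device identifiers, the television's score of 100, and the `top_n` parameter (for the variant that lists only the top three or five) are assumptions made for illustration.

```python
from typing import Optional


def rank_candidates(own_id: str, own_score: int, others: dict[str, int],
                    top_n: Optional[int] = None) -> list[tuple[str, int]]:
    """Return (device, score) pairs sorted from highest to lowest score."""
    entries = dict(others)
    entries[own_id] = own_score
    ranked = sorted(entries.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


ranked = rank_candidates("photo_frame_A", 80, {"television": 100, "photo_frame_B": 70})
print(ranked)        # television first, then photo_frame_A, then photo_frame_B
print(ranked[0][0])  # the initial focus goes to the top-ranked device
```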

Next, in the main process, the operation target selection process is started (step S308). The process by which the user selects an operation target by gesture operation is described here with reference to the flowchart of FIG. 5. When the operation target selection process starts, it is first determined whether the recognition unit 23 has recognized a selection operation by the user's gesture (step S501). Specifically, the recognition unit 23 analyzes the moving image of the user captured by the imaging unit 22 and determines whether it has recognized the second action, which the user performs to select one information processing apparatus from the operation target candidates displayed in the operation target selection image. The second action is a gesture pattern registered in common on all the information processing apparatuses as the action for selecting and specifying an operation target from among those shown in the image on the display. When only some of the apparatuses that recognized the first action, namely those with the highest recognition scores, are displayed, the second action can also be used to bring up the candidates not yet displayed, one after another. Examples of gestures that can be registered as the second action include moving the right arm up and down to change which of the icon display areas 62 to 64 is focused, and holding the arm still for a fixed time or longer to confirm selection of the focused target. It is also possible to number the displayed icons and register a gesture in which the apparatus corresponding to the number the user indicates with the fingers is determined as the operation target. These are only examples, and other gestures may be used. When analysis of the user's moving image recognizes an action for selecting the operation target (Yes in step S501), the recognition unit 23 passes the recognized gesture to the specifying unit 25. The specifying unit 25 identifies the content of the operation indicated by the gesture and instructs the display control unit 26 to change what is shown on the display 13. It is then determined whether the information processing apparatus finally selected as the operation target has been specified (step S502). If the final operation target has not been specified and the operation of choosing among candidates is still in progress (No in step S502), the process waits until an action that specifies the target is recognized. When the operation target has been specified (Yes in step S502), the specifying unit 25 holds the identification information of the selected operation target in the RAM 103, returns it to the main process, and ends the operation target selection process (step S503). In the example described in the present embodiment, the user wants to select the information processing apparatus 11 (television 66) as the operation target. For the operation target selection image of FIG. 6, therefore, there is no need to change the focused icon; the user only has to perform the action that selects the apparatus corresponding to the focused icon.

If, on the other hand, analysis of the user's moving image by the recognition unit 23 does not recognize the second action for selecting the operation target (No in step S501), it is determined whether a gesture operation for canceling the selection has been recognized (step S504). For example, if the user changes their mind partway through, or if the recognition of the user's gaze in step S302 of the main process was mistaken, the information processing apparatus the user wants to operate may not be among the candidates. For such cases, a gesture for instructing that selection of the operation target be aborted is also registered in advance on all the information processing apparatuses. As examples, a timeout or a gesture of crossing both arms is assumed to be registered in advance as the operation for aborting selection. If the recognition unit 23 does not recognize a cancel operation (No in step S504), the process returns to step S501 and waits until a gesture operation by which the user selects the operation target is recognized. If the recognition unit 23 recognizes a cancel operation (Yes in step S504), the process returns to step S302 of the main process and waits until the first action by the user is recognized.
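The selection loop of steps S501 to S504 can be sketched as a small state machine over already-classified gestures. The gesture labels ("next", "prev", "confirm", "cancel") and the wrap-around behavior of the focus are assumptions made for illustration; the specification leaves the concrete gesture set open.

```python
from typing import Iterable, Optional


def select_target(candidates: list[str], gestures: Iterable[str]) -> Optional[str]:
    """Return the chosen device, or None if the user cancels or the input ends."""
    focus = 0  # the focus starts on the highest-ranked candidate
    for gesture in gestures:
        if gesture == "next":        # e.g. arm moved down
            focus = (focus + 1) % len(candidates)
        elif gesture == "prev":      # e.g. arm moved up
            focus = (focus - 1) % len(candidates)
        elif gesture == "confirm":   # e.g. arm held still long enough
            return candidates[focus]
        elif gesture == "cancel":    # e.g. arms crossed, or a timeout
            return None
        # any other label is ignored and the loop keeps waiting
    return None


order = ["television", "photo_frame_A", "photo_frame_B"]
print(select_target(order, ["confirm"]))          # television
print(select_target(order, ["next", "confirm"]))  # photo_frame_A
print(select_target(order, ["next", "cancel"]))   # None
```

In the example from the text the user wants the television, which already holds the focus, so a single confirming gesture is enough.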

On returning to the main process from the operation target selection process (step S308), the specifying unit 25 determines, from the identification information held in the RAM 103, whether the selected operation target is the information processing apparatus 10 itself (step S309). If another information processing apparatus was selected (No in step S309), the recognition unit 23 sets the recognition flag to "0" (step S312). While the recognition flag is "0", the recognition unit 23 does not recognize gesture operations by the user, so the apparatus does not malfunction in response to gestures the user makes to operate another information processing apparatus. Next, if the apparatus's own recognition score was higher than that of the finally selected information processing apparatus, the adjustment unit 27 lowers the correction value by 5% of the recognition score (step S313). If its recognition score was lower than that of the selected apparatus, no adjustment is made. In the example described in the present embodiment, the user selected the information processing apparatus 11, which has the highest recognition score, so the information processing apparatus 10 does not adjust its correction value. If, for example, the recognition score "80" of the information processing apparatus 10 itself had been higher than that of the information processing apparatus 11, the adjustment unit 27 would lower the correction value (initial value "0") by 5, updating it to "−5". From then on, the recognition score set when the user's face has the same orientation would be "75".

If, on the other hand, the information processing apparatus 10 itself was selected (Yes in step S309), the adjustment unit 27 increases the correction value, stored in the storage unit 20, that is used to obtain the recognition score of the information processing apparatus 10. No adjustment is made if the apparatus's own recognition score was already the highest; otherwise, the correction value is raised by 5% of the recognition score (step S310). For example, if the information processing apparatus 10 (photo frame A 67) had been selected in the operation target selection process for the operation target selection image of FIG. 6, its original recognition score was "80", so the adjustment unit 27 updates the correction value (initial value "0") to 4. Consequently, the next time the user stares at the information processing apparatus 10 with the same face orientation, the recognition score will be "84". The apparatus then remains in the state in which it can recognize gesture operations by the user (recognition flag "1"), and the main process of the present embodiment ends. Although the adjustment unit 27 adjusts the correction value by 5% of the recognition score in the present embodiment, this is only an example and is not a limitation. In the information processing apparatus specified by the present embodiment as the target of the user's gesture operation, processing corresponding to the gestures recognized by the recognition unit 23 is executed thereafter.
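The correction-value adjustment of steps S310 and S313 can be sketched as a single function. The function and parameter names are hypothetical, and the sketch applies the stated 5% rule in both directions; this is an assumption, since the worked example for step S313 in the text uses a step of 5 rather than 4.

```python
def adjust_correction(correction: float, own_score: float, self_selected: bool,
                      top_score: float, selected_score: float,
                      rate: float = 0.05) -> float:
    """Return the updated correction value for this device."""
    step = own_score * rate
    if self_selected:
        # S310: raise the value only if this device was not already ranked highest
        return correction if own_score >= top_score else correction + step
    # S313: lower the value only if this device outscored the device actually selected
    return correction - step if own_score > selected_score else correction


# Photo frame A (score 80) was selected although the television scored higher.
print(adjust_correction(0, 80, True, top_score=100, selected_score=80))
# Photo frame A (score 80) was not selected, and the selected device scored higher.
print(adjust_correction(0, 80, False, top_score=100, selected_score=100))
```

The first call yields a correction of 4, matching the score of 84 given in the text for the next recognition; the second leaves the correction unchanged.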

As described above, in Embodiment 1, even when a plurality of information processing apparatuses recognize in parallel, at substantially the same time, that the user is staring at them and thus become candidates for operation by the user's gestures, the candidates are shown on the display unit and the user is allowed to select the operation target. An apparatus that was not selected does not recognize the user's gestures after the operation target has been chosen, so the user can reliably direct gesture operations at the intended apparatus, and other apparatuses do not malfunction by mistakenly recognizing those gestures. In addition, when an apparatus recognizes that it is being stared at, a recognition score evaluating the user's line-of-sight direction is set, and the candidate apparatuses are displayed with higher-scoring apparatuses ranked higher, which makes it likely that the user can select the operation target with few operations. When the candidate with the highest recognition score is not the one selected, a correction value that corrects the recognition score in subsequent processing is set, which corrects errors in recognition accuracy and prevents the same misrecognition from recurring.

<Modification 1>
Modification 1 of the present invention will now be described in detail with reference to the drawings. Description of portions that are the same as in Embodiment 1 is omitted.

In Embodiment 1, a gesture operation by the user is recognized in order to have the user select a specific information processing apparatus from the candidates displayed as in the operation target selection image of FIG. 6. If, for example, an obstacle comes between the user and a camera after the user's first action (staring) has been recognized, there will be an information processing apparatus that is displayed as a candidate yet cannot recognize the user's second action (gesture operation). In Modification 1, therefore, when an apparatus has recognized the user's gaze and output the operation target selection image but does not recognize the subsequent gesture operation, its acquisition unit 21 acquires the result of the user's operation from another apparatus whose recognition flag is "1".

In Modification 1, the storage unit 20 holds a gesture operation flag, which indicates whether a gesture operation by the user has been recognized. Its initial value is "0", and it becomes "1" when a gesture operation by the user is recognized.

FIG. 7 is a flowchart showing the flow of the main process in Modification 1. The difference from Embodiment 1 is the operation target selection process of step S308A. Steps S301 to S307 and the processing from step S309 onward are the same as in Embodiment 1, and their description is omitted.

FIG. 8 is a flowchart showing the details of the operation target selection process (step S308A). The acquisition unit 21 acquires the gesture operation flags of those information processing apparatuses connected via the network whose recognition flag is "1", and holds them in the RAM 103. The gesture operation flag held in the storage unit 20 is then initialized to "0" (step S801).

Next, as in Embodiment 1, the recognition unit 23 determines whether it has recognized a gesture operation that the user performs to select the apparatus to be operated (step S802). If no selection operation is recognized (No in step S802), it is determined, as in Embodiment 1, whether a gesture operation for canceling the selection of the operation target has been recognized (step S803). For example, a timeout or a gesture of crossing both arms is assumed to be registered in advance as the operation for aborting selection. If the recognition unit 23 recognizes such a cancel operation (Yes in step S803), the process returns to step S302 of the main process and waits until the first action by the user is recognized. If the recognition unit 23 does not recognize a cancel operation (No in step S803), the process proceeds to step S804, where a signal is transmitted inquiring whether any of the information processing apparatuses connected via the network has both a recognition flag of "1" and a gesture operation flag of "1" (step S804). The transmitted signal includes the identification information of the transmitting information processing apparatus. Next, it is determined whether there was a response to the inquiry (step S805). If there was a response (Yes in step S805), the gesture operation recognized by the recognition unit 23 of the responding information processing apparatus is acquired from that apparatus (step S806), and the process proceeds to step S810.

If, on the other hand, a selection operation is recognized in step S802 (Yes in step S802), the gesture operation flag is set to "1" (step S807). It is then determined whether the acquisition unit 21 has received an inquiry signal (the signal transmitted in step S804) from another information processing apparatus connected via the network (step S808). If there is an inquiry signal (Yes in step S808), the information processing apparatus that transmitted it is set as a destination to which the results of recognizing gesture operations are sent (step S809). From then on, operation information is transmitted over the network each time the recognition unit 23 recognizes a gesture operation. If no inquiry signal has been received (No in step S808), no destination is set. Then, as in Embodiment 1, it is determined, from the result of the recognition unit 23 recognizing the user's gesture operation or from the operation information obtained through the acquisition unit 21, whether the information processing apparatus finally selected as the operation target has been specified (step S810). If the final operation target has not been specified and the operation of choosing among candidates is still in progress (No in step S810), the process waits until an action that specifies the target is recognized. When the operation target has been specified (Yes in step S810), the specifying unit 25 holds the identification information of the selected operation target in the RAM 103, returns it to the main process, and ends the operation target selection process (step S811).
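The fallback at the heart of Modification 1 (use the locally recognized gesture when there is one, otherwise use a gesture relayed by a peer whose gesture operation flag is "1") can be sketched as follows. The function name and the dictionary of relayed gestures are hypothetical; the network exchange of steps S804 to S809 is reduced here to an already-collected mapping from peer to gesture.

```python
from typing import Optional


def resolve_gesture(local_gesture: Optional[str],
                    relayed: dict[str, Optional[str]]) -> Optional[str]:
    """Prefer the local recognition result; otherwise take one relayed by a peer."""
    if local_gesture is not None:
        return local_gesture
    for peer_id, gesture in relayed.items():
        if gesture is not None:  # this peer recognized the gesture and answered
            return gesture
    return None                  # nobody has recognized a gesture yet


# The local camera is blocked, but the television saw the confirming gesture.
print(resolve_gesture(None, {"television": "confirm", "photo_frame_B": None}))
```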

According to Modification 1, even if gesture recognition becomes difficult while the user is specifying the target apparatus after the operation target selection image has been presented, the operation can still be applied to all the information processing apparatuses that are candidates. In the information processing system of FIG. 1(a) described in Embodiment 1 and its Modification 1, every information processing apparatus in the system has a camera, and each recognizes the user's actions from the moving image captured by its own camera. However, by transmitting the content of the first and second actions recognized by the recognition unit 23 to other information processing apparatuses, as in Modification 1, gesture operation also becomes possible on an information processing apparatus that has no camera. In that case, an information processing apparatus that has a camera ascertains the positional relationship of all gesture-operable information processing apparatuses connected to the network, and sets the operation target candidates and their recognition scores from the orientation of the user's face.

[Embodiment 2]
Next, Embodiment 2 of the present invention will be described in detail with reference to the drawings. Description of portions that are the same as in Embodiment 1 is omitted.

In Embodiment 1, the direction of the line of sight was used in determining whether the user is a subject of gesture recognition. The present embodiment makes this determination using the direction in which the user points instead of the line-of-sight direction. As before, the description of Embodiment 2 focuses on the information processing apparatus 10 in the information processing system of FIG. 1(a), but the same processing is also executed in the information processing apparatuses 11 and 12.

The hardware configuration of Embodiment 2 is shown in FIG. 1(b), as in Embodiment 1, and the functional configuration of the present embodiment is likewise shown in FIG. 2(a). The differences from Embodiment 1 lie in the functions of the recognition unit 23 and the setting unit 24 and in the content of the templates stored in the storage unit 20. As the first action, the recognition unit 23 of the present embodiment recognizes the direction in which the user points, rather than the user's gaze as in Embodiment 1. It therefore analyzes the pointing direction from the moving image of the user captured by the imaging unit 22 and determines whether the information processing apparatus is being selected as the operation target. The pointing direction is identified by computing the degree of matching between the image of the user pointing and a plurality of template images prepared in advance in the storage unit 20. The setting unit 24 then sets a recognition score for the recognized pointing direction. Whereas in Embodiment 1 the first action was the user staring at the camera 16, in Embodiment 2 the action of pointing at the center of the display 13 is registered as the first action. The templates take as their reference, defined as an angle of 90 degrees, image information showing the user pointing at the center of the display 13 from directly in front, as captured from the position of the camera 16. They are prepared from image information for 19 pointing patterns at 10-degree intervals over the range from 0 to 180 degrees, that is, 90 degrees to either side. This set of 19 templates is only an example, and embodiments of the present invention are not limited to it. For example, further patterns may be prepared to cover both the case where the user uses the right hand and the case where the user uses the left hand.
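Selecting the best of the 19 templates can be sketched as an arg-max over the template angles. The matching function is passed in as a parameter because the specification does not say how the degree of matching is computed; the toy function in the usage lines is purely illustrative.

```python
from typing import Callable

# 0 to 180 degrees in 10-degree steps gives the 19 templates described in the text.
TEMPLATE_ANGLES = list(range(0, 181, 10))


def best_template_angle(match_degree: Callable[[int], float]) -> int:
    """Return the template angle whose degree of matching is highest."""
    return max(TEMPLATE_ANGLES, key=match_degree)


print(len(TEMPLATE_ANGLES))  # 19
# Toy matcher: pretend the true pointing angle is 112 degrees.
print(best_template_angle(lambda angle: -abs(angle - 112)))  # 110
```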

図９は、本実施形態におけるメイン処理のフローチャートである。実施形態１との違いは、ステップＳ３０２Ｂ及びステップＳ３０４Ｂである。なお、その他のステップの各処理は、実施形態１と同様に実行されるため、説明を省略する。 FIG. 9 is a flowchart of the main process in the present embodiment. It differs from the first embodiment in steps S302B and S304B. The other steps are executed in the same way as in the first embodiment, so their description is omitted.

ステップＳ３０２Ｂでは、実施形態１と同様、撮像部２２がユーザの動画像を撮影し、その撮影した動画像のフレーム毎を認識部２３が解析し、ユーザによって情報処理装置１０をジェスチャ操作の対象とするために行う第１の動作がなされたかを判定する。ここで、本実施形態では、第１の動作として、認識部２３が、ユーザによってカメラ１６が指差されていることを認識する（ステップＳ３０２Ｂ）。その際、認識部２３は、撮像部２２がユーザを撮影した動画像のフレームうち、ユーザの上半身部分の画像を解析し、ユーザの腕及び指の向きが、カメラ１６が存在する方向に一致した状態で一定時間以上続く場合に、ユーザの指差しという第１の動作を認識する。ユーザによって指を差されていることを認識しなかった場合（ステップＳ３０２ＢでＮＯの場合）、ステップＳ３０２Ｂに戻って処理を繰り返す。ユーザによって指を差されていることを認識した場合（ステップＳ３０２ＢでＹＥＳの場合）、認識部２３は認識フラグを「１」に更新してＲＡＭ１０３上に保持する（ステップＳ３０３）。 In step S302B, as in the first embodiment, the imaging unit 22 captures a moving image of the user, the recognition unit 23 analyzes each frame of the captured moving image, and it is determined whether the user has performed the first operation for making the information processing apparatus 10 the target of gesture operation. In the present embodiment, as the first operation, the recognition unit 23 recognizes that the user is pointing at the camera 16 (step S302B). To do so, the recognition unit 23 analyzes the image of the user's upper body in the frames of the moving image captured by the imaging unit 22, and recognizes the first operation of pointing when the direction of the user's arm and finger remains aligned with the direction of the camera 16 for a certain period of time or longer. If pointing by the user is not recognized (NO in step S302B), the process returns to step S302B and repeats. If pointing by the user is recognized (YES in step S302B), the recognition unit 23 updates the recognition flag to “1” and holds it in the RAM 103 (step S303).
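The "held for a certain period of time" condition in step S302B can be sketched as a run-length check over per-frame results. This is a minimal sketch under stated assumptions: the function name `detect_pointing`, the frame rate, and the hold time are all hypothetical, and the per-frame decision (whether arm and finger direction match the camera direction) is assumed to be supplied by the image analysis.

```python
def detect_pointing(frame_hits, fps, hold_seconds):
    """Return True once the pointing pose is held for hold_seconds without a break.

    frame_hits: iterable of booleans, one per video frame, True when the
    user's arm and finger direction in that frame matches the camera direction.
    fps and hold_seconds are illustrative parameters, not values from the text.
    """
    required = max(1, int(round(fps * hold_seconds)))
    run = 0  # number of consecutive matching frames seen so far
    for hit in frame_hits:
        run = run + 1 if hit else 0
        if run >= required:
            return True  # corresponds to YES in step S302B
    return False  # corresponds to NO in step S302B
```

A single non-matching frame resets the count in this sketch; a real recognizer might tolerate brief dropouts, which the text does not discuss.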

そして、設定部２４は、ユーザの画像を基に指差し方向を判断し、その結果から認識スコアを算出する。これにより、ユーザがジェスチャによって複数の情報処理装置が並べられているような環境においても、ユーザの意思を推定して、複数の候補に対し最も操作対象である可能性が高い順を設定することになる。ステップＳ３０１のキャリブレーションによって補正値が設定されていた場合には、補正値を用いて認識スコアを補正し、ＲＡＭ１０３上に保持する（ステップＳ３０４Ｂ）。本実施形態では、上述したように、情報処理装置１０のカメラ１６を真正面から指差す角度を９０度と定義して、指差し方向に対して設定する認識スコアの基準としている。そして、１９パターンの方向を指差した場合のユーザの様子を示すテンプレートと、カメラが撮影した画像とのマッチング度を求め、最もマッチング度の高かったテンプレートを基に認識スコアを算出する。 Then, the setting unit 24 determines the pointing direction based on the image of the user and calculates a recognition score from the result. In this way, even in an environment where a plurality of information processing apparatuses are arranged side by side, the user's intention is estimated and the candidates are ordered from the most likely operation target. If a correction value has been set by the calibration in step S301, the recognition score is corrected using the correction value and held in the RAM 103 (step S304B). In the present embodiment, as described above, the angle at which the camera 16 of the information processing apparatus 10 is pointed at from directly in front is defined as 90 degrees and serves as the reference for the recognition score set for the pointing direction. The degree of matching between the image captured by the camera and each of the templates, which show the user pointing in the 19 directions, is obtained, and the recognition score is calculated based on the template with the highest degree of matching.
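The text does not give a formula for the recognition score, only that 90 degrees is the reference and that a calibration correction value may be applied. The sketch below therefore assumes, purely for illustration, a linear score that peaks at 90 degrees and an additive correction; both choices and the name `recognition_score` are hypothetical.

```python
def recognition_score(best_angle, correction=0.0):
    """Illustrative score for a pointing angle in degrees.

    Assumption (not from the text): the score is 1.0 when the user points
    from directly in front (90 degrees) and falls linearly to 0.0 at 0 or
    180 degrees. The calibration correction is assumed to be additive.
    """
    if not 0 <= best_angle <= 180:
        raise ValueError("angle must be within 0 to 180 degrees")
    base = 1.0 - abs(best_angle - 90) / 90.0
    return base + correction
```

Under these assumptions, an apparatus pointed at from 60 degrees ranks above one pointed at from 30 degrees, which matches the ordering by likelihood described above.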

そして、以降の処理ステップによって、実施形態１と同様、優先順として認識スコアが高い順に一覧にされたジェスチャ操作を認識可能な操作対象の候補の中から、第２の動作となるジェスチャ操作によって、ユーザに所望とする情報処理装置を特定させる。 In the subsequent processing steps, as in the first embodiment, the operation target candidates capable of recognizing gesture operations are listed in descending order of recognition score as the priority order, and the user specifies the desired information processing apparatus from among them by a gesture operation serving as the second operation.

以上説明したように、本実施形態によれば、複数の情報処理装置がそれぞれユーザによって指差されたことを同時に認識し、ユーザによるジェスチャで操作される対象の候補となった場合にも、候補を表示して操作対象をユーザに選択させる。選択されなかった情報処理装置では、操作対象選択後はユーザのジェスチャを認識しないため、ユーザは確実に操作したい情報処理装置に対してジェスチャ操作を行うことができると共に、他の情報処理装置が誤ってジェスチャを認識して誤動作することがない。また、ユーザによって指を差されたことを認識した際に、ユーザの指差し方向を評価した認識スコアを設定し、認識スコアの高い順に、操作対象候補の情報処理装置を上位の候補として表示する。これにより、ユーザは、意図した操作対象を少ない操作数で選択できる可能性が高い。認識スコアが最も高い候補が選択されなかった場合には、次回以降の処理における認識スコアを補正する補正値を設定するので、認識精度の誤差を修正し誤った認識が繰り返されることを防ぐことができる。 As described above, according to the present embodiment, even when a plurality of information processing apparatuses each recognize at the same time that the user has pointed at them and thus become candidates for the target of the user's gesture operation, the candidates are displayed and the user selects the operation target. An information processing apparatus that was not selected does not recognize the user's gestures after the operation target has been selected. The user can therefore reliably perform gesture operations on the intended information processing apparatus, and the other information processing apparatuses do not malfunction by mistakenly recognizing the gestures. In addition, when pointing by the user is recognized, a recognition score evaluating the user's pointing direction is set, and the candidate information processing apparatuses are displayed with higher recognition scores ranked higher. The user is therefore likely to be able to select the intended operation target with few operations. If the candidate with the highest recognition score is not selected, a correction value that corrects the recognition score in subsequent processing is set, so errors in recognition accuracy are corrected and the same misrecognition is prevented from recurring.

また、本実施形態における第１の動作は、ユーザが情報処理装置のディスプレイを指差す動作とした。指を差すという動作は、ユーザにとっては特定の物を示すために用いる最も直感的で簡単な動作の１つである。ユーザが操作したい情報処理装置は、ユーザがディスプレイに表示される映像を見たいと所望している情報処理装置である可能性が高いため、本実施形態によれば、見たいディスプレイを指差すというより直感的な動作で、操作対象を選択することが可能になる。ユーザは、ディスプレイから視線を外してカメラを凝視する必要はなくなり、デザイン上カメラの位置がわかりにくい情報処理装置に対しても、選択操作がしやすくなる。また、左右だけでなく上下方向にも複数の角度からユーザが情報処理装置を指差している状態をテンプレートとして保持していれば、例えばテレビの上にフォトフレームが設置されているなど、上下に設置された複数の情報処理装置に対しても、本発明が適応できる。 The first operation in the present embodiment is the user pointing at the display of the information processing apparatus. Pointing is one of the most intuitive and simple actions a user can use to indicate a specific object. The information processing apparatus the user wants to operate is likely to be the one whose displayed video the user wants to watch, so according to the present embodiment the operation target can be selected by the more intuitive action of pointing at the display the user wants to watch. The user no longer needs to look away from the display and stare at the camera, and selection becomes easier even for an information processing apparatus whose design makes the camera position hard to find. Furthermore, if templates are held that show the user pointing at the information processing apparatus from a plurality of angles in the vertical direction as well as the horizontal direction, the present invention can also be applied to information processing apparatuses arranged one above the other, for example a photo frame installed above a television.

なお、実施形態２においても、実施形態１と同様に、ユーザによる指差しを認識し、操作対象選択用画像を表示した後、取得部２１が、認識フラグが「１」である他の装置から、ユーザによる操作の結果を取得する変形例を用いることができる。このような変形例によれば、操作対象選択用画像を提示した後で、ユーザに操作対象の装置を特定させる際に、ジェスチャの認識が困難になった場合あっても、操作対象の候補である全ての情報処理装置に対して操作を行うことが可能となる。 In the second embodiment as well, as in the first embodiment, a modification can be used in which, after pointing by the user is recognized and the operation target selection image is displayed, the acquisition unit 21 acquires the result of the user's operation from another apparatus whose recognition flag is “1”. According to such a modification, even if gesture recognition becomes difficult while the user is specifying the operation target apparatus after the operation target selection image has been presented, operations can still be performed on all the information processing apparatuses that are operation target candidates.

［実施形態３］ [Embodiment 3]
次に、本発明の実施形態３について図面を参照して詳細に説明する。なお、実施形態２と同様、実施形態１に準ずる箇所については、説明を省略する。 Next, Embodiment 3 of the present invention will be described in detail with reference to the drawings. Note that, as in the second embodiment, description of portions that are the same as in the first embodiment is omitted.

実施形態１及び実施形態２では、操作対象選の候補を表示する際には認識スコアに基づいた順で情報処理装置名を表示したが、本実施形態では認識スコアに代わって、ユーザと情報処理装置との距離に基づいて情報処理装置名を並び変えるものである。なお、実施形態３においても、図１（ａ）の情報処理システムにおける情報処理装置１０を主として説明するが、情報処理装置１１〜１２にいても同様に処理が実行されるものとする。 In the first and second embodiments, the information processing apparatus names are displayed in an order based on the recognition score when the operation target candidates are displayed. In the present embodiment, the names are instead ordered based on the distance between the user and each information processing apparatus. The third embodiment is also described mainly in terms of the information processing apparatus 10 in the information processing system of FIG. 1A, but the same processing is assumed to be executed in the information processing apparatuses 11 and 12 as well.

実施形態３についても、ハードウェア構成図は実施形態１と同様に図１（ｂ）に示される。図１０（ａ）は、本実施形態における情報処理装置１０の機能の構成図である。実施形態１との違いは、認識部２３、調整部２７が無く、測定部２８が追加されていることである。測定部２８は、情報処理装置からユーザまでの距離を赤外線センサで測定する。なお、距離の測定するためセンサとしては、例えば、超音波センサ、深度センサ、光センサを用いてもよい。また、実施形態３の情報処理装置は、記憶部２０に、ユーザが情報処理装置を視聴するのに最適な距離を、所定の視聴距離情報として保持している。 For the third embodiment as well, the hardware configuration is shown in FIG. 1B, as in the first embodiment. FIG. 10A is a functional configuration diagram of the information processing apparatus 10 in the present embodiment. It differs from the first embodiment in that the recognition unit 23 and the adjustment unit 27 are not provided and a measurement unit 28 is added. The measurement unit 28 measures the distance from the information processing apparatus to the user with an infrared sensor. An ultrasonic sensor, a depth sensor, or an optical sensor, for example, may be used instead as the distance sensor. In addition, the information processing apparatus of the third embodiment holds in the storage unit 20, as predetermined viewing distance information, the optimum distance from which a user views the information processing apparatus.

図１１は、本実施形態におけるメイン処理を示すフローチャートである。まず、取得部２１はネットワークで接続している全ての情報処理装置から、それぞれの識別情報とともに最適な視聴距離を取得する（ステップＳ３０１Ｃ）。続いて、撮像部２２はユーザの動画像を撮影し、認識部２３は、撮影した動画像のフレームを解析して、ユーザによる第１の動作を認識したかを判定する（ステップＳ３０２）。実施形態３では、第１の動作は実施形態１と同様に、ユーザによる第１の動作は、情報処理装置１０のカメラ１６を凝視する動作とする。ユーザによる第１の動作を認識しなかった場合（ステップＳ３０２でＮｏの場合）、ステップＳ３０２に戻って認識するまで待機する。ユーザによる第１の動作を認識した場合（ステップＳ３０２でＹｅｓの場合）、認識部２３は認識フラグを「１」にする（ステップＳ３０３）。続いて、測定部２８がユーザと情報処理装置１０との距離を測定（ステップＳ３０４Ｃ）する。次に、取得部２１は情報取得処理を行い（ステップＳ３０５Ｃ）、ネットワークで接続されている情報処理装置のうち、ジェスチャ認識の対象となった情報処理装置リストを生成しＲＡＭ１０３に保持する。本実施形態における情報取得処理（ステップＳ３０５Ｃ）は、実施形態１における情報取得処理（ステップＳ３０５）に準じるが、認識スコアに代わって、ユーザと情報処理装置との距離情報を用いる点が異なる。すなわち、ここでリスト化される情報は、情報処理装置の識別情報（名称）とステップＳ３０４Ｃで測定したユーザとの距離である。なお、識別情報はネットワークにおけるアドレス情報を利用してもよい。取得部２１は、情報取得処理（ステップＳ３０５Ｃ）からメイン処理に戻ると、表示制御部２６が、ＲＡＭ１０３上に保持されているリストが空かどうかを判断する（ステップＳ３０６）。情報処理装置リストが空の場合（ステップＳ３０６でＮｏの場合）、他の情報処理装置がジェスチャ認識の対象として認識されていない為、表示制御部２６によりジェスチャ操作が可能な状態にする（ステップＳ３１１）。リストが空ではない場合（ステップＳ３０６でＹｅｓの場合）、表示制御部２６は、ステップＳ３０５Ｃで取得したユーザとの距離と、ステップＳ３０１Ｃにより取得した所定の視聴距離との差分が小さい順番に情報処理装置名を一覧にした画面を生成する。生成した操作対象選択用の画像を、ディスプレイ１３に表示させる（ステップＳ３０７Ｃ）。 FIG. 11 is a flowchart showing the main processing in the present embodiment. First, the acquisition unit 21 acquires the optimum viewing distance together with the identification information from all information processing apparatuses connected via the network (step S301C). Subsequently, the imaging unit 22 captures a moving image of the user, and the recognition unit 23 analyzes the frame of the captured moving image to determine whether the first operation by the user has been recognized (step S302). In the third embodiment, similarly to the first embodiment, the first operation by the user is an operation of staring at the camera 16 of the information processing apparatus 10. If the first operation by the user is not recognized (No in step S302), the process returns to step S302 and waits until it is recognized. 
If the first operation by the user is recognized (Yes in step S302), the recognition unit 23 sets the recognition flag to “1” (step S303). Subsequently, the measurement unit 28 measures the distance between the user and the information processing apparatus 10 (step S304C). Next, the acquisition unit 21 performs information acquisition processing (step S305C), generates a list of the information processing apparatuses connected via the network that have become targets of gesture recognition, and holds the list in the RAM 103. The information acquisition processing (step S305C) in the present embodiment follows the information acquisition processing (step S305) in the first embodiment, but differs in that distance information between the user and the information processing apparatus is used instead of the recognition score. That is, the information listed here is the identification information (name) of each information processing apparatus and the distance to the user measured in step S304C. Address information on the network may be used as the identification information. When the acquisition unit 21 returns from the information acquisition processing (step S305C) to the main process, the display control unit 26 determines whether the list held in the RAM 103 is empty (step S306). If the list is empty (No in step S306), no other information processing apparatus has been recognized as a target of gesture recognition, so the display control unit 26 enables gesture operation (step S311). If the list is not empty (Yes in step S306), the display control unit 26 generates a screen listing the information processing apparatus names in ascending order of the difference between the distance to the user acquired in step S305C and the predetermined viewing distance acquired in step S301C. The generated operation target selection image is then displayed on the display 13 (step S307C).

図１０（ｂ）は本実施形態において、ユーザによる凝視を認識し、操作対象の候補となった情報処理装置１０（フォトフレームＡ）、情報処理装置１１（テレビ）、情報処理装置１２（フォトフレームＢ）の最適な視聴距離及びユーザとの距離の一例である。テレビの最適な視聴距離は２ｍと設定されており、フォトフレームの最適な視聴距離は０．５ｍと設定されている。また、ユーザとテレビの距離は２ｍ、ユーザとフォトフレームＡの距離も２ｍ、ユーザとであったとする。この場合、ユーザとの距離と視聴距離との差分は、テレビは０ｍ、フォトフレームは１．５ｍである為、操作対象の候補を表示する画像では、テレビ、フォトフレームの順番に表示される。同順となるフォトフレームＡとフォトフレームＢは、識別情報に基づき名称順やアドレス順に表示してもよいし、並列関係を示すように横並び等でアイコンを表示してもよい。 FIG. 10B shows an example of the optimum viewing distances and the distances to the user for the information processing apparatus 10 (photo frame A), the information processing apparatus 11 (television), and the information processing apparatus 12 (photo frame B), which have recognized the user's gaze and become operation target candidates in the present embodiment. The optimum viewing distance of the television is set to 2 m, and that of the photo frames is set to 0.5 m. Assume that the distance between the user and the television is 2 m and the distance between the user and photo frame A is also 2 m. In this case, the difference between the distance to the user and the viewing distance is 0 m for the television and 1.5 m for the photo frames, so the image displaying the operation target candidates lists the television first and the photo frames after it. Photo frame A and photo frame B, which rank equally, may be displayed in name order or address order based on their identification information, or their icons may be displayed side by side to show that they rank equally.
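The ordering in this example can be reproduced with a short sketch. The function and field names are hypothetical. The distance for photo frame B is an assumption: the text does not state it, and 2 m is used here only because the text says both photo frames differ from their viewing distance by 1.5 m and rank equally. Ties are broken by name, one of the options the text mentions.

```python
def order_by_viewing_distance(candidates):
    """Sort candidates by |measured distance - optimum viewing distance|.

    Ties are broken by name, one of the tie-breaking options in the text.
    """
    return sorted(
        candidates,
        key=lambda c: (abs(c["distance_m"] - c["optimum_m"]), c["name"]),
    )


candidates = [
    {"name": "Photo frame A", "optimum_m": 0.5, "distance_m": 2.0},
    {"name": "Television", "optimum_m": 2.0, "distance_m": 2.0},
    # Assumed distance: not stated in the text, chosen to give the 1.5 m difference.
    {"name": "Photo frame B", "optimum_m": 0.5, "distance_m": 2.0},
]

ordered = [c["name"] for c in order_by_viewing_distance(candidates)]
```

With these values the television (difference 0 m) comes first, followed by the two photo frames (difference 1.5 m each) in name order.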

続いて、実施形態１と同様の操作対象選択処理が実行され（ステップＳ３０８）、ユーザが選択した情報処理装置の識別情報を取得する。続いて、特定部２５は、ＲＡＭ１０３上に保持している選択された情報処理装置の識別情報を基に、選択された操作対象が情報処理装置１０自身であったかを判定する（ステップＳ３０９）。ユーザが選択した情報処理装置が自身の場合（ステップＳ３０９でＹｅｓの場合）、ジェスチャ操作が可能な状態を維持する（ステップＳ３１０）。選択された情報処理装置が他の情報処理装置の場合（ステップＳ３０９でＮｏの場合）、認識部２３は、認識フラグを「０」にする（ステップＳ３１２）。 Subsequently, the same operation target selection process as that of the first embodiment is executed (step S308), and the identification information of the information processing apparatus selected by the user is acquired. Subsequently, the specifying unit 25 determines whether the selected operation target is the information processing apparatus 10 itself based on the identification information of the selected information processing apparatus held on the RAM 103 (step S309). When the information processing apparatus selected by the user is itself (Yes in step S309), the state where the gesture operation is possible is maintained (step S310). When the selected information processing apparatus is another information processing apparatus (No in step S309), the recognition unit 23 sets the recognition flag to “0” (step S312).

本実施形態では、認識部２３によりユーザの視線の方向を用いてユーザをジェスチャ認識の対象であるかどうかを判定したが、これに限らない。実施形態２のように、認識部２３により、ユーザによって指差されたことを第１の動作として認識してもよい。 In the present embodiment, the recognition unit 23 determines whether or not the user is a target for gesture recognition using the direction of the user's line of sight. However, the present invention is not limited to this. As in the second embodiment, the recognition unit 23 may recognize that the user has pointed as the first action.

以上、説明したように、本実施形態では、ユーザと情報処理装置との距離が、情報処理装置の最適な視聴距離に近い順番に候補を表示する。ユーザが操作したい情報処理装置は、ユーザがディスプレイに表示される映像を見たいと所望している可能性が高いため、ユーザがいる位置から最適な視聴距離で見ることができる情報処理装置は、操作対象として選択される可能性が高いと推定できる。従って、本実施形態のように、情報処理装置それぞれの最適な視聴距離と、実際のユーザとの距離との差分が小さい順に、上位の候補とすることで、ユーザは、少ない操作数で、所望とする操作対象を選択し易くなる。 As described above, in the present embodiment, candidates are displayed in order of how close the distance between the user and each information processing apparatus is to that apparatus's optimum viewing distance. The user is likely to want to watch the video displayed on the information processing apparatus that the user wants to operate, so an information processing apparatus that can be viewed at its optimum viewing distance from the user's position can be presumed likely to be selected as the operation target. Accordingly, by ranking candidates higher as the difference between each apparatus's optimum viewing distance and the actual distance to the user becomes smaller, as in the present embodiment, the user can more easily select the desired operation target with few operations.

なお、実施形態３においても、実施形態１と同様に、ユーザによる指差しを認識し、操作対象選択用画像を出力した後、取得部２１が、認識フラグが「１」である他の装置から、ユーザによる操作の結果を取得する変形例を用いることができる。このような変形例によれば、操作対象選択用画像を提示した後で、ユーザに操作対象の装置を特定させる際に、ジェスチャの認識が困難になった場合あっても、操作対象の候補である全ての情報処理装置に対して操作を行うことが可能となる。 In the third embodiment as well, as in the first embodiment, a modification can be used in which, after pointing by the user is recognized and the operation target selection image is output, the acquisition unit 21 acquires the result of the user's operation from another apparatus whose recognition flag is “1”. According to such a modification, even if gesture recognition becomes difficult while the user is specifying the operation target apparatus after the operation target selection image has been presented, operations can still be performed on all the information processing apparatuses that are operation target candidates.

［実施形態４］ [Embodiment 4]
次に、実施形態４を説明する。なお、これまで説明した実施形態と同様、実施形態１に準ずる箇所については、説明を省略する。 Next, a fourth embodiment will be described. Note that, as in the embodiment described so far, the description of the same parts as in the first embodiment will be omitted.

実施形態１で説明した情報処理装置では、ユーザによる凝視を認識した場合には、ネットワーク上の装置が互いに問い合わせ信号を送信し、その信号を受信したことによって、他の操作対象候補の情報処理装置の存在を判断していた（ステップＳ３０５）。実施形態４では、ネットワーク上にサーバ装置を配置する。サーバ装置は、ネットワークに接続された全情報処理装置の数を把握し、全ての情報処理装置の状態情報を管理する。図１２（ａ）は、実施形態１における図１（ａ）に対応するもので、ネットワーク上にサーバ装置１２００が含まれる情報処理システムの一例を示す概要図である。各情報処理装置のハードウェア構成、及び機能構成は実施形態１に準ずるため、説明を省略する。実施形態１と同様図３のフローチャートに示されたメイン処理に従い、ステップＳ３０１〜ステップＳ３０４までの処理を実行した後、実施形態４では、図１２（ｂ）のフローチャートに示される情報取得処理（ステップＳ３０５Ｄ）に進む。ユーザによる凝視を認識した各情報処理装置は、ステップＳ３０５Ｄの情報取得処理では、まず、サーバ装置に対して、認識フラグが「１」となったことを示すため、識別情報と認識スコア含む信号を送信する（ステップＳ１２０１）。そして、サーバ装置からの応答があったかを確認する（ステップＳ１２０２）。応答がないときには（ステップＳ１２０２でＮｏの場合）、応答を受信するまで待機する。サーバ装置は、認識フラグが「１」である情報処理装置からの信号を最初に受信してから一定時間、受付状態となり、その時間内に同様の信号を送信してきた全ての装置が、ユーザの操作対象の候補であると判断する。この際には、信号が受信されなかった情報処理装置と通信し、認識フラグが「０」であることを確認してもよい。そして、サーバ装置が、候補となっている情報処理装置の識別情報と認識スコアを集計して、優先順として認識スコアが高い順のリストを生成し、候補である情報処理装置に配信する。操作候補である判断された情報処理装置は、サーバ装置の配信情報を応答として受け付け（ステップＳ１２０２でＹｅｓの場合）、リストを取得する（ステップＳ１２０３）し、メイン処理にリターンする。以降の処理は、実施形態１に準じるため、説明を省略する。 In the information processing apparatus described in the first embodiment, when the user's gaze is recognized, the apparatuses on the network transmit inquiry signals to each other and determine, from having received such a signal, that other information processing apparatuses that are operation target candidates exist (step S305). In the fourth embodiment, a server device is arranged on the network. The server device keeps track of the number of all information processing apparatuses connected to the network and manages the status information of all of them. FIG. 12A corresponds to FIG. 1A in the first embodiment and is a schematic diagram illustrating an example of an information processing system in which a server device 1200 is included on the network. The hardware configuration and functional configuration of each information processing apparatus are the same as in the first embodiment, so their description is omitted.
As in the first embodiment, steps S301 to S304 are executed according to the main process shown in the flowchart of FIG. 3. In the fourth embodiment, the process then proceeds to the information acquisition processing (step S305D) shown in the flowchart of FIG. 12B. In the information acquisition processing of step S305D, each information processing apparatus that has recognized the user's gaze first transmits a signal including its identification information and recognition score to the server device to indicate that its recognition flag has become “1” (step S1201). It then checks whether a response has been received from the server device (step S1202). If there is no response (No in step S1202), it waits until a response is received. The server device enters a reception state for a certain period of time after first receiving a signal from an information processing apparatus whose recognition flag is “1”, and determines that all apparatuses that transmit a similar signal within that period are candidates for the user's operation target. At this point, the server device may communicate with the information processing apparatuses from which no signal was received and confirm that their recognition flags are “0”. The server device then aggregates the identification information and recognition scores of the candidate information processing apparatuses, generates a list in descending order of recognition score as the priority order, and distributes it to the candidate information processing apparatuses. Each information processing apparatus determined to be a candidate accepts the information distributed by the server device as the response (Yes in step S1202), acquires the list (step S1203), and returns to the main process. The subsequent processing follows the first embodiment, so its description is omitted.
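The server-side aggregation can be sketched as follows. This is a simplified, offline illustration rather than a network implementation: the function name `build_candidate_list`, the tuple layout of the reports, and the window length are hypothetical, and arrival times are passed in as plain numbers instead of being read from a socket.

```python
def build_candidate_list(reports, window_seconds):
    """Aggregate reports received within the acceptance window.

    reports: list of (arrival_time_seconds, device_id, recognition_score)
    tuples from apparatuses whose recognition flag became 1.
    Returns (device_id, score) pairs in descending order of score, built
    only from reports that arrived within window_seconds of the first one.
    """
    if not reports:
        return []
    first_arrival = min(arrival for arrival, _, _ in reports)
    accepted = [
        (device_id, score)
        for arrival, device_id, score in reports
        if arrival - first_arrival <= window_seconds
    ]
    # Highest recognition score first, as in the priority order of the text.
    return sorted(accepted, key=lambda item: item[1], reverse=True)
```

For example, with a one-second window, a report arriving five seconds after the first is excluded from the distributed list even if its score is the highest.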

このように、全ての情報処理装置を統括するサーバ装置を設ける場合は、各装置が独立して同じ処理を行うのに比較して、全体の負荷を抑えて複数の対象を管理することが容易になるという利点がある。なお、変形例として、ネットワークにサーバ専用の装置を設けるのではなく、ネットワークに接続された複数の情報処理装置の１つが代表となり、サーバ装置の役割を果たしてもよい。すなわち、１つの情報処理装置が代表となり、応答を返した他の候補の情報処理装置の識別情報と認識スコアを示す情報を集計して認識スコアが高い順に整理したリストを生成し、ネットワーク上の各情報処理装置に配信してもよい。代表となる情報処理装置は、例えば最先で問い合わせ信号を送信したものや、認識スコアが最大値であったものを選択するようにルール化しておけばよい。 Providing a server device that supervises all the information processing apparatuses in this way has the advantage that, compared with each apparatus independently performing the same processing, the overall load is reduced and managing a plurality of targets becomes easier. As a modification, instead of providing a dedicated server device on the network, one of the plurality of information processing apparatuses connected to the network may act as a representative and play the role of the server device. That is, one information processing apparatus may act as the representative, aggregate the identification information and recognition scores of the other candidate information processing apparatuses that returned responses, generate a list arranged in descending order of recognition score, and distribute it to each information processing apparatus on the network. The representative may be chosen by a predefined rule, for example the apparatus that transmitted the inquiry signal first or the apparatus with the highest recognition score.
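The two example rules for choosing the representative apparatus can be sketched as below. The function name `choose_representative`, the rule names, and the dictionary fields are hypothetical; the text only names the two criteria.

```python
def choose_representative(devices, rule="earliest"):
    """Pick the apparatus that plays the server role.

    devices: list of dicts with hypothetical keys "id", "inquiry_time"
    (when the inquiry signal was sent) and "score" (recognition score).
    rule: "earliest" picks the first sender of the inquiry signal,
          "max_score" picks the highest recognition score.
    """
    if not devices:
        raise ValueError("no candidate devices")
    if rule == "earliest":
        return min(devices, key=lambda d: d["inquiry_time"])["id"]
    if rule == "max_score":
        return max(devices, key=lambda d: d["score"])["id"]
    raise ValueError("unknown rule: " + rule)
```

Whichever rule is used, every apparatus must apply the same rule to the same data so that they all agree on a single representative.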

実施形態４では、実施形態１と同様に、第１の動作としてユーザによる凝視を認識する例を説明したが、第１の動作は、実施形態２のように情報処理装置を指差す動作であっても構わない。さらに、認識スコアを用いず、実施形態３のようにユーザと情報処理装置との距離に基づいて、操作対象の候補を上位から順に表示する順番を特定してもよい。また、操作対象の候補の識別情報のリストを、ユーザが操作対象として選択しようとする可能性が高い順に並び替える処理、その候補を表示する画像を生成する処理の少なくとも１つは、上述したサーバ装置あるいは代表の情報処理装置で実行されてもよい。 In the fourth embodiment, as in the first embodiment, an example was described in which the user's gaze is recognized as the first operation, but the first operation may instead be pointing at the information processing apparatus as in the second embodiment. Furthermore, the order in which the operation target candidates are displayed from the top may be determined based on the distance between the user and the information processing apparatus as in the third embodiment, without using the recognition score. In addition, at least one of the processing that sorts the list of identification information of the operation target candidates in descending order of the likelihood that the user will select them, and the processing that generates the image displaying the candidates, may be executed by the server device or the representative information processing apparatus described above.

［その他の実施形態］ [Other Embodiments]
また、本発明は、以下の処理を実行することによっても実現される。即ち、上述した実施形態の機能を実現するソフトウェア（プログラム）を、ネットワーク又は各種記憶媒体を介してシステム或いは装置に供給し、そのシステム或いは装置のコンピュータ（またはＣＰＵやＭＰＵ等）がプログラムを読み出して実行する処理である。 The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program.

Claims

An information processing apparatus used as one of a plurality of information processing apparatuses capable of executing processing corresponding to a user's action, the information processing apparatus comprising:
recognition means for recognizing the user's action;
display control means for displaying, on a display unit, based on a first action of the user recognized by the recognition means, an image showing at least some of the information processing apparatuses that are candidates for executing the processing corresponding to the user's action among the plurality of information processing apparatuses; and
specifying means for specifying, based on a second action of the user recognized by the recognition means, the information processing apparatus that is to execute the processing corresponding to the user's action from among the candidate information processing apparatuses shown in the image displayed on the display unit.

The information processing apparatus according to claim 1, wherein the display control unit displays at least a part of the candidate information processing apparatus so that the priority order is known.

The information processing apparatus according to claim 2, wherein the priority order is determined based on a priority index acquired from each of the candidate information processing apparatuses.

The information processing apparatus according to claim 3, wherein the priority index includes a score indicating a possibility that the candidate information processing apparatus is designated based on a first action of the user.

The information processing apparatus according to claim 4, wherein the first action of the user is an action in which the user stares at the information processing apparatus, and
when the recognition means recognizes, based on an image of the user captured by imaging means, that the user has stared, setting means sets a score according to the positional relationship between the user's line-of-sight direction and the own apparatus.

The information processing apparatus according to claim 4, wherein the first action of the user is an action in which the user points at the information processing apparatus, and
when the recognition means recognizes, based on an image of the user captured by imaging means, that the user has pointed a finger, setting means sets a score according to the positional relationship between the direction of the user's finger and the own apparatus.

The information processing apparatus according to claim 4, further comprising measuring means for measuring a distance between the user and the information processing apparatus,
wherein the setting means sets a higher score as the difference between the distance to the user measured by the measuring means and a predetermined viewing distance of the own apparatus becomes smaller.

A program that causes a computer to function as the information processing apparatus according to any one of claims 1 to 7 by being read and executed by a computer.

A control method for an information processing apparatus used as one of a plurality of information processing apparatuses capable of executing processing corresponding to a user's action, the method comprising:
a first recognition step of recognizing, by recognition means, a first action of the user;
a display control step of displaying, by display control means, on a display unit, based on the first action of the user recognized in the first recognition step, an image showing at least some of the information processing apparatuses that are candidates for executing the processing corresponding to the user's action among the plurality of information processing apparatuses;
a second recognition step of recognizing, by the recognition means, a second action of the user; and
a specifying step of specifying, by specifying means, based on the second action of the user recognized in the second recognition step, the information processing apparatus that is to execute the processing corresponding to the user's action from among the candidate information processing apparatuses shown in the image displayed on the display unit.

An information processing apparatus used as one of a plurality of information processing apparatuses capable of executing processing corresponding to a user's action, the information processing apparatus comprising:
communication means for exchanging information with a server device connected to the plurality of information processing apparatuses;
recognition means for recognizing the user's action;
transmitting means for transmitting identification information representing the own apparatus to the server device;
obtaining means for obtaining, from the server device, identification information representing the information processing apparatuses that have recognized a first action of the user among the plurality of information processing apparatuses;
display control means for displaying, on a display unit, based on the obtained identification information, an image showing at least some of the information processing apparatuses that have recognized the first action; and
specifying means for specifying, based on a second action of the user recognized by the recognition means, the information processing apparatus that is to execute the processing corresponding to the user's action from among the information processing apparatuses shown in the image displayed on the display unit.

An information processing system including a plurality of information processing devices that recognize a user's action and execute processing corresponding to the action,
Each of the plurality of information processing devices comprising:
Communication means for exchanging information with other information processing devices;
Recognition means for recognizing the user's action;
Transmitting means for transmitting, via the communication means and in response to the recognition means recognizing a first action of the user, identification information representing its own device to at least one connected information processing device;
Obtaining means for obtaining, via the communication means, identification information transmitted by another information processing device, among the plurality of information processing devices, that has recognized the first action;
Display control means for causing a display unit to display, in response to the obtaining means obtaining identification information transmitted by another information processing device that has recognized the first action, an image showing the information processing devices, among the plurality of connected information processing devices, that have recognized the first action; and
Specifying means for specifying, in response to the recognition means recognizing a second action of the user, the information processing device selected by the user through the second action from among the information processing devices indicated by the image displayed on the display unit.
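
The serverless exchange described in this claim can be sketched as follows. This is a hedged, in-memory illustration: the `PeerDevice` class, its method names, the direct method call standing in for the communication means, and the choice to list a device's own ID among its candidates are assumptions of this example, not details taken from the claims.

```python
from typing import List, Optional


class PeerDevice:
    """One information processing device in the peer-to-peer arrangement."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.peers: List["PeerDevice"] = []  # connected devices
        self.candidates: List[str] = []      # IDs to show in the image

    def connect(self, other: "PeerDevice") -> None:
        # Symmetric link; a stand-in for the communication means.
        self.peers.append(other)
        other.peers.append(self)

    def on_first_action(self) -> None:
        # Transmitting means: announce our own ID to every connected peer.
        self._remember(self.device_id)  # assumption: list ourselves too
        for peer in self.peers:
            peer.receive_id(self.device_id)

    def receive_id(self, device_id: str) -> None:
        # Obtaining means: record an ID sent by a peer that saw the action.
        self._remember(device_id)

    def on_second_action(self, index: int) -> Optional[str]:
        # Specifying means: the second action selects one displayed entry.
        if 0 <= index < len(self.candidates):
            return self.candidates[index]
        return None

    def _remember(self, device_id: str) -> None:
        if device_id not in self.candidates:
            self.candidates.append(device_id)


a, b, c = PeerDevice("a"), PeerDevice("b"), PeerDevice("c")
a.connect(b)
a.connect(c)
b.connect(c)
a.on_first_action()  # a and b both recognize the first action
b.on_first_action()
chosen = a.on_second_action(1)
```

In this sketch every connected device stores the received IDs, including one that did not recognize the first action itself; a fuller design would presumably display the candidates only on devices that did.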

An information processing system including a server device and a plurality of information processing devices that recognize a user's action and execute processing corresponding to the action,
Each of the plurality of information processing devices comprising:
First communication means for exchanging information with the server device;
Recognition means for recognizing the user's action;
Transmitting means for transmitting identification information representing its own device to the server device in response to the recognition means recognizing a first action of the user;
First acquisition means for acquiring, from the server device, identification information representing each information processing device, among the plurality of information processing devices, that has recognized the first action;
Display control means for causing a display unit to display, based on the identification information acquired by the first acquisition means, an image showing the information processing devices that have recognized the first action; and
Specifying means for specifying, in response to the recognition means recognizing a second action of the user, the information processing device selected by the user through the second action from among the information processing devices indicated by the image displayed on the display unit,
The server device comprising:
Second communication means for exchanging information with the plurality of information processing devices;
Second acquisition means for acquiring the identification information transmitted, by its transmitting means, by each information processing device, among the plurality of information processing devices, that has recognized the first action of the user; and
Distribution means for distributing, via the second communication means and based on the information acquired by the second acquisition means, identification information representing the information processing devices that have recognized the first action to the information processing devices that have recognized the first action.
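
The division of roles between the server device and the information processing devices in this claim can be sketched in Python. The sketch is illustrative only: the `Server` and `Client` classes, the method names, and the in-process calls that replace real network communication are assumptions of this example.

```python
from typing import Dict, List, Optional


class Server:
    """Collects IDs of devices that recognized the first action and
    distributes the collected list back to exactly those devices."""

    def __init__(self) -> None:
        self.clients: Dict[str, "Client"] = {}
        self.recognized: List[str] = []  # IDs in order of arrival

    def register(self, client: "Client") -> None:
        self.clients[client.device_id] = client

    def report(self, device_id: str) -> None:
        # Second acquisition means: take in the reported ID.
        if device_id not in self.recognized:
            self.recognized.append(device_id)
        # Distribution means: only devices that recognized the first
        # action receive the list.
        for rid in self.recognized:
            self.clients[rid].candidates = list(self.recognized)


class Client:
    """One information processing device connected to the server."""

    def __init__(self, device_id: str, server: Server) -> None:
        self.device_id = device_id
        self.server = server
        self.candidates: List[str] = []  # IDs to show in the image
        server.register(self)

    def on_first_action(self) -> None:
        # Transmitting means: send our own ID to the server.
        self.server.report(self.device_id)

    def on_second_action(self, index: int) -> Optional[str]:
        # Specifying means: the second action selects one displayed entry.
        if 0 <= index < len(self.candidates):
            return self.candidates[index]
        return None


server = Server()
tv, pc, cam = Client("tv", server), Client("pc", server), Client("cam", server)
tv.on_first_action()   # tv and pc recognize the first action; cam does not
pc.on_first_action()
selected = tv.on_second_action(1)
```

Compared with the peer-to-peer arrangement, each device here needs only one connection, and a device that did not recognize the first action (`cam` above) never receives the candidate list.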