JP6349800B2

JP6349800B2 - Gesture recognition device and method for controlling gesture recognition device

Info

Publication number: JP6349800B2
Application number: JP2014048911A
Authority: JP
Inventors: 田中 清明; 山下 隆義; 古田 みずき
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2014-03-12
Filing date: 2014-03-12
Publication date: 2018-07-04
Anticipated expiration: 2034-03-12
Also published as: KR20150106823A; CN104914989A; US20150261303A1; JP2015172887A; CN104914989B; KR101631011B1; US9557821B2

Description

The present invention relates to a gesture recognition device that recognizes input operations performed by gestures.

Devices that accept gesture input for computers and electronic equipment are beginning to come into widespread use.

For example, Patent Document 1 describes an input device that captures, with a camera or the like, a gesture made by a user in space and converts the gesture into an input command. The device stores specific gestures in association with specific input commands, and has means for recognizing a gesture and means for converting the recognized gesture into an input command. As a result, the user can input an arbitrary command simply by performing a gesture in front of the equipment, without directly operating an input device.

Patent Document 1: JP 2012-123608 A

An input device that recognizes gestures generally extracts, from an image of the body part performing the gesture, a point that represents the position of that part (hereinafter, a representative point), and tracks the movement of the extracted representative point to determine what kind of gesture was made. For example, when a user draws a figure with an open palm, the device detects the center point of the hand and tracks its trajectory to recognize the shape of the figure drawn by the gesture.

However, when a gesture is performed with the hand, setting the representative point at the center of the hand is not always the best choice. For example, when a gesture is performed with the index finger raised, tracking the tip of the index finger gives a recognition result that feels more natural to the user. In such a case, if the representative point is set at the center of the hand, input may not be recognized correctly when the user moves only the fingertip without moving the hand much.
Thus, when a gesture is performed with a part of the body, a conventional gesture recognition device cannot always determine appropriately where the representative point is, and as a result there have been cases in which the trajectory expressed by the gesture could not be recognized as the user intended.

The present invention has been made in view of the above problem, and its object is to provide a technique that allows a gesture recognition device, which recognizes a gesture performed by a user moving a target part, to recognize the gesture as the user intended.

To solve the above problem, the gesture recognition device according to the present invention determines the shape of the target part, which is the body part performing the gesture, and determines the position of the representative point in consideration of that shape.

Specifically, the gesture recognition device according to the present invention is a gesture recognition device that detects a gesture from an acquired image and generates an instruction for a target device corresponding to the gesture, the device comprising: image acquisition means for acquiring an image; target part extraction means for extracting, from the acquired image, a target part that performs the gesture; target part shape specifying means for specifying the shape of the extracted target part; representative point determination means for setting, for the target part, a representative point that is a point representing the position of the target part; gesture recognition means for recognizing the gesture based on the movement of the representative point; and command generation means for generating an instruction corresponding to the recognized gesture, wherein the representative point determination means determines the position of the representative point corresponding to the target part using the specified shape of the target part.

The target part is the part with which the user performs a gesture. It is typically a human hand, but may be the entire human body or an input marker or the like held by the user. An input gesture can be recognized by tracking the movement of the representative point corresponding to the target part; in the gesture recognition device according to the present invention, the position of the representative point on the target part is determined using the shape of that target part.
By setting the position of the representative point in consideration of the shape of the target part in this way, the gesture recognition device according to the present invention can recognize a gesture input by moving the target part as the user intended.

Further, when the shape of the target part specified by the target part shape recognition means is a shape that includes a protruding portion, the representative point determination means may set the representative point at the tip of the protruding portion.

When part of the target part protrudes, it is highly likely that the user is trying to input a gesture through the movement of the protruding portion. Examples include the case where the user has some fingers raised and the case where the user is holding a rod-shaped input marker. In such cases, it is preferable to set the representative point at the tip of that portion.

Further, the target part may be a human hand, and the representative point determination means may determine whether the shape of the target part specified by the target part shape recognition means is a first shape or a second shape different from the first shape, set the representative point at a position corresponding to a fingertip when the shape of the target part is the first shape, and set the representative point at a position corresponding to the center of the hand when the shape of the target part is the second shape.

When the target part is a human hand, two patterns are conceivable: performing the gesture with a fingertip and performing the gesture with the entire hand. The representative point is therefore preferably placed either at the fingertip or at the center of the hand.
When it can be presumed that the gesture is being performed with a fingertip rather than the entire hand, the representative point is set at a position corresponding to the fingertip. As a result, even a gesture made with a small movement of the fingertip can be recognized as the user intended.

Further, the representative point determination means may determine that the shape of the hand, which is the target part specified by the target part shape recognition means, is the first shape when some of the fingers are extended, and that it is the second shape when all of the fingers are extended or all of the fingers are bent. The representative point determination means may also determine that the shape is the first shape when only one finger is extended.

When some of the five fingers are extended, it can be presumed that the gesture is being performed with a fingertip. When all fingers are extended or all fingers are bent, it can be presumed that the gesture is being performed by moving the entire hand. When only one extended finger is detected, it can likewise be presumed that the gesture is being performed with the fingertip. Note that the state in which some fingers are extended does not include the state in which all fingers are open.
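The rule above can be sketched as a small classifier. This is only an illustration under stated assumptions: the names `HandShape` and `classify_hand_shape`, and the idea that an upstream step already supplies the number of extended fingers, are hypothetical and not part of the patent text.

```python
from enum import Enum


class HandShape(Enum):
    """Hypothetical labels for the two shapes described in the text."""
    FIRST = "fingertip"   # some, but not all, fingers extended
    SECOND = "center"     # all fingers extended, or all fingers bent


def classify_hand_shape(extended_fingers: int, total_fingers: int = 5) -> HandShape:
    """Map a count of extended fingers to the first or second shape."""
    if not 0 <= extended_fingers <= total_fingers:
        raise ValueError("extended_fingers must be between 0 and total_fingers")
    # "Some fingers extended" excludes both the closed fist and the open hand.
    if 0 < extended_fingers < total_fingers:
        return HandShape.FIRST
    return HandShape.SECOND
```

With this sketch, one raised finger maps to the first shape (representative point at the fingertip), while a fist or a fully open hand maps to the second shape (representative point at the center of the hand).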

Further, the representative point determination means may use, as the position corresponding to the center of the hand, the center of gravity of the region corresponding to the hand extracted by the target part extraction means.

As the position corresponding to the center of the hand, it is preferable to use the center of gravity, which is easy to calculate.
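As a rough illustration of why the center of gravity is cheap to compute, the sketch below averages the coordinates of the pixels in the hand region. It assumes the extracted region is available as a binary mask (a list of rows of 0/1 values); that representation and the function name `region_centroid` are assumptions made for this example.

```python
from typing import List, Tuple


def region_centroid(mask: List[List[int]]) -> Tuple[float, float]:
    """Return the (x, y) center of gravity of the nonzero pixels in a binary mask."""
    count = 0
    sum_x = 0
    sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        raise ValueError("mask contains no region pixels")
    # The centroid is the mean of the pixel coordinates in the region.
    return sum_x / count, sum_y / count
```

A single pass over the region is enough, which is consistent with the text's remark that this position is easy to calculate.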

The gesture recognition device according to the present invention may further include notification means for notifying the user of the position of the set representative point.

The notification means is, for example, a display device or an audio output device. For example, the location where the representative point is set may be displayed on the screen. With this configuration, the user can see which point the gesture recognition device is tracking, which further improves usability.

The present invention can be specified as a gesture recognition device including at least some of the above means. It can also be specified as a method for controlling the gesture recognition device, a program for operating the gesture recognition device, or a recording medium on which the program is recorded. The above processes and means can be freely combined as long as no technical contradiction arises.

According to the present invention, a gesture recognition device that recognizes a gesture performed by a user moving a target part can recognize the gesture as the user intended.

FIG. 1 is a configuration diagram of the gesture recognition system according to the first embodiment.
FIG. 2 is a diagram illustrating an example of gesture definition data.
FIG. 3 is a diagram illustrating an example of extraction of a target part.
FIG. 4 is a diagram illustrating differences in the shape of the target part.
FIG. 5 is a flowchart of processing performed by the gesture recognition device in the first embodiment.
FIG. 6 is a flowchart of processing performed by the gesture recognition device in the first embodiment.
FIG. 7 is an example of a notification screen in the second embodiment.

(First embodiment)
An outline of the gesture recognition system according to the first embodiment is described with reference to FIG. 1, which is a system configuration diagram. The gesture recognition system according to the first embodiment consists of a gesture recognition device 100 and a target device 200.

The gesture recognition device 100 is a device that uses a camera to recognize a gesture performed by a user, generates a command corresponding to the gesture, and transmits the command to the target device 200.
The target device 200 is a device that receives commands from the gesture recognition device 100 (the device to be controlled), and is typically an electrical product such as a television, a video recorder, a computer, an air conditioner, or a video conference system. The target device 200 may be any device as long as it can receive commands from the gesture recognition device 100 by wire or wirelessly. In the present embodiment, the target device 200 is a television receiver, and the gesture recognition device 100 is built into that television.

The gesture recognition device 100 is now described in detail. The gesture recognition device 100 includes an image acquisition unit 101, a gesture extraction unit 102, a gesture recognition unit 103, and a command generation unit 104.

The image acquisition unit 101 is means for acquiring an image from outside. In the present embodiment, the user is imaged with a camera (not shown) attached to the upper front part of the television screen. The camera used by the image acquisition unit 101 may be a camera that acquires RGB images, or a camera that acquires grayscale or infrared images. The image does not necessarily have to be acquired by a camera; for example, it may be an image representing a distribution of distances (a distance image) generated by a distance sensor. A combination of a distance sensor and a camera may also be used.
The image acquired by the image acquisition unit 101 (hereinafter, the camera image) may be any image from which the movement of the gesture performed by the user and the shape of the body part performing the gesture can be obtained. The angle of view of the camera image need only be substantially the same as the viewing angle of the television.

The gesture extraction unit 102 is means for detecting, from the camera image acquired by the image acquisition unit 101, the body part that performs the gesture (hereinafter, the target part). In the present embodiment, the user performs gestures with a hand. The gesture extraction unit 102 detects, for example, a region representing a human hand in the camera image.
The gesture extraction unit 102 also sets, for the detected target part, a representative point that represents the position of the target part, and tracks the movement of that representative point. This makes it possible to extract the movement expressed by the gesture.
The shape of the target part and the representative point are described in detail later.

The gesture recognition unit 103 is means for identifying the content of a gesture based on the movement of the representative point extracted by the gesture extraction unit 102. For example, it stores data that associates movements of the representative point with gesture content (gesture definition data), as shown in FIG. 2, and identifies the instruction expressed by the gesture. The instruction expressed by a gesture may be one in which a series of gesture motions corresponds to a single instruction, as shown in FIG. 2, or may be an instruction that moves a pointer displayed on the screen based on the amount and direction of movement of the representative point.

The command generation unit 104 is means for generating a command corresponding to the gesture identified by the gesture recognition unit 103. A command is a signal for controlling the target device 200, and may be an electrical signal, a signal modulated for wireless transmission, a pulse-modulated infrared signal, or the like.

The gesture recognition device 100 is a computer having a processor, a main storage device, and an auxiliary storage device. Each of the means described above functions when a program stored in the auxiliary storage device is loaded into the main storage device and executed by the processor (the processor, main storage device, and auxiliary storage device are not shown).

Next, the gesture recognition process is described in detail.
FIG. 3 is an example of an image acquired by the image acquisition unit 101. That is, it shows a user directly facing the television screen as seen from the screen side.
The gesture recognition process consists of a process of detecting, from the image, the target part that performs the gesture, a process of setting a representative point corresponding to the detected target part, and a process of tracking the movement of the representative point to identify the content of the gesture.

First, detection of the target part is described.
The gesture extraction unit 102 detects, in the acquired image, a region containing a human hand (reference numeral 31). The region containing a human hand may be determined, for example, by color or shape, or by detecting feature points. It may also be determined by comparison with a model, template, or the like stored in advance. Because known techniques can be used to detect the hand, a detailed description is omitted.

Next, the representative point is described.
The representative point is a point that represents the position of the target part, and one representative point is set for each target part. For example, when the target part is a human hand, the point representing the center of the palm (reference numeral 32) can be used as the representative point.
By setting a representative point for the target part and tracking its movement, the gesture expressed by the movement of the hand can be obtained. For example, gestures such as "moving the entire hand to indicate a direction" or "drawing a figure" can be obtained.

The problem with the prior art is now described with reference to FIG. 4.
FIG. 4 shows examples of camera images (everything other than the target part is omitted). FIG. 4(A) shows a gesture of moving the tip of the index finger, and FIG. 4(B) shows a gesture of moving the palm in parallel.
Both gestures mean "movement to the left". However, when a gesture is performed with a finger raised, the user often intends to make the gesture with fine movements of the fingertip, whereas when a gesture is performed with the hand open, the user often intends to make the gesture with a large movement of the entire arm. Consequently, the gesture recognition accuracy varies depending on which part of the target part the representative point is set on.
For example, in the case of FIG. 4(A), if the representative point is set at the tip of the index finger, the movement distance is the distance indicated by reference numeral 41, whereas if it is set at the center of the palm, the movement distance is the distance indicated by reference numeral 42. That is, if different commands are assigned depending on the movement distance, a command different from the desired one may be input. Likewise, if the pointer is to be moved according to the movement distance, the amount of movement the user wants may not be obtained.
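To make the difference between distances 41 and 42 concrete, here is a small numeric sketch. All coordinates are hypothetical values chosen for illustration; the patent gives no numbers for FIG. 4.

```python
import math

# Hypothetical (x, y) pixel coordinates at the start and end of the
# FIG. 4(A) gesture: the fingertip sweeps left while the palm barely moves.
fingertip_start, fingertip_end = (300, 100), (180, 100)
palm_start, palm_end = (310, 220), (290, 220)

# Distance travelled by each candidate representative point.
fingertip_distance = math.dist(fingertip_start, fingertip_end)  # role of distance 41
palm_distance = math.dist(palm_start, palm_end)                 # role of distance 42

print(fingertip_distance, palm_distance)
```

With these made-up values the fingertip travels six times as far as the palm center, so a command or pointer movement keyed to movement distance would behave very differently depending on which point is tracked.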

The gesture recognition device according to the present embodiment therefore improves gesture recognition accuracy by determining the position of the representative point using the shape of the target part.

In the present embodiment, the shape of the target part is identified by the number of raised fingers. For example, the number of raised fingers is determined to be one in the case of FIG. 4(A) and five in the case of FIG. 4(B). In the following description, the state in which one finger is raised is called "shape 1", and the state in which the hand is open with five fingers raised is called "shape 5". In general, "shape n" means that n fingers are raised.

The number of raised fingers may be determined, for example, by searching around the detected hand for pointed shapes (or protruding portions) that appear to be fingers, or by matching against template images, a skeletal model of the hand, or the like. Because known methods can be used to determine the number of fingers, a detailed description is omitted.

In the present embodiment, the gesture extraction unit 102 determines the shape of the hand and, if the shape is one from which it can be judged that the gesture is being performed with a fingertip, sets the representative point at a position corresponding to the fingertip. Otherwise, it sets the representative point at the center of gravity of the detected hand. A shape from which it can be judged that the gesture is being performed with a fingertip can be, for example, the case where one finger is raised (the shape of the target part is "shape 1").

Next, the overall processing performed by the gesture recognition device 100 according to the present embodiment is described with reference to FIGS. 5 and 6, which are processing flowcharts.
The processing shown in FIG. 5 is performed by the image acquisition unit 101 and the gesture extraction unit 102. It may be started when an operation indicating the start of input is performed (for example, when a function requiring gesture input is activated on the target device side), or when the target device is powered on.

First, the image acquisition unit 101 acquires a camera image (step S11). In this step, an RGB color image is acquired using, for example, a camera provided at the upper front part of the television screen.
Next, the gesture extraction unit 102 attempts to detect the target part (the hand) in the acquired camera image (step S12). As described above, the target part may be detected on the basis of color or shape, or by pattern matching or the like. If the target part is not detected, a new image is acquired after waiting for a predetermined time, and the same processing is repeated.

Next, the gesture extraction unit 102 determines the shape of the detected target part (step S13). In this example, it determines whether the shape of the target part is shape 1 or something else. If the shape of the target part is not one of the predefined shapes, the processing may be interrupted and returned to step S11, or may continue with the shape treated as "not applicable".
Next, it is determined whether the shape of the detected target part has changed since the previous execution (step S14). When step S14 is executed for the first time, the result is "changed".
If the result of step S14 is "changed", the processing proceeds to step S15, and the gesture extraction unit 102 sets a representative point on the detected target part. Specifically, if the shape of the target part determined in step S13 is shape 1, the representative point is set at the tip of the extended finger; otherwise, it is set at the center of gravity of the hand.
If the result of step S14 is "not changed", the processing proceeds to step S16.
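Steps S13 to S16 can be sketched as a small stateful helper. This is one possible reading, not the patent's implementation: it assumes that the choice made in step S15 (fingertip or center of gravity) persists until the shape changes, and that upstream detection supplies the shape as a finger count together with fingertip and centroid coordinates. The class name `RepresentativePointSelector` and its method are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


class RepresentativePointSelector:
    """Re-decides where the representative point sits only when the shape changes."""

    def __init__(self) -> None:
        self._last_shape: Optional[int] = None
        self._use_fingertip = False

    def update(self, shape: int, fingertip: Optional[Point], centroid: Point) -> Point:
        # Step S14: the very first call always counts as "changed"
        # because no previous shape has been recorded yet.
        if shape != self._last_shape:
            # Step S15: shape 1 -> fingertip, any other shape -> center of gravity.
            self._use_fingertip = (shape == 1)
            self._last_shape = shape
        # Step S16: report the coordinates of the current representative point.
        if self._use_fingertip and fingertip is not None:
            return fingertip
        return centroid
```

Calling `update` once per frame returns the coordinates that would be passed on for gesture recognition; falling back to the centroid when no fingertip is supplied is a defensive choice made for this sketch.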

Next, the gesture extraction unit 102 obtains the coordinates of the representative point in the camera image and transmits them to the gesture recognition unit 103 (step S16).
The processing of steps S11 to S16 is executed repeatedly, and the coordinates of the representative point are transmitted sequentially to the gesture recognition unit 103.

The processing shown in FIG. 6 is performed by the gesture recognition unit 103 and the command generation unit 104. It is started at the same time as the processing shown in FIG. 5.
In step S21, the gesture recognition unit 103 recognizes a gesture based on the acquired coordinates of the representative point. For example, given the gesture definition data shown in FIG. 2, if it detects that the coordinates of the representative point have moved 100 pixels or more to the right, it determines that a gesture representing "volume up" has been made.
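The example in step S21 can be sketched as a lookup against a small definition table. Only the single rule "100 pixels or more to the right means volume up" comes from the text; the table layout, the name `recognize_gesture`, and the convention that x increases to the right are assumptions made for this illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding of gesture definition data:
# (direction, minimum horizontal movement in pixels, gesture name).
GESTURE_DEFINITIONS: List[Tuple[str, int, str]] = [
    ("right", 100, "volume up"),
]


def recognize_gesture(
    track: Sequence[Tuple[float, float]],
    definitions: Sequence[Tuple[str, int, str]] = GESTURE_DEFINITIONS,
) -> Optional[str]:
    """Return the gesture matched by the net horizontal movement of the track."""
    if len(track) < 2:
        return None
    # Net horizontal displacement between the first and last representative points.
    dx = track[-1][0] - track[0][0]
    for direction, min_pixels, gesture in definitions:
        # Only "right" and "left" are handled in this sketch.
        moved = dx if direction == "right" else -dx
        if moved >= min_pixels:
            return gesture
    return None
```

For instance, a track whose x coordinate goes from 50 to 160 matches "volume up", while a track that moves only 70 pixels matches nothing.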

The information transmitted from the gesture extraction unit 102 to the gesture recognition unit 103 does not necessarily have to be coordinates in the camera image, as long as it can express the direction and amount of movement of the representative point. For example, it may be the coordinates of the representative point converted into a coordinate system whose origin is the user, or other data representing the direction and amount of movement.
Information representing the size of the target part in the camera image may also be transmitted at the same time. Because the amount of movement of the target part obtained by the gesture recognition device varies with the distance between the user and the device, doing so makes it possible to correct the amount of movement according to the size of the target part.
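One simple way to apply such a correction is to express movement in units of the hand's apparent size. The patent does not specify the correction formula, so the division below, the use of hand width as the size measure, and the name `normalized_displacement` are all assumptions for illustration.

```python
def normalized_displacement(dx_pixels: float, hand_width_pixels: float) -> float:
    """Express a pixel displacement in units of the hand's apparent width."""
    if hand_width_pixels <= 0:
        raise ValueError("hand_width_pixels must be positive")
    return dx_pixels / hand_width_pixels


# A user close to the camera: the hand appears 200 px wide and moves 300 px.
near = normalized_displacement(300, 200)
# The same physical gesture from farther away: 50 px wide, moves 75 px.
far = normalized_displacement(75, 50)
print(near, far)
```

Under this sketch both cases yield the same normalized movement, so a threshold defined in hand widths would behave the same regardless of how far the user stands from the device.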

Next, the command generation unit 104 generates a control signal corresponding to the recognized gesture and transmits it to the target device 200 (step S22). In the example above, it generates a control signal associated with the "volume up" instruction (a signal that raises the volume by one step) and transmits it to the target device 200.
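Step S22 can be sketched as a table lookup followed by a send. The `Command` type, the `COMMAND_TABLE` contents beyond the "volume up" example, and the `send` callback standing in for the wired, wireless, or infrared link are hypothetical; the patent does not define a signal format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Command:
    """Hypothetical stand-in for the control signal sent to the target device."""
    name: str
    value: int


# Gesture name -> control signal. Only "volume up" (one step) is from the text.
COMMAND_TABLE = {
    "volume up": Command("volume", +1),
}


def generate_command(gesture: str, send: Callable[[Command], None]) -> Optional[Command]:
    """Look up the command for a recognized gesture and transmit it, if any."""
    command = COMMAND_TABLE.get(gesture)
    if command is not None:
        send(command)
    return command
```

Passing the transport in as a callback keeps the lookup independent of how the signal actually reaches the target device 200.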

The processes shown in FIGS. 5 and 6 are executed periodically, and end when an operation indicating the end of input is performed (for example, when an operation using gesture input is finished on the target device side).

As described above, the gesture recognition device according to the first embodiment sets the representative point at a different position depending on the shape of the target part performing the gesture. As a result, the gesture can be recognized accurately whether it is performed with a fingertip or with the whole hand.

In the description of the embodiment, the shape of the target part is determined in step S13. However, this step may be executed only once after the target part is detected and skipped after the gesture has started. This reduces the amount of processing.
However, since a gesture may end and a different gesture may start immediately afterward, the step may be executed again in such a case. For example, when the shape or size of the target part changes significantly, or when the target part leaves the image frame and then re-enters it, it may be determined that a different gesture has started, and step S13 may be executed again. The step may also be re-executed in response to an explicit operation.
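
The skip-and-re-execute policy for step S13 can be sketched as a small cache around the shape classifier. This is a sketch under assumptions: the names Detection and ShapeCache, the use of region area as the size measure, and the ratio threshold of 1.5 are all hypothetical. Only the size change, the frame-out case, and the explicit reset are modeled; detecting a significant change of shape without re-running the classifier is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    """Hypothetical per-frame result; None is passed when the part is out of frame."""
    area: float  # apparent area of the target part in pixels


class ShapeCache:
    """Runs the shape classifier (step S13) only when a new gesture seems to start."""

    def __init__(self, classify: Callable[[Detection], str],
                 area_ratio_threshold: float = 1.5):
        self._classify = classify
        self._threshold = area_ratio_threshold  # assumed value
        self._shape: Optional[str] = None
        self._ref_area = 0.0  # area at the time of the last classification

    def reset(self) -> None:
        """Explicit operation that forces the shape to be determined again."""
        self._shape = None

    def update(self, detection: Optional[Detection]) -> Optional[str]:
        if detection is None:  # target part left the frame
            self.reset()
            return None
        if self._shape is None or self._size_changed(detection.area):
            self._shape = self._classify(detection)
            self._ref_area = detection.area
        return self._shape

    def _size_changed(self, area: float) -> bool:
        small, large = sorted((area, self._ref_area))
        return small <= 0 or large / small >= self._threshold
```

Calling update once per frame returns the cached shape on most frames, so the classifier cost is paid only at the start of each gesture.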

(Second embodiment)
The second embodiment adds, to the gesture recognition system according to the first embodiment, means for notifying the user of the position of the representative point. The configuration of the gesture recognition system according to the second embodiment is the same as that of the first embodiment except for the points described below.

The flowchart of the processing performed by the gesture recognition device 100 according to the second embodiment is the same as in FIGS. 5 and 6, but differs in that, when the target part is set in step S15, the gesture extraction unit 102 notifies the user of the position at which the representative point has been set, through the screen of the target device 200 (a television screen in this example).
FIG. 7 is an example of an image displayed on the television screen of the target device. For example, if the shape of the detected target part is shape 1, the user is notified that a gesture can be performed by moving the fingertip; for any other shape, the user is notified that a gesture can be performed by moving the whole hand.
The user may also be notified by other methods. For example, guidance may be displayed as text only, or a guidance window may be added to the normal operation screen to display graphics and text. The notification may also be given by sound or the like.
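
The choice of guidance can be sketched as a simple lookup on the identified shape. The function name guidance_message, the label string "shape 1", and the message wording are assumptions made for illustration; the description only states what the user is told, not how it is phrased.

```python
def guidance_message(shape: str) -> str:
    """Return the text guidance for the identified shape of the target part."""
    if shape == "shape 1":
        # One finger raised: the representative point is at the fingertip.
        return "Move your fingertip to make a gesture."
    # Any other shape: the representative point is at the center of the hand.
    return "Move your whole hand to make a gesture."
```

The returned string could be drawn on the screen, shown in a guidance window, or passed to a speech synthesizer, matching the notification methods listed above.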

According to the second embodiment, the user can see where the representative point tracked by the system is located, which allows more intuitive gesture input.

(Modification)
The description of each embodiment is an example for explaining the present invention, and the present invention can be implemented with appropriate modifications or combinations without departing from the spirit of the invention.
For example, in the description of the embodiments, the gesture recognition device 100 is a device incorporated in the target device 200, but the gesture recognition device 100 may be an independent device.
The gesture recognition device 100 may also be implemented as a program that runs on the target device 200. When implemented as a program, it may be configured so that a processor executes the program stored in a memory, or so that it is executed by an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like.

In the description of the embodiments, an example of acquiring an image using a camera has been given. However, as long as a gesture can be acquired and the shape of the target part can be identified, the image may be acquired by a method other than the one exemplified, for example by receiving the image via a network.
In addition, the target part is not necessarily a human hand. For example, it may be another body part or a marker for gesture input. When a marker is used, the gesture extraction unit 102 may detect the presence of the marker and set a representative point at the tip of the marker.

In addition, the “shape of the target part” in the present invention means the shape recognized by the gesture recognition device through the image, and the target part does not necessarily have to be physically deformed.

In the description of the embodiments, two types of shape are identified for the target part, “shape 1” and “other shapes”, but other shapes may be identified. The other shapes may be, for example, a clenched hand or a hand with two fingers raised. Three or more types of shape may also be identified. In any case, the position of the representative point may be determined based on the identified shape, and the processing may be performed by the method described above.
For example, the position at which the representative point is set may be as follows.

Shape 0: the center of gravity of the fist is the representative point
Shape 1: the fingertip of the raised finger is the representative point
Shape 2: the midpoint between the two raised fingertips is the representative point
Shape 3: of the three raised fingers, the fingertip of the middle one is the representative point
Shapes 4 and 5: the center of gravity of the palm is the representative point
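
The mapping from shape to representative point listed above can be sketched as follows. The function names and the input format (a list of hand-region pixels plus a list of raised fingertips ordered across the hand) are assumptions. As a simplification, the centroid of the whole detected hand region stands in for both the center of gravity of the fist and that of the palm.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of the given points."""
    if not points:
        raise ValueError("region must not be empty")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def representative_point(region: Sequence[Point],
                         fingertips: Sequence[Point]) -> Point:
    """Pick the representative point from the number of raised fingers."""
    count = len(fingertips)
    if count == 1:  # shape 1: the fingertip itself
        return fingertips[0]
    if count == 2:  # shape 2: midpoint between the two fingertips
        (x0, y0), (x1, y1) = fingertips
        return ((x0 + x1) / 2, (y0 + y1) / 2)
    if count == 3:  # shape 3: fingertip of the middle finger
        return fingertips[1]
    # Shapes 0, 4 and 5: centroid of the hand region (simplification).
    return centroid(region)
```

Here the shape number equals the number of raised fingers, so no separate shape label is needed.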

In the description of each embodiment, the target part is detected from the camera image in step S12 and its shape is then identified in step S13, but these processes may be executed simultaneously by template matching or the like. As long as the position of the target part and the shape of the target part can be acquired, the content and order of the processing are not particularly limited.
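
Obtaining position and shape in a single pass can be sketched with a brute-force template search. This is an illustration under assumptions, not the method of the embodiment: images and templates are small binary grids, the matching cost is the number of mismatching cells, all templates are assumed to have the same size so that their costs are comparable, and the name match_templates is hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def match_templates(image: Grid,
                    templates: Dict[str, Grid]) -> Tuple[str, Tuple[int, int], int]:
    """Return (shape label, (row, col) of the best match, mismatch count)."""
    best = None
    for label, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                # Count cells where the image window and the template differ.
                cost = sum(
                    abs(image[r + i][c + j] - tpl[i][j])
                    for i in range(th) for j in range(tw)
                )
                if best is None or cost < best[2]:
                    best = (label, (r, c), cost)
    if best is None:
        raise ValueError("no template fits inside the image")
    return best
```

The label of the winning template gives the shape and its offset gives the position, so steps S12 and S13 collapse into one search.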

DESCRIPTION OF SYMBOLS
100 ... Gesture recognition device
101 ... Image acquisition unit
102 ... Gesture extraction unit
103 ... Gesture recognition unit
104 ... Command generation unit
200 ... Target device

Claims

A gesture recognition device that detects a gesture from an acquired image and generates a command for a target device corresponding to the gesture, comprising:
image acquisition means for acquiring an image;
target part extraction means for extracting, from the acquired image, a human hand that is a target part performing the gesture;
target part shape specifying means for specifying the shape of the extracted target part;
representative point determining means for setting, for the target part, a representative point that is a point representing the position of the target part;
gesture recognition means for recognizing a gesture based on the movement of the representative point; and
command generating means for generating a command corresponding to the recognized gesture,
wherein the representative point determining means determines whether the shape of the hand that is the target part specified by the target part shape specifying means is a first shape, in which some fingers are extended, or a second shape, in which all fingers are extended or all fingers are bent, sets the representative point at a position corresponding to a fingertip when the shape of the target part is the first shape, and sets the representative point at a position corresponding to the center of the hand when the shape of the target part is the second shape.

The gesture recognition device according to claim 1, wherein the representative point determining means sets the representative point at the tip of a protruding portion when the shape of the target part specified by the target part shape specifying means includes the protruding portion.

The gesture recognition device according to claim 1 or 2, wherein the representative point determining means uses, as the position corresponding to the center of the hand, the center of gravity of the region corresponding to the hand extracted by the target part extraction means.

The gesture recognition device according to any one of claims 1 to 3, further comprising notification means for notifying the user of the position of the set representative point.

A method for controlling a gesture recognition device that detects a gesture from an acquired image and generates a command for a target device corresponding to the gesture, the method comprising:
an image acquisition step of acquiring an image;
a target part extraction step of extracting, from the acquired image, a human hand that is a target part performing the gesture;
a target part shape specifying step of specifying the shape of the extracted target part;
a representative point determining step of setting, for the target part, a representative point that is a point representing the position of the target part;
a gesture recognition step of recognizing a gesture based on the movement of the representative point; and
a command generation step of generating a command corresponding to the recognized gesture,
wherein, in the representative point determining step, it is determined whether the shape of the hand that is the target part specified in the target part shape specifying step is a first shape, in which some fingers are extended, or a second shape, in which all fingers are extended or all fingers are bent, the representative point is set at a position corresponding to a fingertip when the shape of the target part is the first shape, and the representative point is set at a position corresponding to the center of the hand when the shape of the target part is the second shape.

A gesture recognition program that causes a gesture recognition device, which detects a gesture from an acquired image and generates a command for a target device corresponding to the gesture, to execute:
an image acquisition step of acquiring an image;
a target part extraction step of extracting, from the acquired image, a human hand that is a target part performing the gesture;
a target part shape specifying step of specifying the shape of the extracted target part;
a representative point determining step of setting, for the target part, a representative point that is a point representing the position of the target part;
a gesture recognition step of recognizing a gesture based on the movement of the representative point; and
a command generation step of generating a command corresponding to the recognized gesture,
wherein, in the representative point determining step, it is determined whether the shape of the hand that is the target part specified in the target part shape specifying step is a first shape, in which some fingers are extended, or a second shape, in which all fingers are extended or all fingers are bent, the representative point is set at a position corresponding to a fingertip when the shape of the target part is the first shape, and the representative point is set at a position corresponding to the center of the hand when the shape of the target part is the second shape.

A recording medium on which the gesture recognition program according to claim 6 is recorded.