JP5381569B2 - Gesture recognition device, gesture recognition method, and gesture recognition program

Info

Publication number: JP5381569B2
Application number: JP2009225369A
Authority: JP
Inventors: 哲中島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-09-29
Filing date: 2009-09-29
Publication date: 2014-01-08
Anticipated expiration: 2029-09-29
Also published as: JP2011076255A

Description

The present invention relates to a device, a method, and a program for recognizing a gesture using images obtained by a camera.

As one interface for selecting an object displayed on a display device, gesture recognition has been proposed and put into practical use. It uses images obtained by a camera to detect the movement of a user's hand, for example a hand moved while holding a predetermined shape such as the open hand ("paper") of rock-paper-scissors. A gesture recognition interface requires only a camera that photographs the user; the user does not need to wear any special equipment. Such interfaces are therefore expected to be applied in environments where input devices such as keyboards and mice are unsuitable (for example, interaction with a large display installed in a public space).

Gesture recognition is realized, for example, by detecting the movement of the user's hand from moving image data obtained by photographing the user. An example of a method for detecting the movement of a user's hand from moving image data is described below with reference to FIG. 1.

In FIG. 1, image frames 1 to 3 are part of the moving image data obtained by a camera. Image frame 2 is a picture later than image frame 1, and image frame 3 is a picture later than image frame 2. Each of image frames 1 to 3 contains an image of the user's hand 101.

In the method shown in FIG. 1, gesture recognition is realized by tracking feature points within the region corresponding to the hand 101. For example, feature point a1 of image frame 1 is detected as feature point a2 in image frame 2 and as feature point a3 in image frame 3. In this case, the hand 101 is judged to have moved along a path from feature point a1 through feature point a2 to feature point a3. In this way, how the user moved the hand is recognized. A method for detecting a specific motion pattern by extracting and tracking the region corresponding to a user's hand in moving image data is described, for example, in Non-Patent Document 1.

Non-Patent Document 1: IPSJ Journal (情報処理学会論文誌), Vol. 44, No. SIG5 (CVIM 6), Apr. 2003, 「拡張机型インタフェースのための複数指先の追跡とその応用」 ("Multiple fingertip tracking and its application for extended desk interface")

With the method shown in FIG. 1, it is difficult to recognize gestures stably. In particular, when detecting the movement of a user's hand, the shape of the hand can change greatly in the image, and it can be difficult to distinguish the hand from the background (for example, the user's clothes or another person moving behind the user). Both effects make feature points hard to track.

For example, in FIG. 1 the shape of the hand 101 changes in the image, and no feature point corresponding to feature point b1 of image frame 1 is detected in image frame 2. That is, tracking of feature point b1 has failed, and in this case the movement of the hand 101 cannot be detected. In addition, noise 102 appears in image frame 2. The noise 102 is caused, for example, by movement in the background. If feature point c1 of image frame 1 is mistracked and detected as feature point c2 in image frame 2, a gesture different from the actual hand movement is recognized. Thus, in methods that track feature points, false detection or missed detection of feature points can occur, and gestures cannot be recognized stably.

An object of the present invention is to stably recognize a gesture using images obtained by a camera.

A gesture recognition device according to one aspect of the present invention recognizes a gesture based on the movement trajectory of an object, and includes: a target area extraction unit that extracts a target area corresponding to the object in each of a plurality of image frames obtained at different times; a point of interest extraction unit that extracts a point of interest from the target area in each of the plurality of image frames; a movement direction determination unit that determines, in each of the plurality of image frames, the movement direction of the target area based on the movement direction of the point of interest; and a movement trajectory detection unit that detects the movement trajectory of the object based on transition data obtained by arranging the determination results of the movement direction determination unit for the plurality of image frames in time series.

According to the configuration or method disclosed in the present application, a gesture can be recognized stably using images obtained by a camera.

FIG. 1 is a diagram illustrating an example of a method for detecting the movement of a user's hand.
FIG. 2 is a diagram illustrating an example of a gesture recognition system.
FIG. 3 is a diagram illustrating the configuration of the gesture recognition device of the first embodiment.
FIG. 4 is a diagram illustrating the operation of the gesture recognition device.
FIG. 5 is a diagram illustrating an example of a method for calculating the motion vector of a point of interest.
FIG. 6 is a diagram illustrating the direction regions.
FIG. 7 is a diagram illustrating the operation of the transition data matching unit.
FIG. 8 is a flowchart illustrating the gesture recognition method of the first embodiment.
FIG. 9 is a diagram illustrating the gesture recognition method of the second embodiment.
FIG. 10 is an example of an icon image data storage unit.
FIG. 11 is a diagram illustrating the gesture recognition method of the third embodiment.
FIG. 12 is a diagram illustrating an image frame from which a plurality of target areas have been extracted.
FIG. 13 is a diagram illustrating the hardware configuration of the gesture recognition device.

FIG. 2 is a diagram illustrating an example of a gesture recognition system that provides the gesture recognition method of the embodiments. In this example, the gesture recognition system includes a gesture recognition device 1, a camera 2, and a display device 3. The gesture recognition device 1 is realized, in this example, by a computer executing a gesture recognition program, and it recognizes gestures of a person photographed by the camera 2 (hereinafter, the user).

The camera 2 is installed near the display device 3 so that it photographs the area in front of the display device 3 (the direction substantially perpendicular to the display screen). The camera 2 may be attached to the display device 3 or built into it. The camera 2 is, for example, a digital video camera, and the moving image data it obtains is transmitted to the gesture recognition device 1 by wired or wireless communication. The display device 3 is connected to the gesture recognition device 1 by wired or wireless communication and can display the moving image captured by the camera 2 in real time. Therefore, when a user stands in front of the display device 3, image data containing the user is transmitted to the gesture recognition device 1, and an image containing the user is displayed on the display device 3.

The gesture recognition device 1 uses the moving image data obtained by the camera 2 to recognize gestures of the user located in front of the display device 3 (that is, the user photographed by the camera 2). In this example, the gesture recognition device 1 detects the trajectory pattern along which the user's hand was moved. In FIG. 2, the gesture recognition device 1 recognizes that the user has drawn a circle by hand.

In this example, the user can perform a desired gesture while watching their own image on the display device 3, as if looking into a mirror. Accordingly, although this is not a limitation, the display device 3 may display a horizontally flipped version of the image obtained by the camera 2, as shown in FIG. 2. Note, however, that the display device 3 is not an essential component for realizing the gesture recognition method of the embodiments.

<First Embodiment>
FIG. 3 is a diagram illustrating the configuration of the gesture recognition device 1 of the first embodiment. In this example, the gesture recognition device 1 includes a target area extraction unit 11, a point of interest extraction unit 12, a movement direction determination unit 13, and a movement trajectory detection unit 14. The moving image data obtained by the camera 2 is input to the gesture recognition device 1. The frame rate of the moving image data is not particularly limited; it is, for example, 30 frames per second. The basic configuration of the gesture recognition device is the same in the second and third embodiments described later.

The target area extraction unit 11 extracts a target area corresponding to the object in each of a plurality of image frames obtained at different times. For example, the gesture recognition device 1 extracts a target area from every image frame of the input moving image data. Alternatively, the target area extraction unit 11 may extract target areas from the image frames of moving image data that has been thinned out at a predetermined rate.

In this example, the object is the "hand" of the user photographed by the camera 2. The target area extraction unit 11 therefore extracts, in each image frame, the image region corresponding to the user's hand. This region is extracted, for example, based on pixel color. Because the gesture recognition device 1 detects and recognizes the user's gesture (that is, "movement"), the target area extraction unit 11 may instead extract a region that is moving within the image frame as the image region corresponding to the user's hand. The target area extraction unit 11 may also combine these two extraction conditions.
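The combination of the color condition and the motion condition can be sketched as follows. This is only an illustration under stated assumptions: frames are taken to be lists of rows of `(r, g, b)` tuples, and the function names, the color rule `simple_skin_rule`, and the `motion_threshold` value are hypothetical and do not come from the patent.

```python
def simple_skin_rule(r, g, b):
    """Hypothetical color test; the patent only says 'based on pixel color'."""
    return r > 95 and g > 40 and b > 20 and r > g and r > b


def extract_target_mask(frame, prev_frame, is_target_color, motion_threshold=30):
    """Return a mask of booleans: True where a pixel passes BOTH conditions.

    frame, prev_frame: lists of rows, each row a list of (r, g, b) tuples.
    A pixel counts as 'moving' when the summed absolute channel difference
    to the previous frame exceeds motion_threshold (an assumed criterion).
    """
    mask = []
    for row, prev_row in zip(frame, prev_frame):
        mask_row = []
        for (r, g, b), (pr, pg, pb) in zip(row, prev_row):
            color_ok = is_target_color(r, g, b)
            moved = abs(r - pr) + abs(g - pg) + abs(b - pb) > motion_threshold
            mask_row.append(color_ok and moved)
        mask.append(mask_row)
    return mask
```

Using only one of the two conditions corresponds to dropping either `color_ok` or `moved` from the final test.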

FIG. 4 is a diagram illustrating the operation of the gesture recognition device 1. In FIG. 4(a), image frames 1 to 3 are part of the moving image data obtained by the camera 2. Image frame 2 is a picture later than image frame 1, and image frame 3 is a picture later than image frame 2. Image frames 1 to 3 may be, for example, three consecutive image frames.

In this example, as described above, the target area 21 is the image region corresponding to the hand of the user photographed by the camera 2. In the example shown in FIG. 4(a), the position of the user's hand moves over time and its shape also changes. Consequently, across image frames 1 to 3, both the position and the shape of the target area 21 change. Only the target area 21 is detected in image frames 1 and 3, whereas both the target area 21 and a target area 22 are detected in image frame 2. The target area 22 was detected according to the extraction conditions described above, but it is assumed not to be an image region corresponding to the user's hand. That is, the target area 22 is a noise image region. Noise image regions arise, for example, from movement in the user's background.

In each image frame, the point of interest extraction unit 12 extracts one or more points of interest from the target areas (21, 22) extracted by the target area extraction unit 11. In this example, a plurality of points of interest are extracted in each image frame. In image frame 1, points of interest a1 to g1 are extracted from the target area 21. In image frame 2, points of interest a2 to g2 are extracted from the target area 21, and point of interest h2 is extracted from the target area 22. In image frame 3, points of interest a3 to g3 are extracted from the target area 21.

A point of interest is, although not limited to this, a feature point in the image. A feature point is a point where the pixel value changes greatly (or a pixel at such a position), and it can be extracted using known techniques (for example, a digital filter). Feature points are found, for example, at the corners of edges in the image. The pixel value is, for example, the luminance or color information of each pixel. Using feature points as points of interest improves the accuracy of the motion vector calculation described later. However, a point of interest need not be a feature point; it may be, for example, the centroid of the target area. The point of interest extraction unit 12 can extract one or more points of interest independently for each image frame. That is, it can extract points of interest a2 to g2 and a3 to g3 without regard to the points a1 to g1 extracted in image frame 1, and it can extract points a3 to g3 without regard to the points a2 to g2 extracted in image frame 2. Thus, the gesture recognition method of the embodiments does not need to track points of interest (or feature points) between image frames, nor does it need to associate points of interest across image frames.
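As a small illustration of the centroid alternative mentioned above, the following sketch computes the centroid of a target area given as a boolean mask. The mask representation (`mask[y][x]`) and the function name are assumptions made for this example. Because the function looks at a single frame only, it also reflects the point that no correspondence between frames is needed.

```python
def region_centroid(mask):
    """Return the (x, y) centroid of the True cells in mask, or None if empty.

    mask: list of rows of booleans, indexed as mask[y][x] (assumed layout).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no target area in this frame
    return (sum_x / count, sum_y / count)
```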

The movement direction determination unit 13 includes a motion vector calculation unit 13a and a histogram creation unit 13b. In each image frame, the movement direction determination unit 13 determines the movement direction of the target area extracted by the target area extraction unit 11. The movement direction of a target area is determined based on the movement directions of the points of interest extracted from that target area.

The motion vector calculation unit 13a calculates the motion vector of each point of interest. The method for calculating the motion vector of an arbitrary image region in moving image data is not particularly limited, and known techniques can be used.

FIG. 5 is a diagram illustrating an example of a method for calculating the motion vector of a point of interest. FIG. 5(a) shows the image frame for which the motion vector of the point of interest is to be calculated (hereinafter, the current frame). In the following description, the pixel at coordinates (4, 4) in FIG. 5(a) is the point of interest. FIG. 5(b) shows the image frame immediately preceding the current frame. In FIGS. 5(a) and 5(b), the number shown for each pixel corresponds, for example, to a pixel value such as luminance, or to the result of applying a predetermined filter operation to the entire image frame.

First, in the current frame shown in FIG. 5(a), the motion vector calculation unit 13a extracts a 3×3 pixel region (hereinafter, region 41) consisting of the pixel of interest and its eight neighboring pixels. Next, in the image frame shown in FIG. 5(b), the motion vector calculation unit 13a searches for the 3×3 pixel region most highly correlated with region 41. The correlation between regions is calculated, for example, as the sum of the absolute differences between the values of corresponding pixels; in this case, the 3×3 pixel region with the smallest sum is selected. In the example shown in FIG. 5, region 42, enclosed by the broken-line frame, is found. The motion vector of the point of interest is then calculated from the difference between the coordinates of regions 41 and 42. In this example, the point of interest in the current frame is at (4, 4), and the center of region 42 in the immediately preceding frame is at (5, 5). The motion vector of this point of interest is therefore (-1, -1). In the coordinate system used here, on the page showing FIG. 5, rightward is the positive X direction and downward is the positive Y direction.
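The block-matching search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `search` radius, and the list-of-lists frame layout (`frame[y][x]`, with Y increasing downward) are assumptions made for the example, and the point of interest is assumed to lie at least one pixel away from the frame border.

```python
def sad_3x3(frame_a, ax, ay, frame_b, bx, by):
    """Sum of absolute differences between the 3x3 blocks centered at
    (ax, ay) in frame_a and (bx, by) in frame_b."""
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def motion_vector(current, previous, x, y, search=2):
    """Motion vector of the point (x, y) in `current`, found by searching
    `previous` for the 3x3 block with the smallest sum of absolute differences.

    Returns (x - best_x, y - best_y), i.e. current position minus the matched
    position in the previous frame, as in the FIG. 5 example.
    """
    height, width = len(previous), len(previous[0])
    best = None
    # Candidate centers are clamped so that the 3x3 block stays inside the frame.
    for cy in range(max(1, y - search), min(height - 2, y + search) + 1):
        for cx in range(max(1, x - search), min(width - 2, x + search) + 1):
            cost = sad_3x3(current, x, y, previous, cx, cy)
            if best is None or cost < best[0]:
                best = (cost, cx, cy)
    _, best_x, best_y = best
    return (x - best_x, y - best_y)
```

With a point of interest at (4, 4) whose best match in the preceding frame is centered at (5, 5), the function returns (-1, -1), matching the worked example in the text.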

In the same way, the motion vector calculation unit 13a calculates the motion vector of each point of interest in each image frame. The motion vector of a point of interest is calculated, for example, using the image frame containing the point of interest and the immediately preceding image frame, but this is not a limitation. The motion vector may instead be calculated using the image frame containing the point of interest and an image frame two or more frames earlier, or using the image frame containing the point of interest and a later image frame.

In each image frame, the histogram creation unit 13b determines which of the direction regions 1 to 9 shown in FIG. 6 the movement direction of each point of interest belongs to. In the example shown in FIG. 5, the motion vector of the point of interest is (-1, -1), so its movement direction is 135°. This point of interest is therefore determined to belong to direction region 4. The direction region is determined in the same way for the other points of interest. The histogram creation unit 13b then creates a movement direction histogram by counting the number of points of interest belonging to each direction region.
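The mapping from a motion vector to a direction region can be sketched as follows. The sketch assumes nine equal 40° regions numbered from 0°, which is consistent with the ranges quoted in the text (region 1 = 0 to 40°, region 7 = 240 to 280°, region 8 = 280 to 320°), and it assumes angles are measured counterclockwise with image Y pointing down. The function name and the handling of a zero vector are assumptions; the text does not say how stationary points are treated.

```python
import math


def direction_region(dx, dy, num_regions=9):
    """Map a motion vector (dx, dy) to a direction region 1..num_regions.

    Image coordinates are assumed (Y grows downward), so dy is negated to
    obtain the usual counterclockwise angle. Returns None for a zero vector
    (an assumption made for this sketch).
    """
    if dx == 0 and dy == 0:
        return None
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    width = 360.0 / num_regions  # 40 degrees per region when num_regions is 9
    # min() guards against floating-point rounding that yields exactly 360.0.
    return min(int(angle // width) + 1, num_regions)
```

For the vector (-1, -1) from the FIG. 5 example, the angle is 135° and the function returns region 4.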

FIG. 4(b) shows examples of the movement direction histograms created for image frames 1 to 3 of FIG. 4(a). In image frame 1, many points of interest (for example, a1, d1, e1, f1) are moving rightward or approximately rightward. In the histogram for image frame 1, therefore, direction region 1 (0 to 40°) contains the largest number of points of interest. In image frame 2, many points of interest (for example, b2, d2, f2, g2) are moving toward the lower right or approximately the lower right, so in the histogram for image frame 2, direction region 8 (280 to 320°) contains the largest number. In the histogram for image frame 3, direction region 7 (240 to 280°) contains the largest number of points of interest.

For each image frame, the histogram creation unit 13b determines the movement direction of the target area 21 based on the movement direction histogram it has created. In this example, the direction region containing the largest number of points of interest (the direction region to which points of interest belong most frequently) is taken as the most likely movement direction of the target area 21. In the example shown in FIG. 4, therefore, "1", "8", and "7" are output as the movement direction of the target area 21 for image frames 1, 2, and 3, respectively.
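The histogram step and the choice of the most frequent direction region can be sketched as follows. The function name and the tie-breaking rule (the lowest region number wins) are assumptions; the text does not specify how ties are resolved. `None` entries, standing for points with no usable direction, are skipped, which is also an assumption of this sketch.

```python
from collections import Counter


def dominant_direction(regions):
    """Build a movement direction histogram and return (mode, histogram).

    regions: iterable of direction region numbers, one per point of interest.
    Returns (None, empty Counter) when there is nothing to count.
    """
    histogram = Counter(r for r in regions if r is not None)
    if not histogram:
        return None, histogram
    # Highest count wins; on a tie, the smaller region number is chosen.
    region, _ = max(histogram.items(), key=lambda item: (item[1], -item[0]))
    return region, histogram
```

Because a single stray point such as h2 in image frame 2 adds only one count to one bin, it does not change the result as long as most points of interest agree on a direction.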

The movement trajectory detection unit 14 includes a transition data creation unit 14a, a transition data matching unit 14b, and a gesture determination unit 14c. The movement trajectory detection unit 14 detects the movement trajectory of the object based on data generated by arranging the determination results of the movement direction determination unit 13 for the image frames in time series. This time-series data represents the change over time in the movement direction of the target area 21 across the image frames. In the following, this time-series data is therefore sometimes called direction transition data.

The transition data creation unit 14a generates direction transition data by arranging the determined movement directions of the target area in the image frames in time series. For example, when the movement direction histograms of FIG. 4(b) are created for image frames 1 to 3 of FIG. 4(a), the determination results "1", "8", and "7" are obtained. In this case, the direction transition data shown in FIG. 4(c) is generated.

The transition data matching unit 14b compares the direction transition data generated by the transition data creation unit 14a with reference pattern data, which represents a predetermined movement trajectory in the same format as the direction transition data. Suppose that the predetermined movement trajectory is a "circle". When the movement trajectory is a circle, the movement direction of the object changes continuously over time. In this case, the reference pattern data is expressed, for example, as follows. In the example below, the reference pattern data has 18 digits.
Reference pattern data (circle) = 112233445566778899
In this example, it is assumed that the following direction transition data has been obtained.
Direction transition data = 132233455565678889

As shown in FIG. 7, the transition data matching unit 14b calculates, digit by digit, the absolute value of the difference between the direction transition data and the reference pattern data. As described above, the value of each digit of the direction transition data represents a direction region shown in FIG. 6 (that is, the movement direction of the target area 21).

The transition data matching unit 14b then calculates the sum of the absolute differences over all digits. In the example shown in FIG. 7, the result is "6". The transition data matching unit 14b compares this sum with a preset matching threshold, and the comparison result is passed to the gesture determination unit 14c.

If the sum is smaller than the matching threshold, the gesture determination unit 14c determines that the movement trajectory represented by the direction transition data is similar to the movement trajectory represented by the reference pattern data. In the example above, the movement trajectory of the object (that is, the hand of the user photographed by the camera 2) is determined to be a circle. In other words, the gesture determination unit 14c determines that the user has made a gesture of drawing a circle by hand.
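The digit-by-digit comparison and the threshold test can be sketched as follows, using the two 18-digit sequences from the example. The function names and the threshold value used in the usage lines are assumptions; the text does not give a concrete threshold.

```python
def pattern_distance(transition, reference):
    """Sum of per-digit absolute differences between two direction sequences.

    Both arguments may be digit strings or sequences of integers; they must
    have the same length.
    """
    if len(transition) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(int(a) - int(b)) for a, b in zip(transition, reference))


def is_gesture(transition, reference, threshold):
    """True when the distance is strictly smaller than the matching threshold."""
    return pattern_distance(transition, reference) < threshold


reference_circle = "112233445566778899"
observed = "132233455565678889"
distance = pattern_distance(observed, reference_circle)  # 6, as in FIG. 7
recognized = is_gesture(observed, reference_circle, threshold=10)  # hypothetical threshold
```

Note that this plain difference treats direction regions 1 and 9 as eight steps apart even though they cover adjacent angles; the text describes no circular correction, so none is applied here.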

FIG. 8 is a flowchart illustrating the gesture recognition method of the first embodiment. In step S1, each image frame of the moving image data obtained by the camera 2 is input to the gesture recognition device 1. In step S2, the target area extraction unit 11 extracts the target area in each image frame. In the example above, the target area is the image region corresponding to the hand of the user photographed by the camera 2. In step S3, the point of interest extraction unit 12 extracts a plurality of points of interest from the extracted target area in each image frame. A point of interest is, for example, a feature point of the image.

In step S4, the motion vector calculation unit 13a calculates the motion vector of each point of interest in each image frame. Based on the calculated motion vector, the motion vector calculation unit 13a then determines the movement direction of each point of interest (in the example above, the direction region shown in FIG. 6). In step S5, the histogram creation unit 13b creates, for each image frame, a movement direction histogram representing how often each movement direction occurs among the points of interest. The movement direction histogram created for each image frame is stored in a memory area of the gesture recognition device 1. The gesture recognition device 1 may retain only the movement direction histograms for the image frames of the most recent few seconds.
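Retaining only the most recent histograms can be sketched with a fixed-length buffer. This is one possible arrangement under stated assumptions: the class name, the `fps` and `seconds` parameters, and the choice to skip frames with no points of interest are all hypothetical and not taken from the patent.

```python
from collections import Counter, deque


class HistogramBuffer:
    """Keeps movement direction histograms for the most recent frames only."""

    def __init__(self, fps=30, seconds=3):
        # deque with maxlen discards the oldest histogram automatically.
        self.histograms = deque(maxlen=fps * seconds)

    def push(self, regions):
        """Store the histogram of one frame's direction region numbers."""
        self.histograms.append(Counter(regions))

    def transition_data(self):
        """Most frequent region of each stored frame, oldest first.

        Frames with an empty histogram are skipped (an assumption).
        """
        return [max(h, key=h.get) for h in self.histograms if h]
```

For example, with `fps=1` and `seconds=2`, pushing three frames leaves only the last two in the buffer, so `transition_data()` covers just those two frames.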

In step S6, the transition data creation unit 14a refers to the moving direction histogram of each image frame and determines the moving direction with the highest appearance frequency. The transition data creation unit 14a then generates direction transition data by arranging the determination results for the image frames (that is, data representing the most likely moving direction of the target area) in time series. In step S7, the transition data matching unit 14b calculates the difference between the generated direction transition data and reference pattern data. The reference pattern data represents a specific, predetermined gesture.
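As a rough sketch of step S6, the dominant direction of each frame is the histogram bin with the largest count, and the direction transition data is the list of those bins in frame order. The function names are hypothetical, and ties are broken toward the lowest-numbered direction area, which is an arbitrary choice made only for this illustration.

```python
def dominant_direction(histogram):
    """Return the 1-based direction area with the highest count in one frame."""
    return max(range(len(histogram)), key=histogram.__getitem__) + 1

def direction_transition_data(histograms):
    """Arrange the per-frame dominant directions in time series (step S6)."""
    return [dominant_direction(h) for h in histograms]
```

Because each frame is reduced to a single number independently, no point of interest has to be tracked from one frame to the next.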

In step S8, the gesture determination unit 14c compares the difference between the direction transition data and the reference pattern data with a matching threshold. If the difference is smaller than the matching threshold, the gesture determination unit 14c determines that the user has performed the specific gesture. If the difference is equal to or greater than the matching threshold, the gesture determination unit 14c determines that the user has not performed the specific gesture.
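Steps S7 and S8 might look like the following sketch. The passage does not define how the difference is computed at this point, so the sketch assumes the sum of absolute element-wise differences between two sequences of equal length; the function names and any threshold value are hypothetical.

```python
def pattern_difference(transition, reference):
    """Assumed difference measure: sum of absolute element-wise differences."""
    if len(transition) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(t - r) for t, r in zip(transition, reference))

def is_gesture(transition, reference, threshold):
    """Step S8: recognize the gesture only if the difference is below the threshold."""
    return pattern_difference(transition, reference) < threshold
```

Note that a difference exactly equal to the threshold is rejected, matching the "equal to or greater than" rule in step S8. Direction areas are cyclic, so a variant that measures circular distance between areas may be worth considering; that is a design option, not something the text specifies.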

As described above, the gesture recognition device 1 of the first embodiment extracts a target area (the image area corresponding to the user's hand) in each image frame and extracts points of interest from that target area. For each extracted point of interest, the moving direction from the previous image frame is identified, and the movement trajectory of the object (the user's hand) is detected from the change in that moving direction over time. That is, the user's gesture is recognized. As a result, the movement trajectory can be detected even when the position and shape of the object change greatly over time in the image.

The gesture recognition device 1 also extracts a plurality of points of interest in each image frame and calculates the motion vector (that is, the moving direction) of each point of interest. The moving directions of the points of interest are managed in a histogram that counts occurrences for each predetermined angular width. Using this histogram, the moving direction with the highest appearance frequency is identified and taken as the moving direction of the object. Consequently, even when the points of interest do not all move in the same direction, for example because the shape of the object (the user's hand) changes, the moving direction of the object as a whole can be detected accurately. In addition, even if noise (target area 22) occurs in image frame 2 as shown in FIG. 4A, the influence of that noise is removed or suppressed. The gesture recognition device 1 of the first embodiment can therefore detect the movement trajectory of the object accurately and recognize the user's gesture accurately.

Next, the method shown in FIG. 1 is compared with the method of the first embodiment. In the method shown in FIG. 1, feature points are first extracted in one image frame and are then tracked in the subsequent image frames. That is, the feature points are associated with one another in time series across a plurality of image frames. Consequently, when the position and shape of the user's hand change greatly, it is difficult (or impossible) to detect the feature points that should be associated between image frames.

In contrast, in the method of the first embodiment, a plurality of points of interest are extracted from the area corresponding to the user's hand in each image frame, and the moving direction of each point of interest is identified. The dominant moving direction is then determined from the moving direction with the highest appearance frequency. This determination can be performed independently for each image frame, so there is no need to track and associate points of interest across a plurality of image frames. Consequently, with the method of the first embodiment, the problem of erroneous or missed detection of feature points does not arise, and recognition accuracy is not degraded.

Note that when users perform a gesture in front of the camera 2, the moving speed of the hand differs from user to user. The gesture recognition device 1 may therefore hold a plurality of reference pattern data of different data lengths and match the direction transition data against each of them. In this case, the gesture recognition device 1 holds, for example, the following reference pattern data.
Reference pattern (high speed) = 123456789
Reference pattern (medium speed) = 112233445566778899
Reference pattern (low speed) = 111222333444555666777888999
Alternatively, the gesture recognition device 1 may, as necessary, stretch or compress the generated direction transition data so that its data length matches that of the reference pattern data.
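Both options can be sketched briefly. The first function builds the slower reference patterns by repeating each digit of the high-speed pattern; the second stretches or compresses an observed sequence to a target length by nearest-index sampling. The sampling rule is an assumption chosen for simplicity, since the text does not say how the data length is adjusted, and the function names are hypothetical.

```python
def stretch_pattern(base, repeat):
    """Repeat each digit `repeat` times, e.g. '123' becomes '112233' for repeat=2."""
    return "".join(ch * repeat for ch in base)

def resample(sequence, target_length):
    """Stretch or compress a sequence to target_length by nearest-index sampling."""
    n = len(sequence)
    if n == 0 or target_length <= 0:
        raise ValueError("sequence and target length must be non-empty")
    return [sequence[i * n // target_length] for i in range(target_length)]
```

With the high-speed pattern 123456789 as the base, repeat values of 2 and 3 reproduce the medium-speed and low-speed patterns listed above.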

When a user draws a circle by hand, the start position of the movement trajectory also differs from user to user. For example, one user may start the trajectory at the lowest point of the circle, while another may start at the highest point. The gesture recognition device 1 may therefore hold a plurality of reference pattern data with different start positions and match the direction transition data against each of them. In this case, the gesture recognition device 1 holds, for example, the following reference pattern data.
Reference pattern = 112233445566778899
Reference pattern = 223344556677889911
Reference pattern = 334455667788991122
Reference pattern = 445566778899112233
Reference pattern = 556677889911223344
Reference pattern = 667788991122334455
Reference pattern = 778899112233445566
Reference pattern = 889911223344556677
Reference pattern = 991122334455667788
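The nine patterns above are cyclic rotations of one base pattern, shifted two digits at a time, so they need not be stored by hand. A minimal sketch, with a hypothetical function name and a `step` parameter that assumes each direction area occupies two digits:

```python
def rotated_patterns(pattern, step=2):
    """Return every cyclic rotation of `pattern`, advancing `step` digits at a time."""
    return [pattern[i:] + pattern[:i] for i in range(0, len(pattern), step)]
```

Calling `rotated_patterns("112233445566778899")` yields the nine start-position variants in the order listed.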

Furthermore, when a user draws a circle by hand, the circle may be drawn clockwise or counterclockwise. The gesture recognition device 1 may therefore hold, for example, the following two reference pattern data and match the direction transition data against each of them.
Reference pattern = 112233445566778899
Reference pattern = 119988776655443322
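In the two patterns of this example, the second keeps the first two digits of the first pattern and lists the remaining digits in reverse order. That relationship can be sketched as follows; the function name and the `run` parameter (the number of leading digits kept in place) are hypothetical.

```python
def reversed_pattern(pattern, run=2):
    """Keep the first `run` digits and reverse the order of the remainder."""
    return pattern[:run] + pattern[run:][::-1]
```

Applied to 112233445566778899, this gives 119988776655443322.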

In the example described above, a single moving direction is identified in the moving direction histogram of each image frame, but two or more moving directions may have nearly the same appearance frequency. Suppose, for example, that direction area 1 and direction area 2 have similarly high appearance frequencies in image frame 1. In this case, the movement trajectory detection unit 14 generates first direction transition data in which the direction area data of image frame 1 is "1" and second direction transition data in which the direction area data of image frame 1 is "2". The movement trajectory detection unit 14 then matches each of the first and second direction transition data against the reference pattern data. If the sum of the differences between either the first or the second direction transition data and the reference pattern data is smaller than the matching threshold, the movement trajectory detection unit 14 can determine that the user has performed the gesture corresponding to the reference pattern data.
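One way to sketch this tie handling is to collect, for each frame, every direction area whose count is close to the maximum, and then enumerate the resulting candidate sequences. The `tolerance` parameter (how close counts must be to be treated as "nearly the same") and the function names are assumptions made for this illustration.

```python
from itertools import product

def candidate_directions(histogram, tolerance=0):
    """Direction areas (1-based) whose count is within `tolerance` of the maximum."""
    top = max(histogram)
    return [i + 1 for i, count in enumerate(histogram) if top - count <= tolerance]

def candidate_transition_data(histograms, tolerance=0):
    """All direction transition sequences formed from the per-frame candidates."""
    per_frame = [candidate_directions(h, tolerance) for h in histograms]
    return [list(seq) for seq in product(*per_frame)]

def matches_any(histograms, reference, threshold, tolerance=0):
    """True if any candidate sequence differs from the reference by less than threshold."""
    for seq in candidate_transition_data(histograms, tolerance):
        if sum(abs(a - b) for a, b in zip(seq, reference)) < threshold:
            return True
    return False
```

The number of candidate sequences grows multiplicatively with the number of tied frames, so a practical system would likely cap it.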

Furthermore, in the example described above, a plurality of points of interest are extracted in each image frame and the gesture is recognized from the moving directions of those points, but the present invention is not limited to this. That is, a single point of interest may be extracted from the target area in each image frame, and the gesture may be recognized from the moving direction of that point. However, extracting more points of interest in each image frame increases the gesture recognition accuracy.

Furthermore, in the example described above, the gesture recognition device 1 recognizes a gesture in which the user draws a "circle" by hand, but the present invention is not limited to this. That is, the gesture recognition device 1 of the first embodiment may recognize other gestures. For example, the gesture recognition device 1 can recognize figures, characters, numerals, and the like that can be drawn with a single stroke. In this case, the gesture recognition device 1 holds reference pattern data corresponding to each figure, character, and numeral, and recognizes the user's gesture by matching the direction transition data created from the image frames against each reference pattern data.

<Second Embodiment>
As shown in FIG. 9, the gesture recognition device of the second embodiment can superimpose a desired image on the moving image obtained by the camera 2. The superimposed image is not particularly limited but is, for example, an image representing an item that the user can select (hereinafter, an icon image). In FIG. 9, icons 23 to 25 are displayed on the display device 3. When the user stands in front of the display device 3, the image of the user captured by the camera 2 and the icons 23 to 25 are displayed superimposed on the display device 3. The user can then see his or her own image (and the superimposed icons 23 to 25) on the display device 3.

To select a desired icon displayed on the display device 3, the user moves his or her hand so that, on the screen of the display device 3, the image of the hand draws a circle in the area near the icon to be selected. In the example shown in FIG. 9, the icon 23 is selected. When the gesture recognition device 1 detects that the movement trajectory of the target area corresponding to the user's hand is a circle in the area near the icon 23, it determines that the user has selected the icon 23.

In addition to the configuration shown in FIG. 3, the gesture recognition device 1 of the second embodiment includes the icon image data storage unit 31 and the image superimposing unit 32, which are indicated by dotted lines. As shown in FIG. 10, the icon image data storage unit 31 stores image data, display position data, and determination area data for each icon to be superimposed on the display device 3. The image data is, for example, bitmap data of the icon to be displayed. The display position data represents the center coordinates of the display area of the icon on the display screen of the display device 3. The determination area data specifies the extent of the determination area, on the display screen of the display device 3, in which the movement trajectory of the object is to be detected. In this example, the determination area is a rectangle, and the coordinates of its upper left and lower right corners are stored.
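A record in the icon image data storage unit 31 could be modeled along the following lines. The class and field names are hypothetical, the `name` field merely stands in for the bitmap data, and the coordinate convention (x to the right, y downward, inclusive corners) is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IconEntry:
    name: str            # hypothetical label standing in for the bitmap data
    center: tuple        # display position data: (x, y) center of the icon
    top_left: tuple      # determination area data: upper left corner (x, y)
    bottom_right: tuple  # determination area data: lower right corner (x, y)

    def contains(self, x, y):
        """True if screen point (x, y) lies inside the rectangular determination area."""
        (x0, y0), (x1, y1) = self.top_left, self.bottom_right
        return x0 <= x <= x1 and y0 <= y <= y1
```

Storing only two corners is enough because the determination area in this example is an axis-aligned rectangle.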

The image superimposing unit 32 creates display data by superimposing the icon images on each image frame of the moving image data input from the camera 2. Each icon is displayed at the position given by the display position data stored in the icon image data storage unit 31. The display device 3 then displays the camera image and the icon images superimposed according to the display data.

The target area extraction unit 11 extracts the target area (here, the image area corresponding to the user's hand) in each image frame. In the second embodiment, however, the target area extraction unit 11 may extract the target area within the determination areas specified by the determination area data instead of from the entire image frame. In the example shown in FIG. 9, determination areas 26 to 28 are set for the icons 23 to 25, respectively. The determination areas 26 to 28 are drawn with broken lines in FIG. 9 but are not displayed on the screen of the display device 3. The determination areas may, however, be displayed on the display device 3.

The operations of the point-of-interest extraction unit 12, the moving direction determination unit 13, and the movement trajectory detection unit 14 are basically the same as the procedure described with reference to FIGS. 2 to 8. That is, a plurality of points of interest are extracted from the target area, the moving direction of each point of interest is calculated, and the moving direction of the object in each image frame is determined from the moving direction with a high appearance frequency. The movement trajectory of the object is then detected from the change in its moving direction over time.

However, the gesture recognition device of the second embodiment has a function for identifying in which of the determination areas 26 to 28 the movement trajectory of the object produced by the user's gesture was detected. For example, when the predetermined movement trajectory (in this example, a circle) is detected in the determination area 26, the gesture recognition device determines that the user has selected the icon 23. In this case, the gesture recognition device (or another computer connected to it) executes the processing corresponding to the selected icon.
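The selection logic can be sketched as a lookup from determination area to icon. The mapping below follows the correspondence described for FIG. 9 (determination areas 26 to 28 for icons 23 to 25); the function name, the per-area dictionary of transition data, and the `is_target_gesture` callback are hypothetical.

```python
AREA_TO_ICON = {26: 23, 27: 24, 28: 25}  # determination area -> icon, as in FIG. 9

def selected_icon(transition_by_area, is_target_gesture):
    """Return the icon whose determination area yielded the target gesture, or None.

    transition_by_area maps a determination area number to the transition data
    computed from that area; is_target_gesture is any matcher returning a bool.
    """
    for area_id, transition in transition_by_area.items():
        if is_target_gesture(transition):
            return AREA_TO_ICON[area_id]
    return None
```

Keeping the matcher as a parameter lets the same selection step work with whichever matching rule is in use.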

In this way, by recognizing the user's gesture, the gesture recognition device of the second embodiment can detect which of the selectable items displayed on the display device 3 the user has selected. In the second embodiment, performing the image processing (area extraction, point-of-interest extraction, motion vector calculation, and so on) only on the determination areas reduces the processing load of the gesture recognition device.

<Third Embodiment>
Like the first and second embodiments, the gesture recognition device of the third embodiment includes the target area extraction unit 11, the point-of-interest extraction unit 12, the moving direction determination unit 13, and the movement trajectory detection unit 14, and detects the movement trajectory from the change in the moving direction of the object over time. The gesture recognition device of the third embodiment additionally uses data representing the change over time in the position of the object on the image frame to supplement this trajectory detection. The third embodiment is described below with reference to FIG. 11.

In the third embodiment as well, the target area extraction unit 11 extracts the target area (the image area corresponding to the user's hand) in each image frame. The gesture recognition device then calculates the centroid position of the extracted target area in each image frame. The centroid position is obtained, for example, by calculating the centroid coordinates of the plurality of points of interest extracted by the point-of-interest extraction unit 12. The centroid position is expressed as one of the position detection blocks 1 to 9 shown in FIG. 11. The position detection blocks are set by dividing the image frame, or a part of it, into a plurality of areas.
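The centroid calculation and its quantization to a position detection block could be sketched as follows. FIG. 11 is not reproduced in this passage, so the 3 x 3 grid numbered 1 to 9 in row-major order from the upper left is an assumption, as are the function names; coordinates are assumed to satisfy 0 <= x <= width and 0 <= y <= height.

```python
def centroid(points):
    """Centroid of a non-empty list of (x, y) points of interest."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def position_block(x, y, width, height, rows=3, cols=3):
    """Map (x, y) to a block numbered 1..rows*cols in row-major order."""
    col = min(int(x * cols / width), cols - 1)   # clamp points on the right edge
    row = min(int(y * rows / height), rows - 1)  # clamp points on the bottom edge
    return row * cols + col + 1
```

Under this numbering, a clockwise loop around the edge of the grid visits blocks 1, 2, 3, 6, 9, 8, 7, 4 and returns to 1.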

The gesture recognition device of the third embodiment compares position transition data, obtained by arranging the centroid positions of the image frames in time series, with reference pattern data. The reference pattern data has the same data format as the position transition data and represents the movement trajectory of the object for a predetermined gesture. If the movement trajectory of the object is a circle, the object moves on the image frame, as shown in FIG. 11, along a path that starts, for example, at position detection block 1, passes through position detection blocks 2, 3, 6, 9, 8, 7, and 4 in that order, and returns to position detection block 1. In this case, the reference pattern data is expressed, for example, as follows. In this example, the reference pattern data has 19 digits.
Reference pattern data (circle) = 1122336699887744112
In this example, it is also assumed that the following position transition data has been obtained by arranging the centroid positions of the target areas of the image frames in time series.
Transition data = 1122233669888774411

Next, the gesture recognition device calculates the degree of difference between the position transition data and the reference pattern data. The degree of difference is obtained by calculating, for each digit, the absolute value of the difference between the values of the position transition data and the reference pattern data, and summing these absolute differences. If this degree of difference is smaller than a predetermined threshold, the movement trajectory of the object is determined to be a circle.

Note that the gesture recognition device may instead calculate the degree of correlation between the position transition data and the reference pattern data. The degree of correlation is obtained by comparing the values of the position transition data and the reference pattern data digit by digit and counting the number of matching digits. In this case, if the degree of correlation is higher than a predetermined threshold, the movement trajectory of the object is determined to be a circle.
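Both measures can be written in a few lines. The sketch below applies them to the 19-digit reference pattern data (1122336699887744112) and transition data (1122233669888774411) of this example; the function and constant names are hypothetical.

```python
REFERENCE_CIRCLE = "1122336699887744112"
TRANSITION = "1122233669888774411"

def dissimilarity(transition, reference):
    """Degree of difference: sum of per-digit absolute differences."""
    return sum(abs(int(t) - int(r)) for t, r in zip(transition, reference))

def correlation(transition, reference):
    """Degree of correlation: number of digit positions at which the values match."""
    return sum(t == r for t, r in zip(transition, reference))
```

For these two sequences the degree of difference is 15 and the degree of correlation is 12 (out of 19 digits). Whether that is judged to be a circle depends on the thresholds, which the text leaves unspecified.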

Furthermore, the gesture recognition device compares the detection result based on the change in the moving direction of the object over time with the detection result based on the change in the position of the object over time. If the two detection results agree, that result is output. For example, when the movement trajectory obtained from the time-series data of the moving direction is a circle and the movement trajectory obtained from the time-series data of the position is also a circle, the gesture recognition device recognizes that the user has performed a circle-drawing gesture. Conversely, even if the movement trajectory obtained from the time-series data of the moving direction is a circle, the gesture recognition device can refrain from recognizing a circle-drawing gesture unless the same result is obtained from the time-series data of the position. The third embodiment therefore further improves the recognition accuracy of the user's gesture.
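The agreement check amounts to accepting a gesture only when both detectors report the same result. In this minimal sketch each detector is assumed to return a gesture label such as "circle", or None when nothing was detected; the function name and the label representation are hypothetical.

```python
def combined_result(direction_result, position_result):
    """Return the gesture label only if both detectors report the same one."""
    if direction_result is not None and direction_result == position_result:
        return direction_result
    return None
```

Requiring agreement trades some sensitivity for fewer false recognitions, which is the effect described for the third embodiment.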

As described above, in the third embodiment, the movement trajectory of the object is detected from the moving direction of the object on the image, and that detection is supplemented using the position of the object on the image. This improves the detection accuracy of the movement trajectory in cases such as that shown in FIG. 12, where the target area extraction unit 11 detects a plurality of target areas 21 and 22 and the points of interest in the respective target areas move in different directions. It also reduces the likelihood of misrecognizing the user's gesture.

The configurations and operations of the first to third embodiments may be combined arbitrarily as long as no contradiction arises.

<Hardware Configuration of Gesture Recognition Device>
FIG. 13 is a diagram illustrating the hardware configuration of the gesture recognition device. In FIG. 13, the CPU 51 provides the gesture recognition method of the embodiments by executing a gesture recognition program using the memory 53. The storage device 52 is, for example, a hard disk and stores the gesture recognition program. The storage device 52 may be an external recording device. The memory 53 is, for example, a semiconductor memory and includes a RAM area and a ROM area. The reference pattern data is stored, for example, in the storage device 52 or the memory 53.

The reading device 54 accesses the portable recording medium 55 in accordance with instructions from the CPU 51. The portable recording medium 55 includes, for example, a semiconductor device, a medium to and from which information is input and output magnetically, and a medium to and from which information is input and output optically. The communication interface 56 transmits and receives data over a network in accordance with instructions from the CPU 51. The input/output device 57 corresponds, for example, to a device that accepts instructions from the user.

The gesture recognition program of the embodiments is provided, for example, in one of the following forms.
(1) Installed in advance in the storage device 52.
(2) Provided on the portable recording medium 55.
(3) Downloaded from the program server 60.

The gesture recognition device of the embodiments is realized by executing the gesture recognition program on a computer with the above configuration. That is, executing the gesture recognition program on such a computer realizes some or all of the target area extraction unit 11, the point-of-interest extraction unit 12, the moving direction determination unit 13, and the movement trajectory detection unit 14.

The following supplementary notes are further disclosed with respect to the embodiments, including the examples described above.
(Appendix 1)
A gesture recognition device for recognizing a gesture based on a movement trajectory of an object,
In a plurality of image frames obtained at different times, respectively, a target area extraction unit that extracts a target area corresponding to the target object;
In each of the plurality of image frames, a point-of-interest extraction unit that extracts a point of interest from the target region,
In each of the plurality of image frames, a moving direction determination unit that determines a moving direction of the target area based on a moving direction of the point of interest;
A movement trajectory detection unit for detecting a movement trajectory of the object based on transition data obtained by arranging the determination results of the movement direction determination unit for the plurality of image frames in time series;
Gesture recognition device.
(Appendix 2)
The gesture recognition device according to appendix 1, wherein
the point-of-interest extraction unit extracts a plurality of points of interest from the target region in each of the plurality of image frames, and
the movement direction determination unit determines, in each of the plurality of image frames, a likely movement direction of the target region based on the movement direction of each point of interest.
(Appendix 3)
The gesture recognition device according to appendix 2, wherein
the movement direction determination unit determines, in each of the plurality of image frames, which of a plurality of direction areas divided at a predetermined angle the movement direction of each point of interest belongs to, and determines the likely movement direction based on the direction area to which the movement directions of the points of interest belong with high frequency.
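The direction-area voting of appendix 3 can be illustrated with a minimal sketch. It assumes eight direction areas of 45 degrees each, with area 0 centered on the positive x axis; the function names `direction_bin` and `likely_direction`, the number of areas, and the handling of zero-length vectors are illustrative assumptions and are not taken from the patent. Note that in image coordinates the y axis usually points down, so the sense of rotation may be mirrored.

```python
import math
from collections import Counter


def direction_bin(dx, dy, num_bins=8):
    """Map a motion vector to one of num_bins equal angular areas.

    Area 0 is centered on the positive x axis (assumption); indices
    increase with increasing atan2 angle.
    """
    width = 2 * math.pi / num_bins
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # Shift by half an area so that area 0 straddles the x axis.
    return int(((angle + width / 2) % (2 * math.pi)) // width)


def likely_direction(vectors, num_bins=8):
    """Return the direction area(s) that receive the most votes.

    vectors holds one (dx, dy) motion vector per point of interest.
    Zero-length vectors are ignored (assumption). A list is returned
    because two areas can tie.
    """
    counts = Counter(
        direction_bin(dx, dy, num_bins)
        for dx, dy in vectors
        if (dx, dy) != (0, 0)
    )
    if not counts:
        return []
    best = max(counts.values())
    return sorted(b for b, c in counts.items() if c == best)
```

Returning every top-scoring area, rather than a single winner, leaves room for the case in which the movement direction belongs to two direction areas at once.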
(Appendix 4)
The gesture recognition device according to appendix 3, wherein,
when the movement directions of the points of interest belong to a first direction area and a second direction area, the movement trajectory detection unit detects the movement trajectory of the object using first transition data including the movement direction determined based on the first direction area and second transition data including the movement direction determined based on the second direction area.
(Appendix 5)
The gesture recognition device according to any one of appendices 1 to 4, further comprising
an image superimposing unit that generates the image frames by superimposing a selection target image on a camera image capturing a person, wherein
the object is a hand of the person, and
the target region extraction unit extracts the target region corresponding to the hand of the person within a predetermined determination region that includes the selection target image in the image frame.
(Appendix 6)
The gesture recognition device according to any one of appendices 1 to 5, wherein
the movement trajectory detection unit detects the movement trajectory of the object by comparing the transition data with reference pattern data representing a predetermined movement trajectory.
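The comparison with reference pattern data in appendix 6 can be sketched as follows. This passage does not state how the comparison is carried out, so the approach here is an assumption: runs of identical per-frame directions are collapsed, and the result is compared for equality with each reference pattern. The pattern names, the direction codes (0 for rightward, 4 for leftward), and the use of `None` for frames without a determined direction are all hypothetical.

```python
from itertools import groupby

# Hypothetical reference patterns: each is a sequence of direction codes.
REFERENCE_PATTERNS = {
    "swipe_right": [0],
    "swipe_left": [4],
    "right_then_left": [0, 4],
}


def compress(transition):
    """Drop undetermined frames (None) and collapse repeated directions."""
    determined = (d for d in transition if d is not None)
    return [direction for direction, _ in groupby(determined)]


def match_pattern(transition, patterns=REFERENCE_PATTERNS):
    """Return the name of the reference pattern equal to the compressed
    transition data, or None if no pattern matches."""
    key = compress(transition)
    for name, reference in patterns.items():
        if key == reference:
            return name
    return None
```

For example, `match_pattern([0, 0, None, 0, 4, 4])` compresses the transition data to `[0, 4]` and therefore matches the hypothetical `right_then_left` pattern. An exact match is the simplest choice; a tolerant comparison would be needed if neighboring direction codes should also be accepted.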
(Appendix 7)
The gesture recognition device according to any one of appendices 1 to 6, wherein
the movement trajectory detection unit generates position information representing a temporal change in the position of the object using the plurality of image frames, and detects the movement trajectory of the object based on the transition data and the position information.
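One possible way to combine the transition data with the position information is to derive a trajectory pattern from each source and accept a gesture only when the two agree, which is the combination recited in the claims. The sketch below makes several assumptions that are not in the patent: the position information is a list of (x, y) centroids of the target region, only horizontal swipes are classified, and the `min_travel` threshold of 30 pixels is arbitrary.

```python
def pattern_from_positions(positions, min_travel=30.0):
    """Classify the net horizontal displacement of the target region.

    positions is a time-ordered list of (x, y) centroids (assumption).
    Returns "swipe_right", "swipe_left", or None.
    """
    if len(positions) < 2:
        return None
    dx = positions[-1][0] - positions[0][0]
    if dx >= min_travel:
        return "swipe_right"
    if dx <= -min_travel:
        return "swipe_left"
    return None


def recognize(direction_pattern, position_pattern):
    """Accept a gesture only when both detection results coincide."""
    if direction_pattern is not None and direction_pattern == position_pattern:
        return direction_pattern
    return None
```

Requiring agreement trades some sensitivity for fewer false detections: a pattern suggested by the direction data alone is rejected when the position data does not support it.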
(Appendix 8)
A gesture recognition program for recognizing a gesture based on a movement trajectory of an object, the program causing a computer to function as:
a target region extraction unit that extracts a target region corresponding to the object in each of a plurality of image frames obtained at different times;
a point-of-interest extraction unit that extracts a point of interest from the target region in each of the plurality of image frames;
a movement direction determination unit that determines, in each of the plurality of image frames, a movement direction of the target region based on a movement direction of the point of interest; and
a movement trajectory detection unit that detects the movement trajectory of the object based on transition data obtained by arranging, in time series, the determination results of the movement direction determination unit for the plurality of image frames.
(Appendix 9)
A gesture recognition method for recognizing a gesture based on a movement trajectory of an object, the method comprising:
extracting a target region corresponding to the object in each of a plurality of image frames obtained at different times;
extracting a point of interest from the target region in each of the plurality of image frames;
determining, in each of the plurality of image frames, a movement direction of the target region based on a movement direction of the point of interest; and
detecting the movement trajectory of the object based on transition data obtained by arranging, in time series, the determination results for the plurality of image frames.

Description of Symbols
1 Gesture recognition device
2 Camera
3 Display device
11 Target region extraction unit
12 Point-of-interest extraction unit
13 Movement direction determination unit
13a Motion vector calculation unit
13b Histogram creation unit
14 Movement trajectory detection unit
14a Transition data creation unit
14b Transition data collation unit
14c Gesture determination unit
31 Icon image data storage unit
32 Image superimposing unit

Claims

A gesture recognition device for recognizing a gesture based on a movement trajectory of an object, comprising:
a target region extraction unit that extracts a target region corresponding to the object in each of a plurality of image frames obtained at different times;
a point-of-interest extraction unit that extracts a point of interest from the target region in each of the plurality of image frames;
a movement direction determination unit that determines, in each of the plurality of image frames, a movement direction of the target region based on a movement direction of the point of interest;
a first detection unit that detects a movement trajectory pattern of the object based on transition data obtained by arranging, in time series, the determination results of the movement direction determination unit for the plurality of image frames;
a second detection unit that generates position transition data representing a temporal change in the position of the object using the plurality of image frames, and detects a movement trajectory pattern of the object based on the position transition data; and
a gesture recognition unit that, when the movement trajectory patterns detected by the first detection unit and the second detection unit coincide with each other, identifies a gesture corresponding to the detected movement trajectory pattern.

The gesture recognition device according to claim 1, wherein
the point-of-interest extraction unit extracts a plurality of points of interest from the target region in each of the plurality of image frames, and
the movement direction determination unit determines, in each of the plurality of image frames, a likely movement direction of the target region based on the movement direction of each point of interest.

The gesture recognition device according to claim 2, wherein
the movement direction determination unit determines, in each of the plurality of image frames, which of a plurality of direction areas divided at a predetermined angle the movement direction of each point of interest belongs to, and determines the likely movement direction based on the direction area to which the movement directions of the points of interest belong with high frequency.

The gesture recognition device according to any one of claims 1 to 3, further comprising
an image superimposing unit that generates the image frames by superimposing a selection target image on a camera image capturing a person, wherein
the object is a hand of the person, and
the target region extraction unit extracts the target region corresponding to the hand of the person within a predetermined determination region that includes the selection target image in the image frame.

A gesture recognition program for recognizing a gesture based on a movement trajectory of an object, the program causing a computer to function as:
a target region extraction unit that extracts a target region corresponding to the object in each of a plurality of image frames obtained at different times;
a point-of-interest extraction unit that extracts a point of interest from the target region in each of the plurality of image frames;
a movement direction determination unit that determines, in each of the plurality of image frames, a movement direction of the target region based on a movement direction of the point of interest;
a first detection unit that detects a movement trajectory pattern of the object based on transition data obtained by arranging, in time series, the determination results of the movement direction determination unit for the plurality of image frames;
a second detection unit that generates position transition data representing a temporal change in the position of the object using the plurality of image frames, and detects a movement trajectory pattern of the object based on the position transition data; and
a gesture recognition unit that, when the movement trajectory patterns detected by the first detection unit and the second detection unit coincide with each other, identifies a gesture corresponding to the detected movement trajectory pattern.

A gesture recognition method for recognizing a gesture based on a movement trajectory of an object, the method comprising:
extracting a target region corresponding to the object in each of a plurality of image frames obtained at different times;
extracting a point of interest from the target region in each of the plurality of image frames;
determining, in each of the plurality of image frames, a movement direction of the target region based on a movement direction of the point of interest;
detecting a movement trajectory pattern of the object based on direction transition data obtained by arranging, in time series, the determination results for the plurality of image frames;
generating position transition data representing a temporal change in the position of the object using the plurality of image frames, and detecting a movement trajectory pattern of the object based on the position transition data; and
when the movement trajectory pattern obtained based on the direction transition data and the movement trajectory pattern obtained based on the position transition data coincide with each other, identifying a gesture corresponding to the detected movement trajectory pattern.