JP2007272839A

JP2007272839A - Gesture recognition device, online operation system using the same, gesture recognition method, and computer readable medium

Info

Publication number: JP2007272839A
Application number: JP2006101205A
Authority: JP
Inventors: Yoshiyasu Mutou; 佳恭武藤; Yoshiyasu Nishino; 嘉泰西野
Original assignee: Nippon Systemware Co Ltd
Current assignee: Nippon Systemware Co Ltd
Priority date: 2006-03-31
Filing date: 2006-03-31
Publication date: 2007-10-18
Anticipated expiration: 2026-03-31
Also published as: JP4613142B2

Abstract

PROBLEM TO BE SOLVED: To provide a gesture recognition device that recognizes a gesture by a simple method, without the start and end of the gesture having to be designated by any means.
SOLUTION: The gesture recognition device comprises: imaging means 10; frame image processing means 20 that acquires and accumulates, as vectors, the motion direction of a moving object from frame images continuously input from the imaging means 10; gesture element registration means 30 that stores state feature map data and state weight map data; a state position initialization unit 41 that initializes each state position; a vector feature quantity matching unit 42 that converts each output vector into a vector feature quantity, receives the state map, and, based on both, temporarily stores and outputs comparison values between a given state and the states before and after it; a state transition determination unit 43 that determines and outputs the transition of each state based on the comparison values; and a gesture determination unit 44 that determines whether a gesture recognition flag is set.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a gesture recognition device, an online operation system using the same, a gesture recognition method, and a computer-readable medium.

A gesture recognition device is a device that sequentially captures consecutive images from a camera into a processing device, detects hand movements by digital image processing, and recognizes gestures.

In recent years, advances in computing have led to increasingly sophisticated and intelligent appliances. The human-computer interface (hereinafter HCI, human-computer interaction) has therefore become important. For example, a keyboard and a mouse are used to operate a computer. These tools are indirect, but they are essential for HCI. More recently, systems that recognize inherent human characteristics such as voiceprints and fingerprints have been developed as HCI tools, and hand gestures can be regarded as one of them. Hand gestures are originally a means of communication between people, so gesture recognition systems could become an important HCI tool.

A technique often used for gesture recognition is the hidden Markov model (hereinafter HMM). The HMM is effective for recognizing gestures sequentially. However, tuning its parameters before recognition requires an extremely large amount of training, which makes it impractical to add a new gesture to be recognized (see, for example, Non-Patent Document 1).
Another technique is dynamic programming (hereinafter DP) matching (see, for example, Non-Patent Document 2). DP matching requires no particularly complicated processing and gives good recognition results. However, the start and end of the gesture must be specified for recognition.
Non-Patent Document 1: Masato Aoba and Yoshiyasu Takefuji, "Motion feature extraction using second-order neural network and self-organizing map for gesture recognition," Information Processing Society of Japan Research Report, 2004-MPS-52, pp. 5-8, December 2004.
Non-Patent Document 2: R. Osaki, M. Shimada, and K. Uehara, "Extraction of primitive motions by using clustering and segmentation of motion-captured data," JSAI, Vol. 15, No. 5, pp. 878-886, 2000.

Specifically, conventional DP matching specifies the start and end of a gesture and performs matching over the entire gesture, so the duration of the gesture must be taken into account. The same gesture may be performed quickly or slowly, and its duration changes with the speed of the motion. To recognize such motions as the same gesture, conventional DP matching had to forcibly align their time axes by dynamic matching.
In addition, because the entire gesture is matched, recognition can start only after the motion has ended, so processing takes time.
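The limitation can be illustrated with a generic dynamic time warping (DTW) sketch in Python. It is not the method of Non-Patent Document 2. It only assumes that a gesture is represented as a list of motion angles in degrees. `dtw_distance` needs both complete sequences, so the start and end of the gesture must already be known before matching can begin. It absorbs speed differences by stretching the time axis, which corresponds to the forced time alignment described above.

```python
import math


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def dtw_distance(seq_a, seq_b) -> float:
    """Classic DTW over two complete, already segmented angle sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = angle_diff(seq_a[i - 1], seq_b[j - 1])
            # Each cell extends the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


template = [0, 90, 180, 270]              # hypothetical template: right, up, left, down
slow = [0, 0, 90, 90, 180, 180, 270, 270]  # the same gesture performed at half speed
print(dtw_distance(template, slow))        # 0.0
```

The half-speed version matches the template perfectly, but only once the whole `slow` sequence is available.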
Accordingly, an object of the present invention is to provide a gesture recognition device capable of recognizing a gesture in real time without extracting the beginning and end of the gesture by any method, an online operation system using the same, a gesture recognition method, and a computer-readable medium.

The present invention for solving the above problems relates to the following items.
[1] A gesture recognition device comprising:
(A) imaging means for continuously inputting a gesture as frame images;
(B) frame image processing means comprising a vector acquisition unit that acquires the motion direction of a moving object as a vector from the frame images continuously input from the imaging means, and a vector output unit that continuously outputs the acquired vectors;
(C) gesture element registration means that stores state feature map data created from vector feature quantities, obtained by normalizing the length of the continuously output vectors and weighting them by angle, and that stores state weight map data created from the consecutive feature quantities and the state feature map; and
(D) gesture recognition means comprising:
a state position initialization unit that receives the state feature map and the state weight map stored in the gesture element registration means and initializes each state position;
a vector feature quantity matching unit that converts the vectors output by the frame image processing means into vector feature quantities and receives them continuously together with the state map stored in the gesture element registration means, and that, based on both, temporarily stores and outputs comparison values for a given state and the states immediately before and after it;
a state transition determination unit that determines and outputs the transition of each state from the initialized state positions and the comparison values; and
a gesture determination unit that compares the transitions in each state with the state weight map and determines, using a predetermined threshold, whether to set a gesture recognition flag.

[2] The gesture recognition device according to [1], wherein the gesture element registration means comprises:
a vector accumulation unit that accumulates the continuously output vectors;
a vector matrix acquisition unit that obtains a vector matrix from the accumulated vectors;
a vector feature quantity conversion unit that normalizes the length of each vector in the acquired vector matrix and converts it into an angle-weighted vector feature quantity;
a state feature map creation unit that creates and stores a state feature map from the consecutive feature quantities; and
a state weight map creation unit that creates and stores a state weight map from the consecutive feature quantities and the state feature map.
[3] The gesture recognition device according to [1] or [2], further comprising display means for continuously displaying the acquired vectors.

[4] A gesture recognition method for recognizing a gesture using the gesture recognition device according to any one of [1] to [3], comprising:
a step (S101) of inputting a state feature map and a state weight map acquired in advance;
a step (S102) of initializing each state position;
a step (S103) of continuously inputting a gesture from the imaging means as frame images;
a step (S104) of acquiring the motion direction of a moving object as a vector from the frame images continuously input from the imaging means;
a step (S105) of continuously outputting the acquired vectors;
a step (S106) of converting each output vector into a vector feature quantity;
a step (S107) of continuously receiving the converted vector feature quantities together with a state map prepared in advance and, based on both, temporarily storing and outputting comparison values for a given state and the states before and after it; and
a step (S108) of determining the transition of each state from the initialized state positions and the comparison values, comparing the transitions in each state with the state weight map, and determining, using a predetermined threshold, whether to set a gesture recognition flag;
wherein gesture recognition is completed if the gesture recognition flag is set in step S108, and the method returns to step S103 if it is not set.

[5] The gesture recognition method according to [4], wherein the initialization creates a zero matrix with one entry per state. For each vector obtained by frame processing, the transition is determined by comparing three candidates: the current state and the states immediately before and after it.
If the next state has the highest degree of match, the state is judged to advance, and the value of the newly entered state is set to 1. If the current state has the highest degree of match, the state is judged to stay, and the value of the current state is incremented by 1. If the previous state has the highest degree of match, the value of the current state is decremented by 1, and when it reaches 0 the state is judged to retreat.
In [5], the transition in step (S108) may also be determined based on a predetermined threshold.
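A minimal Python sketch of the state-position update in [5] follows. It rests on these assumptions:

- Each state of the state feature map and each per-frame feature is an equal-length list of numbers.
- The comparison value is a product-sum (dot product).
- Ties resolve to "stay".
- `StateTracker`, `update`, and `recognized` are hypothetical names.
- The L1-distance test in `recognized` is only a stand-in for the comparison with the state weight map in step S108. The source does not specify that test.

```python
from typing import List, Sequence


def product_sum(a: Sequence[float], b: Sequence[float]) -> float:
    """Comparison value between a feature and a state (dot product; assumed)."""
    return sum(x * y for x, y in zip(a, b))


class StateTracker:
    """Hypothetical tracker implementing the advance / stay / retreat rule of [5]."""

    def __init__(self, state_map: List[List[float]]):
        self.state_map = state_map
        self.values = [0] * len(state_map)  # S102: one zero entry per state
        self.current = 0                    # index of the current state

    def update(self, feature: Sequence[float]) -> str:
        i = self.current
        # Compare only the current, next and previous states; listing the
        # current state first makes ties resolve to "stay".
        candidates = [j for j in (i, i + 1, i - 1) if 0 <= j < len(self.state_map)]
        best = max(candidates, key=lambda j: product_sum(feature, self.state_map[j]))
        if best == i + 1:                   # next state matches best: advance
            self.current = best
            self.values[best] = 1
            return "advance"
        if best == i:                       # current state matches best: stay
            self.values[i] += 1
            return "stay"
        self.values[i] -= 1                 # previous state matches best
        if self.values[i] <= 0:             # value reached 0: retreat
            self.values[i] = 0
            self.current = i - 1
            return "retreat"
        return "decrement"

    def recognized(self, weight_map: Sequence[float], threshold: float) -> bool:
        """Assumed S108 check: last state reached and stay-value ratios
        within `threshold` (L1 distance) of the learned state weight map."""
        if self.current != len(self.state_map) - 1:
            return False
        total = sum(self.values)
        if total == 0:
            return False
        ratios = [v / total for v in self.values]
        return sum(abs(r - w) for r, w in zip(ratios, weight_map)) < threshold


# Toy example: three one-hot "direction" states, each held for two frames.
STATES = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
FRAMES = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
tracker = StateTracker(STATES)
print([tracker.update(f) for f in FRAMES])
print(tracker.values, tracker.recognized([1 / 3, 1 / 3, 1 / 3], 0.1))
```

A slower performance adds "stay" steps but follows the same state sequence. This is why the state formulation does not depend on motion speed or on knowing where the gesture starts.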

[6] A gesture analysis method for analyzing an input gesture using the gesture recognition device according to any one of [1] to [3]. The method is a gesture registration method for registering the input gesture, and it comprises:
a step (S201) of continuously inputting a gesture from the imaging means as frame images;
a step (S202) of acquiring the motion direction of a moving object as a vector from the frame images continuously input from the imaging means;
a step (S203) of continuously outputting the acquired vectors;
a step (S204) of accumulating the continuously output vectors;
a step (S205) of obtaining a vector matrix from the accumulated vectors;
a step (S206) of normalizing the length of each vector in the acquired vector matrix and converting it into an angle-weighted vector feature quantity;
a step (S207) of creating and storing a state feature map from the consecutive feature quantities; and
a step (S208) of creating and storing a state weight map from the consecutive feature quantities and the state feature map.

The step of acquiring the vector may also proceed as follows:
1. Convert the frame images continuously input from the imaging means to a lower resolution.
2. Convert the low-resolution images to L*a*b*.
3. Detect the moving object in the L*a*b* images.
4. Smooth the L*a*b* images.
5. Binarize the smoothed images.
6. Cluster the binarized images.
7. Track the moving object.
8. Acquire the motion direction of the moving object as a vector.
The moving object may also be detected by the frame difference method.
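The frame difference variant can be sketched in plain Python. The sketch assumes:

- Frames are small grayscale images given as nested lists.
- The changed region is reduced to a single centroid instead of being clustered and tracked.
- The threshold of 30 is arbitrary.

Resolution reduction, L*a*b* conversion, and smoothing are omitted, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image, rows of 0..255 values


def frame_difference_mask(prev: Frame, curr: Frame, threshold: int = 30) -> List[List[int]]:
    """Binarize |curr - prev|: 1 where the pixel changed by more than `threshold`."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def centroid(mask: List[List[int]]) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of the changed pixels, or None if nothing moved."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def motion_vector(p0: Tuple[float, float], p1: Tuple[float, float]) -> Tuple[float, float]:
    """Displacement between two tracked positions (image y axis points down)."""
    return p1[0] - p0[0], p1[1] - p0[1]


# Toy sequence: a single bright pixel moving right by 2 pixels per frame.
frames = []
for x in (1, 3, 5):
    f = [[0] * 8 for _ in range(5)]
    f[2][x] = 255
    frames.append(f)

c01 = centroid(frame_difference_mask(frames[0], frames[1]))
c12 = centroid(frame_difference_mask(frames[1], frames[2]))
print(motion_vector(c01, c12))  # (2.0, 0.0)
```

Each difference mask covers both the old and the new position of the object, so consecutive mask centroids shift by the per-frame displacement.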

[7] The gesture recognition method according to [6], wherein, in the vector conversion step, the angle in the same direction is given the value 1 and the angle in the opposite direction the value 0. The values are mapped as a graph with an x-axis of 0 to 360 and a y-axis of 0 to 1.

[8] A computer-readable medium storing a program for executing the steps according to any one of [4] to [7].
In the computer-readable medium of the present invention, the gesture recognition result may also be associated with an operation of a computer application.
[9] An online operation system for operating, online, a device that performs a predetermined operation, the system comprising: the gesture recognition device according to any one of [1] to [3]; and a device that is connected to the gesture recognition device via a public line and performs a predetermined operation according to the recognition result of the gesture recognition device.

According to the present invention, a gesture is defined as a sequence of transitions between "states". Instead of recognizing a gesture as a whole motion, the device recognizes instantaneous movements and recognizes the gesture as their accumulation. Gestures can therefore be recognized in real time.
Furthermore, because the gesture is defined in terms of "states", it can be recognized regardless of the speed of the motion. Unlike the prior art, gestures can therefore be recognized by a simple method, independent of motion speed, without specifying their start and end.
This gesture recognition device can be connected to a target device that is to be operated online. Gestures are then recognized efficiently without the user having to be conscious of where a gesture begins and ends, so the target device can be operated online more easily.

Hereinafter, the present invention will be described with reference to the accompanying drawings. In the present invention, a gesture recognition device is a device with the following functions:
- It sequentially captures consecutive images from a camera, serving as the imaging means, into a processing device (for example, a computer system).
- It detects the motion of a moving body, typically a hand, by digital image processing.
- It recognizes gestures.

The term "gesture" used in the present invention refers not to the shape of a moving body (a hand) but to its motion, for example actions such as "waving a hand" or "drawing a circle". The present invention is based on a unique image processing algorithm for such a gesture recognition device and on an apparatus that embodies the algorithm.
The present invention is characterized in that a gesture is defined as a set of "states" and is recognized based on the transitions between those states.

(Gesture recognition device)
First, the gesture recognition device of the present invention will be described with reference to FIGS. 1 to 9.
FIG. 1 is a schematic diagram showing the entire gesture recognition device of the present invention.
FIG. 2 is a schematic diagram showing the frame image processing means, gesture element registration means, vector feature quantity matching means, and gesture recognition means shown in FIG. 1.
FIG. 3 shows a gesture divided into one vector per frame in the vector accumulation unit 31.
FIG. 4 shows an example of weighting.
FIG. 5 shows an example of a state feature map.
FIG. 6 shows an example of a state weight map.
FIG. 7 shows an example of obtaining comparison values in the vector feature quantity matching unit 42.
FIG. 8 shows an example of state transitions in the state feature map.
FIG. 9 shows the stay values compared by the gesture determination unit.

As shown in FIG. 1, the gesture recognition device 1 of the present embodiment mainly comprises:
- imaging means 10 for continuously inputting a gesture as frame images;
- frame image processing means 20;
- gesture element registration means 30; and
- gesture recognition means 40.

More generally, the gesture recognition device 1 of the present embodiment consists of imaging means 10, such as a digital (video) camera, and a computer system connected to the imaging means 10.

That is, the gesture recognition device 1 of the present embodiment generally consists of a computer system that includes:
- an interface for inputting the continuous frame images from the imaging means 10;
- a memory that temporarily stores the images input via the interface;
- a CPU that processes the image data temporarily stored in the memory;
- storage means for storing the processing program described later and various data; and
- output means for outputting the gesture recognition result.

Preferably, the recognition result is output continuously to display means such as a display.
However, the present invention is not limited to the imaging means 10 and a computer system; any apparatus having equivalent functions falls within the scope of the present invention.

The imaging means 10 in the gesture recognition device 1 of the present embodiment is not particularly limited, provided that it can continuously capture the moving body performing the gesture and output the images to the downstream computer system. A digital video camera well known in the art can be used. The resolution and format of the continuously output frame images are likewise not particularly limited, as long as the present invention can be implemented. The resolution can be chosen in view of ease of downstream processing and image quality. The image format is preferably binary image data in a general-purpose image format such as jpg, bmp, or png.

The frame image processing means 20 in the present embodiment extracts the operation target from the frame images captured and output by the imaging means 10 and outputs it as a sequence of vectors. As shown in FIG. 2, it comprises a vector acquisition unit 21 and a vector output unit 22. The vector acquisition unit 21 acquires the motion direction of the moving object as a vector from the frame images continuously input from the imaging means. The vector output unit 22 continuously outputs the acquired vectors.

The vector acquisition unit 21 consists of, for example, a CPU, a memory, a storage medium, and a program stored in the storage medium of a computer system. Using conventionally known techniques, it performs the following steps:
1. Capture the continuous images from the imaging means, for example RGB images.
2. Reduce their resolution.
3. Convert them to L*a*b*.
4. Detect the moving body performing the gesture.
5. Smooth, binarize, cluster, and track the moving body.
6. Acquire the motion direction of the moving object as a vector, which is a feature of the present invention.

The vector output unit 22 continuously outputs the vectors acquired by the vector acquisition unit 21 to the downstream gesture element registration means 30 and gesture recognition means 40.

The gesture element registration means 30 in the present embodiment registers (learns) in advance the elements that the gesture recognition means 40, described later, needs in order to recognize a gesture. It generally consists of storage means that stores each element of the state feature map data and the state weight map data.
These data may be stored in the gesture element registration means 30 beforehand. The present embodiment describes the case in which the means also has a function for registering and storing gestures.
The gesture element registration means 30 can be provided, for example, as a ROM storing elements registered in advance (elements of the state feature map data and state weight map data). However, registering a new gesture to be recognized requires creating and storing the state feature map data and state weight map data. They are created from the vectors that are input through the imaging means 10 and output from the frame image processing means 20.
That is, the gesture element registration means 30 in the present embodiment comprises:
- a vector accumulation unit 31 that accumulates the vectors continuously output from the vector output unit 22;
- a vector matrix acquisition unit 32 that obtains a vector matrix from the accumulated vectors;
- a vector feature quantity conversion unit 33 that normalizes the length of each vector in the acquired vector matrix and converts it into an angle-weighted vector feature quantity;
- a state feature map creation unit 34 that creates and stores a state feature map from the consecutive feature quantities; and
- a state weight map creation unit 35 that creates and stores a state weight map from the consecutive feature quantities and the state feature map.

The term "angle-weighted vector feature quantity" used in the present invention denotes a quantity indicating how far the direction of an acquired vector deviates from each angle between 0° and 360°. For example, it is a feature quantity plotted with an x-axis of 0 to 360 and a y-axis of 0 to 1, in which the angle in the same direction has the value 1 and the angle in the opposite direction has the value 0. As shown in the figure below, an acquired vector by itself indicates a single point between 0° and 360°. By also weighting the surrounding angles, vectors that deviate slightly can be treated as having the same direction.
The term "state feature map" used in the present invention denotes a set of consecutive vector feature quantities representing a gesture, and indicates that the direction changes in a fixed order.

The vector accumulation unit 31 accumulates vectors in order to obtain the vector matrix of a gesture. The vector matrix acquisition unit 32 obtains a vector matrix from the temporally consecutive vectors of each gesture accumulated in the vector accumulation unit 31.
Specifically, as shown in FIG. 3, the gesture is divided into one vector per frame (V(t), V(t-1), V(t-2), V(t-3), ...), and this set of vectors is taken as the vector matrix of the gesture.
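The per-frame division into V(t), V(t-1), ... can be sketched in Python as follows. The sketch assumes that the tracked position of the moving body in each frame is available as an (x, y) point, and that V(t) is the displacement p(t) - p(t-1). Skipping frames with no motion is also an assumption, since the source does not say how such frames are handled. `vector_matrix` and `to_angles` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def vector_matrix(track: List[Point], min_length: float = 1e-6) -> List[Tuple[float, float]]:
    """Per-frame motion vectors V(t) = p(t) - p(t-1) of one gesture, in time order.

    Frames with (almost) no motion are skipped because they carry no direction
    (an assumption; the source does not state how such frames are handled).
    """
    vectors = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) > min_length:
            vectors.append((dx, dy))
    return vectors


def to_angles(vectors: List[Tuple[float, float]]) -> List[float]:
    """Direction of each vector in degrees in [0, 360), counter-clockwise from +x."""
    return [math.degrees(math.atan2(dy, dx)) % 360.0 for dx, dy in vectors]


track = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 1)]  # tracked positions, one per frame
V = vector_matrix(track)
print(V)  # [(1, 0), (0, 1), (-1, 0)]
print(to_angles(V))
```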

The vector feature quantity conversion unit 33 normalizes the length of each vector (that is, sets it to 1) and converts it into an angle-weighted feature quantity. The purpose is to obtain a feature that emphasizes the direction (angle) of the vector, which does not depend on time. The weight assigned to an angle is large when its difference from the detected angle is small, and small when the difference is large. As an example, the weighting uses a normal distribution, as shown in FIG. 4. The angle may be quantized if desired.
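A minimal Python sketch of this conversion follows. It discards the vector length and fills 360 one-degree bins with a Gaussian of the circular distance to the detected angle. This gives 1.0 in the same direction and practically 0 in the opposite direction, consistent with item [7]. The standard deviation of 30° and the bin count are assumed parameters; the source states only that a normal distribution is used and that the angle may be quantized.

```python
import math
from typing import List


def angle_weighted_feature(dx: float, dy: float, sigma_deg: float = 30.0,
                           bins: int = 360) -> List[float]:
    """Convert a motion vector into an angle-weighted feature (sketch).

    The vector is normalized to length 1, so only its direction matters.
    Bin k covers angle k * (360 / bins) degrees; its weight is a Gaussian of the
    circular distance to the detected angle (1.0 at the same direction).
    sigma_deg and bins are assumptions, not values from the source.
    """
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length vector has no direction")
    ux, uy = dx / length, dy / length            # normalize length to 1
    theta = math.degrees(math.atan2(uy, ux)) % 360.0
    step = 360.0 / bins
    feature = []
    for k in range(bins):
        d = abs(k * step - theta) % 360.0
        d = min(d, 360.0 - d)                    # circular angular distance
        feature.append(math.exp(-(d * d) / (2.0 * sigma_deg ** 2)))
    return feature


feat = angle_weighted_feature(3.0, 0.0)          # a vector pointing along +x
print(round(feat[0], 3), round(feat[30], 3), feat[180] < 1e-6)
```

Because the weight falls off smoothly, a vector at 10° still scores highly against a state learned at 0°. This is the tolerance to slight deviations described above.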

The state feature map creation unit 34 creates a state feature map so that a gesture can be represented as a sequence of representative vector feature quantities (states).
That is, the state feature map condenses the vector matrix, converted into vector feature quantities, into a small number of representative vector feature quantities.
Each representative vector feature quantity is regarded as one state. The number of states is initialized to the number of vectors in the vector matrix and then gradually reduced. At each step, the state that best matches each vector feature quantity of the vector matrix is searched for. The group of states obtained when the matching sum over all states becomes smaller than a threshold is taken as the state feature map.
When a gesture is added, the previous state feature map is compared with the state feature map produced by the above processing. States are then merged, added, or deleted to create a new state feature map.

The matching sum for the representative states is computed using the norm of the difference sum, the product-sum norm, or the variance.
The state feature map is, for example, a three-dimensional map in which a vector feature amount exists for each representative state, as shown in FIG. 5.
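The three measures named above are not defined precisely in the text. A minimal sketch, assuming that "norm of the difference sum" means the Euclidean norm of the element-wise difference, "product-sum norm" means a normalized inner product (cosine similarity), and "variance" means the variance of the element-wise difference; all function names are hypothetical:

```python
import math


def diff_norm(a, b):
    """Euclidean norm of the element-wise difference (smaller = better match)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def product_sum(a, b):
    """Normalized inner product, i.e. cosine similarity (larger = better match)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def diff_variance(a, b):
    """Variance of the element-wise difference (smaller = better match).

    A constant offset between a and b yields 0, so this measure compares
    the shape of the two features rather than their level.
    """
    d = [x - y for x, y in zip(a, b)]
    m = sum(d) / len(d)
    return sum((v - m) ** 2 for v in d) / len(d)
```

Note that the measures point in different directions: a good match gives a small `diff_norm` or `diff_variance` but a large `product_sum`, so any threshold must be chosen per measure.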

The state weight map creation unit 35 learns a weight-ratio map for the representative states in advance, for use in judging whether a gesture has been recognized. For each state in the state feature map, it counts how many vector feature amounts in the vector matrix match that state most closely, and creates a state weight map expressing these counts as weight ratios.

That is, the state weight map is created from the matrix of temporally consecutive vector feature amounts supplied via the vector matrix acquisition unit 32 and from the state feature map supplied by the state feature map creation unit 34.
The state weight map is a two-dimensional map that expresses the weight of each representative state as a ratio of the total, as shown for example in FIG. 6.
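A minimal sketch of how unit 35 might build the state weight map, assuming that squared Euclidean distance decides each frame's best-matching state; `state_weight_map` is a hypothetical name:

```python
def state_weight_map(features, state_map):
    """Fraction of frames whose best-matching state is each representative state."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    counts = [0] * len(state_map)
    for f in features:
        # Index of the representative state closest to this frame's feature.
        best = min(range(len(state_map)), key=lambda k: sq_dist(f, state_map[k]))
        counts[best] += 1
    total = len(features)
    return [c / total for c in counts]
```

With three frames matching the first state and one matching the second, the map is `[0.75, 0.25]`; the ratios always sum to 1, which is what makes them comparable across fast and slow performances.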

The vector storage unit 31, vector matrix acquisition unit 32, vector feature amount conversion unit 33, and state feature map creation unit 34 are implemented by, for example, the central processing unit, memory, and storage medium of a computer system, together with a program stored on the storage medium.
The gesture element registration means 30 may also be configured to normalize the lengths of the continuously output vectors, store state feature map data created from the angle-weighted vector feature amounts, store state weight map data created from the consecutive feature amounts and the state feature map, and output the necessary data to the downstream gesture recognition means as required.

The gesture recognition means 40 comprises: a state position initialization unit 41 that receives the state feature map and state weight map stored in the gesture element registration means 30 and initializes each state position; a vector feature amount matching unit 42 that converts the vectors output by the frame image processing means 20 into vector feature amounts, receives the state map stored in the gesture element registration means 30, and, based on both, temporarily stores and outputs comparison values for a given state and the states immediately before and after it; a state transition determination unit 43 that determines and outputs the transition of each state from the initialized state positions and the comparison values produced by the matching unit 42; and a gesture determination unit 44 that compares the transitions in each state with the state weight map and decides, using a predetermined threshold, whether to set the gesture recognition flag.

The state position initialization unit 41 reads a motion feature map, consisting of the state feature map and the state weight map obtained by the gesture analysis process (or stored in advance in the gesture element registration means), and sets the state position to the initial state. In other words, it loads the feature map of the gesture to be recognized and resets the state position to prepare for state transition recognition. As a result, the motion feature map (state feature map + state weight map) resides in memory and the state position is set to its initial value.

The vector feature amount matching unit 42 consists of a recognition vector feature amount conversion unit 42A, which converts the vectors output by the frame image processing means 20 into vector feature amounts in the same manner as the vector matrix acquisition unit 32, and a matching unit 42B, which receives the state map stored in the gesture element registration means 30 and, based on both, temporarily stores and outputs comparison values for a given state and the states immediately before and after it.
Specifically, it compares the motion feature map (state feature map + state weight map) placed in memory by the state position initialization unit 41 with the vector feature amounts output in real time by the recognition vector feature amount conversion unit 42A, and outputs the comparison values continuously in real time.
As shown for example in FIG. 7, the comparison value is obtained using the norm of the difference sum, the product-sum norm, or the variance. In the preferred embodiment, three comparisons are made: against the current state, the previous state, and the next state.
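The three-way comparison made by the matching unit can be sketched as below. It assumes the Euclidean distance as the comparison value (smaller means a better match) and returns `None` where the previous or next state does not exist; the function name `compare_neighbours` is hypothetical.

```python
import math


def compare_neighbours(feature, state_map, pos):
    """Distances from feature to the previous, current and next states.

    Returns a (prev, current, next) tuple; an entry is None when that
    neighbour does not exist (e.g. no previous state at position 0).
    """

    def dist(k):
        if 0 <= k < len(state_map):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, state_map[k])))
        return None

    return dist(pos - 1), dist(pos), dist(pos + 1)
```

Only three comparisons are needed per frame regardless of how many states the gesture has, which keeps the per-frame cost constant.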

The state transition determination unit 43 moves between states in the state feature map: based on the comparison values obtained by vector feature matching, it advances, stays in, or retreats from the current state according to thresholds. It also keeps a counter for each state and stores it as a stay value.

That is, the state transition determination unit 43 reads the state feature map stored in the gesture element registration means 30 and performs state transitions in the state feature map using the comparison values from vector feature matching. Specifically, as shown for example in FIG. 8: if the next state has the highest degree of matching, the state advances, and the stay value of the new state position is set to 1. If the current state has the highest degree of matching, the state stays, and the stay value of the current state position is incremented by 1. If neither condition holds, the stay value of the current state position is decremented by 1, and when it reaches 0 the state retreats. The resulting state transitions are output and stored as a stay graph.

More specifically, a zero matrix with one element per state is first created (state initialization: [0 0 0 0 0 ...]). For each vector obtained by frame processing, three states are compared, namely the current state and the states immediately before and after it, and a comparison value is obtained for each. If the next state matches best, the state advances and the value of the new state is set to 1. If the current state matches best, the state stays and its value is incremented by 1. If the previous state matches best, the value of the current state is decremented by 1, and when it reaches 0 the state retreats. In practice, a threshold is applied to the degree of matching.
One could instead recognize the gesture from transitions of the feature states alone, but then even a small error can make the state retreat and prevent correct recognition. Making the state retreat only after a short delay (the stay counter must first count down to 0) absorbs such errors.
Likewise, if a gesture were deemed established at the moment the final state is reached, misrecognition could occur. The stay values and state weights described above are therefore introduced, and establishment of the gesture is judged from the stay ratio of each state.
This makes recognition robust to momentary noise, and because the stay ratio is used, slow and fast executions of the same gesture are handled in the same way.
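The advance/stay/retreat rule with stay counters can be sketched as a small state tracker. Assumptions not stated in the text: comparison values are Euclidean distances, a state counts as a match only if its distance is within `threshold`, ties favour the current state, and the counter is clamped at 0 in the first state. `StateTracker` and its members are hypothetical names.

```python
import math


class StateTracker:
    """Advance / stay / retreat over an ordered list of representative states."""

    def __init__(self, state_map, threshold):
        self.state_map = state_map
        self.threshold = threshold
        self.pos = 0                         # initial state position
        self.stay = [0] * len(state_map)     # zero vector: one stay counter per state

    def _dist(self, feature, k):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, self.state_map[k])))

    def step(self, feature):
        # Keep only neighbours whose distance is within the threshold.
        # Insertion order (cur, next, prev) makes ties favour the current state.
        candidates = {}
        for label, k in (("cur", self.pos), ("next", self.pos + 1), ("prev", self.pos - 1)):
            if 0 <= k < len(self.state_map):
                d = self._dist(feature, k)
                if d <= self.threshold:
                    candidates[label] = d
        best = min(candidates, key=candidates.get) if candidates else None

        if best == "next":
            self.pos += 1
            self.stay[self.pos] = 1          # advance: new state starts at 1
        elif best == "cur":
            self.stay[self.pos] += 1         # stay: count one more frame here
        else:
            # Previous state matched best, or nothing matched: decrement,
            # and retreat only once the counter has reached 0.
            self.stay[self.pos] = max(self.stay[self.pos] - 1, 0)
            if self.stay[self.pos] == 0 and self.pos > 0:
                self.pos -= 1
        return self.pos
```

A single noisy frame only decrements the counter; the state retreats only after the counter has been worn down to 0, which is the delay that absorbs momentary errors.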

To decide whether a gesture has been established, the gesture determination unit 44 compares the transitions in each state with the state weight map and determines, using a predetermined threshold, whether to set the gesture recognition flag.
That is, the gesture determination unit 44 reads the state feature map stored in the gesture element registration means 30 and compares the stay value of each state received from the state transition determination unit 43, using, as shown for example in FIG. 9, the norm of the difference sum, the product-sum norm, or the variance, to decide whether to set the gesture recognition flag.
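A sketch of the decision in unit 44, under assumptions: the stay values are turned into ratios, compared with the state weight map using the L1 norm of the difference (one reading of "norm of the difference sum"), and the flag is set only once the final state has been reached. The function name `gesture_established` and its `threshold` parameter are hypothetical.

```python
def gesture_established(stay, weight_map, at_final_state, threshold):
    """Decide whether to set the gesture recognition flag.

    Compares the stay ratio of each state with the learned state weight map
    using the L1 norm of the difference.
    """
    total = sum(stay)
    if not at_final_state or total == 0:
        return False  # no decision until the final state is reached with stays counted
    ratio = [s / total for s in stay]
    distance = sum(abs(r - w) for r, w in zip(ratio, weight_map))
    return distance <= threshold
```

Because only ratios are compared, stay values `[2, 2, 1]` and `[4, 4, 2]` (the same gesture performed at half speed) lead to the same decision.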

Unlike the related art, the gesture recognition device of the present invention configured as described above recognizes gestures easily without requiring the start and end of a gesture to be specified by any means, and independently of the speed at which the gesture is performed.
The gesture recognition device of the present invention also requires only a small amount of data for recognition, and gestures to be recognized can be registered easily.
Furthermore, in another embodiment of the present invention, registering the elements to be recognized in advance in storage means such as a ROM or a hard disk makes it possible to build a gesture recognition device that recognizes gestures in real time with a simple configuration.

Furthermore, a preferred embodiment of the present invention may include display means (not shown) for continuously displaying the acquired vectors. Displaying the acquired vectors continuously lets the user visually check the result of the gesture. In addition, by displaying the relevant gestures on the display means in advance, a user who has forgotten the gesture to be performed can look it up there.
Furthermore, the gesture recognition device of the present invention is not limited to a single gesture: for example, a plurality of gestures can each be analyzed and their state feature maps stored, and the recognition means can be run in parallel to recognize whichever gesture matches.

Another embodiment of the present invention provides an online operation system for operating, online, a device that performs a predetermined operation. The system comprises the gesture recognition device of the present invention and a device that is connected to the gesture recognition device via a public line and performs a predetermined operation according to the recognition result of the gesture recognition device.

In the present invention, a device that performs a predetermined operation online is any device that operates in response to some input; it is not particularly limited as long as it can be connected online to the gesture recognition device of the present invention. Specific targets include remote controls, switches, and control systems in general for home security, baths, air conditioners, elderly care systems, car navigation systems, televisions, audio equipment, DVD players, video equipment, lighting equipment, and games.

(Operation: gesture recognition method)
Next, the operation of the gesture recognition device of the present invention, that is, the gesture recognition method, will be described with reference to FIGS. 10A to 11.
FIG. 10A is a flowchart showing the gesture registration method of the present invention, and FIG. 10B is a flowchart showing the gesture recognition method of the present invention. FIG. 11 is a graph illustrating step S206 of FIG. 10A.

In the gesture registration method of the present invention, as shown in FIG. 10A, a gesture captured by the imaging means is first input continuously as frame images (S201).

Next, the motion direction of the moving object is acquired as a vector from the frame images continuously input in step S201 (S202), and the acquired vectors are output continuously (S203).
In S202 and S203, conventionally known techniques can be used: consecutive images from the imaging means (for example, RGB images) are captured, their resolution is reduced, they are converted to L*a*b* images, the moving body performing the gesture is detected, and the result is smoothed, binarized, clustered, and tracked, after which the motion direction of the moving object is obtained as a vector.
The moving body can be detected by a conventionally known method, for example the frame difference method.
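The frame difference method mentioned above can be sketched in a simplified form. This deliberately skips the resolution reduction, L*a*b* conversion, smoothing, clustering, and tracking listed above: it assumes grayscale frames given as nested lists, marks pixels whose intensity changes by more than a hypothetical threshold `thresh`, and takes the motion vector as the displacement between centroids of successive difference masks. The function names are illustrative.

```python
def diff_centroid(prev, curr, thresh=30):
    """Centroid (x, y) of pixels whose intensity changed by more than thresh."""
    sx = sy = n = 0
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > thresh:
                sx += x
                sy += y
                n += 1
    return (sx / n, sy / n) if n else None  # None: no motion detected


def motion_vectors(frames, thresh=30):
    """Motion vectors from successive frame-difference centroids."""
    centroids = [diff_centroid(a, b, thresh) for a, b in zip(frames, frames[1:])]
    vectors = []
    for c0, c1 in zip(centroids, centroids[1:]):
        if c0 is not None and c1 is not None:
            vectors.append((c1[0] - c0[0], c1[1] - c0[1]))
    return vectors
```

A difference mask contains both the old and the new position of the object, so its centroid lies between them; the displacement of that centroid from one frame pair to the next still tracks the motion.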

Next, the continuously output vectors are accumulated (S204); a vector matrix is obtained from the accumulated vectors (S205); the vector lengths in the obtained vector matrix are normalized and the vectors are converted into angle-weighted vector feature amounts (S206); a state feature map is created from the consecutive feature amounts and stored (S207); and a state weight map is created from the consecutive feature amounts and the state feature map and stored (S208).
In other words, steps S204 to S208 take as input the vectors accumulated and arranged into a matrix for each gesture, convert them into vector feature amounts, and create and output a motion feature map consisting of a state feature map and a state weight map.

In step S206, it is preferable to assign the value 1 to the angle in the same direction as the vector and the value 0 to the opposite angle, and to map the weights as a graph with an x-axis of 0 to 360 degrees and a y-axis of 0 to 1.
That is, as shown in FIG. 11, a vector only indicates a direction, so on its own it determines just one point between 0 and 360 degrees. Giving weights to the surrounding angles as well makes conversion into a vector feature amount easier.
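The text fixes only the endpoints of this mapping: 1 in the vector's own direction and 0 in the opposite direction. One smooth function that satisfies both, offered as an assumption rather than the patent's formula (FIG. 4 uses a normal distribution instead), is the following, with $\theta_v$ the detected vector angle and $\theta$ ranging over 0 to 360 degrees:

$$
w(\theta) = \frac{1 + \cos(\theta - \theta_v)}{2}, \qquad 0^\circ \le \theta < 360^\circ
$$

This gives $w(\theta_v) = 1$ and $w(\theta_v + 180^\circ) = 0$, stays within the 0 to 1 range of the y-axis, and, because cosine is periodic, wraps correctly across the 0/360 degree boundary.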

As described above, the steps shown in FIG. 10A make it easy to register gesture recognition elements in the gesture recognition device of the present invention. Put differently, this gesture registration method is a gesture analysis method for the gesture recognition method of the present invention, and acquiring these recognition elements is a preparatory stage for the gesture recognition method described below. However, in a specific embodiment of the present invention, the recognition elements may also be registered in advance.
Next, a method of recognizing a gesture with the gesture recognition device of the present invention will be described with reference to FIG. 10B.
First, the state feature map and the state weight map are input (S101), and each state position is initialized (S102).
Next, a frame image is input (S103), the motion direction of the moving body is acquired as a vector (S104), and, as in step S206, the vector length is normalized and the vector is converted into an angle-weighted vector feature amount (S105).
This vector feature amount is then matched against the states, the transitions in each state are compared with the state weight map, and a predetermined threshold is used to determine whether to set the gesture recognition flag, that is, whether the obtained vector feature amounts constitute the gesture (S106).
In this process, comparison values for a given state and the states immediately before and after it are computed from the input state map and the vector feature amount, and are temporarily stored and output (S107). Based on these comparison values and the initialized state positions, the transition of each state is determined; the transitions are compared with the state weight map, and a predetermined threshold decides whether the gesture recognition flag is set.
If the gesture recognition flag is set (Yes), gesture recognition is completed (End). Otherwise, the process returns to step S103 and the next frame image is input.

Preferably, the initialization in step S102 is performed by creating a zero matrix with one element per state (see paragraph 0045), and the transition determination in step S108 compares, for each vector obtained by frame processing, three states: the current state and the states immediately before and after it. If the next state has the highest degree of matching, the state is judged to advance and the value of the new state is set to 1; if the current state matches best, the state is judged to stay and its value is incremented by 1; if the previous state matches best, the value of the current state is decremented by 1, and when it reaches 0 the state is judged to retreat.
As described above, this judgment makes recognition robust to momentary noise and allows gestures performed at various speeds to be recognized.
The transition determination in step S108 may also be based on a predetermined threshold.

In the gesture recognition method of the present invention, the gesture registration (analysis) of FIG. 10A can be regarded as a preliminary gesture analysis for gesture recognition.
That is, if the vector feature amounts, the state feature map, and the state weight map have already been created and are available, the registration process of FIG. 10A can be omitted and the recognition process of FIG. 10B executed directly.
Unlike the prior art, the gesture recognition method of the present invention configured as described above recognizes gestures in real time with a simple technique, without specifying the start and end of a gesture and independently of the speed of the motion. Moreover, because it handles far less information than conventional gesture recognition methods, gesture recognition can be performed without placing a heavy load on the device (that is, with less expensive hardware).

Further, by storing a program for executing each step of the gesture recognition method on a computer-readable medium, the method can be linked to an application program such as a presentation program, so that the application program is operated in response to gestures.

Although embodiments of the present invention have been described above, the present invention is not limited to them and can be applied widely.
For example, the imaging means of the present invention may be provided with: an image conversion unit that reduces the resolution of the continuously input frame images; an L*a*b* image conversion unit that converts the reduced-resolution images into L*a*b* images; a moving body detection unit that detects a moving object in the L*a*b* images; a smoothing filter that smooths the L*a*b* images; a binarization unit that binarizes the smoothed images; and a clustering unit that clusters the binarized images. Applied to the gesture recognition device of the present invention, such imaging means reduces the downstream processing load. Apart from gesture recognition, it can also be used in devices that detect moving bodies, such as crime prevention or parking lot management devices. Because the data it outputs are already preprocessed binary data, the load of processing the moving body is reduced.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1: Schematic diagram of the entire gesture recognition device of the present invention.
FIG. 2: Schematic diagram of the frame image processing means, gesture element registration means, vector feature amount matching means, and gesture recognition means shown in FIG. 1.
FIG. 3: Diagram showing a gesture divided into vectors frame by frame in the vector storage unit.
FIG. 4: Diagram showing an example of weighting.
FIG. 5: Diagram showing an example of a state feature map.
FIG. 6: Diagram showing an example of a state weight map.
FIG. 7: Diagram showing an example of obtaining comparison values in the vector feature amount matching unit.
FIG. 8: Diagram showing an example of state transitions in a state feature map.
FIG. 9: Diagram showing the stay values compared by the gesture determination unit.
FIG. 10A: Flowchart showing the gesture registration method of the present invention.
FIG. 10B: Flowchart showing the gesture recognition method of the present invention.
FIG. 11: Graph illustrating step S206 of FIG. 10A.

Explanation of symbols

1 Gesture recognition device
10 Imaging means
20 Frame image processing means
30 Gesture element registration means
31 Vector storage unit
32 Vector matrix acquisition unit
33 Vector feature amount conversion unit
34 State feature map creation unit
35 State weight map creation unit
40 Gesture recognition means
41 State position initialization unit
42 Vector feature amount matching unit
42A Recognition vector feature amount conversion unit
42B Matching unit
43 State transition determination unit
44 Gesture determination unit

Claims

A gesture recognition device comprising:
(A) imaging means for continuously inputting a gesture as frame images;
(B) frame image processing means comprising a vector acquisition unit that acquires the motion direction of a moving object as a vector from the frame images continuously input from the imaging means, and a vector output unit that continuously outputs the acquired vectors;
(C) gesture element registration means that normalizes the lengths of the continuously output vectors, stores state feature map data created from the angle-weighted vector feature amounts, and stores state weight map data created from the consecutive feature amounts and the state feature map; and
(D) gesture recognition means comprising:
a state position initialization unit that receives the state feature map and state weight map stored in the gesture element registration means and initializes each state position;
a vector feature amount matching unit that converts the vectors output by the frame image processing means into vector feature amounts, receives them continuously together with the state map stored in the gesture element registration means, and, based on both, temporarily stores and outputs comparison values for a given state and the states immediately before and after it;
a state transition determination unit that determines and outputs the transition of each state from the initialized state positions and the comparison values; and
a gesture determination unit that compares the transitions in each state with the state weight map and determines, using a predetermined threshold, whether to set a gesture recognition flag.

The gesture recognition device according to claim 1, wherein the gesture element registration means comprises:
a vector storage unit that accumulates the continuously output vectors;
a vector matrix acquisition unit that obtains a vector matrix from the accumulated vectors;
a vector feature amount conversion unit that normalizes the vector lengths in the obtained vector matrix and converts the vectors into angle-weighted vector feature amounts;
a state feature map creation unit that creates and stores a state feature map from the consecutive feature amounts; and
a state weight map creation unit that creates and stores a state weight map from the consecutive feature amounts and the state feature map.

The gesture recognition apparatus according to claim 1, further comprising display means for continuously displaying the acquired vectors.

A gesture recognition method for recognizing a gesture using the gesture recognition device according to any one of claims 1 to 3, comprising:
a step (S101) of inputting a state feature map and a state weight map acquired in advance;
a step (S102) of initializing each state position;
a step (S103) of continuously inputting a gesture from the imaging means as frame images;
a step (S104) of acquiring the motion direction of a moving object as a vector from the frame images continuously input from the imaging means;
a step (S105) of continuously outputting the acquired vectors;
a step (S106) of converting the output vectors into vector feature amounts;
a step (S107) of continuously receiving the converted vector feature amounts together with a state map prepared in advance and, based on both, temporarily storing and outputting comparison values for a given state and the states immediately before and after it; and
a step (S108) of determining the transition of each state from the initialized state positions and the comparison values, comparing the transitions in each state with the state weight map, and determining, using a predetermined threshold, whether to set a gesture recognition flag;
wherein, if the gesture recognition flag is set in step S108, gesture recognition is completed, and if it is not set, the method returns to step S103.

The gesture recognition method according to claim 4, wherein the initialization is performed by creating a zero matrix whose size equals the number of states, and
in the transition determination, for each vector obtained by frame processing, the current state is compared with the states before and after it, and, based on the comparison values: when the degree of coincidence with the next state is highest, the state is judged to advance and the value of the state advanced to is set to 1; when the degree of coincidence with the current state is highest, the state is judged to stay and the value of the current state is incremented by 1;
and when the degree of coincidence with the previous state is highest, the value of the current state is decremented by 1, and when that value reaches 0, the state is judged to have retreated.
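The initialization and the advance, stay, and retreat rule can be sketched as follows. This is a minimal sketch assuming the "zero matrix" is a one-dimensional list with one entry per state. It also assumes the comparison values arrive as a dict keyed "prev", "curr", and "next", and that keys for states that do not exist are omitted. Those representations and the function names are hypothetical.

```python
def init_positions(num_states):
    """S102: a zero matrix with one entry per state."""
    return [0] * num_states


def step_transition(positions, current, scores):
    """Apply one advance/stay/retreat decision and return the new current state.

    `scores` maps "prev", "curr", "next" to comparison values; entries for
    states that do not exist are simply omitted by the caller."""
    best = max(scores, key=scores.get)
    if best == "next" and current + 1 < len(positions):
        current += 1
        positions[current] = 1          # advance: the new state starts at 1
    elif best == "curr":
        positions[current] += 1         # stay: one more frame in this state
    elif best == "prev" and current > 0:
        positions[current] -= 1         # previous state matched better
        if positions[current] <= 0:
            positions[current] = 0
            current -= 1                # value reached 0: retreat
    return current


positions = init_positions(3)
state = step_transition(positions, 0, {"curr": 0.9, "next": 0.5})
print(state, positions)
```

A single noisy frame matching the previous state only lowers the count, so a state that has been held for several frames does not retreat immediately.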

A gesture registration method for registering an input gesture using the gesture recognition device according to any one of claims 1 to 3, comprising:
Continuously inputting a gesture from the imaging means as frame images (S201);
Acquiring the moving direction of a moving object as a vector from the frame images continuously input from the imaging means (S202);
Continuously outputting the acquired vectors (S203);
Accumulating the continuously output vectors (S204);
Acquiring a vector matrix from the accumulated vectors (S205);
Normalizing the lengths of the vectors in the acquired vector matrix and converting them into vector feature quantities weighted by angle (S206);
Creating and storing a state feature map from the continuous vector feature quantities (S207); and
Creating and storing a state weight map from the continuous vector feature quantities and the state feature map (S208).
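The claims do not define how the state feature map and the state weight map are derived in S207 and S208. The sketch below shows one hypothetical construction, for illustration only. The registered gesture's sequence of feature profiles is split into `num_states` equal segments. The state feature map holds each segment's mean profile, and the state weight map holds each segment's frame count. The function name and both construction rules are assumptions.

```python
def build_state_maps(profiles, num_states):
    """Hypothetical S207/S208: split a registered gesture's feature sequence
    into num_states equal segments. The state feature map holds each
    segment's mean profile; the state weight map holds each segment's
    frame count."""
    if len(profiles) < num_states:
        raise ValueError("need at least one frame per state")
    feature_map, weight_map = [], []
    for s in range(num_states):
        start = s * len(profiles) // num_states
        end = (s + 1) * len(profiles) // num_states
        segment = profiles[start:end]
        # Column-wise mean over the frames assigned to this state.
        mean = [sum(column) / len(segment) for column in zip(*segment)]
        feature_map.append(mean)
        weight_map.append(len(segment))
    return feature_map, weight_map


print(build_state_maps([[1, 0], [1, 0], [0, 1], [0, 1]], 2))
```

Equal-length segmentation is the simplest choice. A real registration step might instead place state boundaries where the direction changes.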

The gesture registration method as described above, wherein, in the vector conversion step, the value for the angle in the same direction as the vector is set to 1 and the value for the opposite direction is set to 0, and the result is mapped as a graph whose x-axis runs from 0 to 360 and whose y-axis runs from 0 to 1.
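The angle weighting can be illustrated as a 360-point profile. The claim fixes only the end points: the same direction maps to 1 and the opposite direction maps to 0. The linear fall-off in between is an assumption here, and a cosine curve such as (1 + cos d) / 2 would satisfy the same end points. The one-degree resolution and the function name angle_profile are also hypothetical.

```python
def angle_profile(theta):
    """Map a vector angle `theta` (degrees) onto x = 0..359, y in [0, 1].

    The claim fixes only the end points (same direction -> 1, opposite -> 0);
    the linear fall-off in between is an assumption."""
    profile = []
    for x in range(360):
        diff = (x - theta) % 360.0
        distance = min(diff, 360.0 - diff)  # angular distance in [0, 180]
        profile.append(1.0 - distance / 180.0)
    return profile


p = angle_profile(90.0)
print(p[90], p[0], p[270])
```

Spreading each direction over the whole axis lets nearby directions, such as 85 and 90 degrees, produce similar profiles, which tolerates small variations between performances of a gesture.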

A computer-readable medium storing a program for executing the method according to any one of claims 4 to 7.

An online operation system for operating online a device that performs a predetermined operation, comprising: the gesture recognition device according to any one of claims 1 to 3; and an apparatus that is connected to the gesture recognition device via a public line and performs a predetermined operation in accordance with the recognition result of the gesture recognition device.