JP4992618B2 - Gesture recognition device and gesture recognition method

Info

Publication number: JP4992618B2
Application number: JP2007230846A
Authority: JP
Inventors: 潤一羽斗
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2007-09-05
Filing date: 2007-09-05
Publication date: 2012-08-08
Anticipated expiration: 2027-09-05
Also published as: JP2009064199A

Description

The present invention relates to a gesture recognition device and a gesture recognition method.

Gesture recognition devices that make a computer recognize the meaning of a gesture are known. For example, Patent Document 1 discloses a device that divides a frame image, obtained by imaging a person performing a gesture, into a plurality of image blocks, calculates for each image block a motion amount indicating the change between frames, and discriminates the gesture based on the motion amount of each block.
Patent Document 1: JP 2006-235771 A

However, the remote control device described in Patent Document 1 does not, for example, take into account the influence of noise caused by shooting conditions, and does not recognize a gesture from the divided blocks as a whole. As a result, its gesture recognition accuracy was problematic.

More specifically, even when the motion amount has increased only because of noise, the above-described remote control device determines that there is motion in a block as long as the motion amount exceeds a predetermined threshold. For this reason, it may erroneously recognize a gesture even when no gesture has been made. For example, if the shooting range includes something that is constantly moving, such as a television, the device may treat the motion on the television screen, which should not be regarded as gesture motion, as gesture motion, and consequently recognize that a gesture has been made.

The present invention has been made in view of the above problems, and an object thereof is to provide a gesture recognition device and a gesture recognition method with high recognition accuracy.

In order to achieve the above object, a gesture recognition device according to a first aspect of the present invention is
a gesture recognition device that recognizes a gesture based on the motion of an imaged subject, comprising:
recognition condition setting means for setting gesture recognition conditions;
motion amount detection means for detecting, for each divided region obtained by dividing a captured image into a plurality of regions, a motion amount based on the amount of change in luminance of each pixel in the divided region;
motion region detection means for detecting, as a motion region, a divided region whose motion amount detected by the motion amount detection means is equal to or greater than a predetermined threshold;
specific region determination means for determining whether or not a motion region detected by the motion region detection means is a specific region to be adopted for gesture recognition; and
gesture recognition means for recognizing a gesture based on at least the motion region determined to be a specific region by the specific region determination means and the gesture recognition conditions set by the recognition condition setting means.

The recognition condition setting means may include threshold setting means for setting the predetermined threshold, and
the motion region detection means may detect, as a motion region, a divided region whose motion amount detected by the motion amount detection means is equal to or greater than the threshold set by the threshold setting means.

The threshold setting means may set the threshold based on the amount of change in luminance of each pixel between sequentially captured images.

The motion amount detection means may detect, as the motion amount, the sum of the squared amounts of change in luminance of the pixels in the divided region.

The specific region determination means need not determine that a motion region detected continuously by the motion region detection means for a predetermined time or longer is the specific region.

When a plurality of motion regions are detected by the motion region detection means, the specific region determination means may determine that, among the detected motion regions, those contained in a predetermined region that includes the plurality of motion regions are the specific region.

The device may further comprise center position detection means for detecting, when there are a plurality of motion regions determined to be the specific region by the specific region determination means, the position of the center of the gesture movement in the image from the positions in the image of those motion regions, and
the gesture recognition means may recognize a gesture based on the center position detected by the center position detection means.

The device may further comprise movement direction determination means for determining the direction of movement of a gesture from a plurality of captured images, and
tip position detection means for detecting, when there are a plurality of motion regions determined to be the specific region by the specific region determination means, the motion region that, among those motion regions, lies farthest in the direction of gesture movement determined by the movement direction determination means, taking as a reference the center position of the movement detected by the center position detection means, and
the gesture recognition means may recognize a gesture based on the position in the image of the farthest motion region detected by the tip position detection means.

The device may further comprise imaging time acquisition means for acquiring the imaging times of sequentially captured images, and
the gesture recognition means may obtain, based on the imaging times acquired by the imaging time acquisition means, the length of time for which the specific region determination means has continuously determined the motion region to be the specific region, and may recognize the gesture as one corresponding to the obtained length of time.

In order to achieve the above object, a gesture recognition method according to a second aspect of the present invention is
a gesture recognition method for recognizing a gesture based on the motion of an imaged subject, comprising:
a recognition condition setting step of setting gesture recognition conditions;
a motion amount detection step of detecting, for each divided region obtained by dividing a captured image into a plurality of regions, a motion amount based on the amount of change in luminance of each pixel in the divided region;
a motion region detection step of detecting, as a motion region, a divided region whose motion amount detected in the motion amount detection step is equal to or greater than a predetermined threshold;
a specific region determination step of determining whether or not a motion region detected in the motion region detection step is a specific region to be adopted for gesture recognition; and
a gesture recognition step of recognizing a gesture based on at least the motion region determined to be a specific region in the specific region determination step and the gesture recognition conditions set in the recognition condition setting step.

The gesture recognition device and gesture recognition method according to the present invention can improve gesture recognition accuracy.

Hereinafter, the configuration and operation of a gesture recognition device according to an embodiment of the present invention will be described with reference to the drawings.

First, the physical configuration of the gesture recognition device 100 according to the embodiment of the present invention will be described with reference to FIG. 1.

As shown in FIG. 1, the gesture recognition device 100 is realized by the CPU 20 executing a program stored in the ROM 30, and recognizes a gesture based on image data, supplied from the camera 200, whose imaging range includes the screen 400 and the imaged person 500. The gesture recognition device 100 also supplies the projector 300 with image data that depends on the command corresponding to the recognized gesture, thereby causing the image corresponding to that image data to be displayed on the screen 400.

The gesture recognition device 100 includes a bus 10, a CPU (Central Processing Unit) 20, a ROM (Read Only Memory) 30, a RAM (Random Access Memory) 40, an I/O (Input Output) unit 50, a hard disk 60, an input buffer 70, an output buffer 80, and a timer counter 90.

The bus 10 interconnects the devices in the gesture recognition device 100. The devices exchange data with one another via the bus 10.

By executing the program stored in the ROM 30, the CPU 20 analyzes the image data supplied from the camera 200 to recognize a gesture, and executes the command corresponding to the recognized gesture. The CPU 20 also sets the gesture recognition conditions based on the image data supplied to the input buffer 70 and on data supplied from an external device (not shown) via the I/O unit 50.

The ROM 30 stores the program for controlling the operation of the CPU 20.

The RAM 40 functions as a work area for the CPU 20. The RAM 40 also stores various variables used in the threshold setting process (step S40), the continuous noise removal process (step S50), and other processes described later. In addition, the RAM 40 stores the image data of the previous frame supplied from the input buffer 70 and the image data of the current frame. The RAM 40 also stores the acquisition time of the image data, described later.

The I/O unit 50 is connected to an external device (not shown) and exchanges data with it. Specifically, the I/O unit 50 receives from the external device the image data to be displayed by the projector 300, data indicating the gesture recognition conditions used in the gesture recognition processing, and the like. The I/O unit 50 may also output, to the external device, data that depends on the command corresponding to the recognized gesture.

The hard disk 60 stores the image data, the data indicating the gesture recognition conditions, and the like supplied from the external device via the I/O unit 50. Specifically, the hard disk 60 stores image data for a plurality of pages to be displayed by the projector 300, and data indicating the gesture recognition conditions, such as the block size and pattern data indicating the gesture patterns described later.

Here, a block is a group of pixels obtained when the pixels that make up one frame of an image, arranged two-dimensionally in the vertical and horizontal directions, are divided every predetermined number of pixels in each direction. For example, when the image is divided every 8 pixels vertically and every 8 pixels horizontally, one block consists of 8 × 8 = 64 pixels. Capturing image motion block by block when recognizing a gesture reduces the amount of computation and the RAM capacity required.
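The block division described above can be sketched as follows. This is only an illustrative sketch: it assumes the frame is held as a two-dimensional list of luminance values whose height and width are multiples of the block size, and the function name `split_into_blocks` is hypothetical and does not appear in this specification.

```python
def split_into_blocks(frame, block_size=8):
    """Split a 2D list of luminance values into block_size x block_size blocks.

    Returns a dict mapping (block_row, block_col) to a flat list of the
    pixel values in that block. Assumes (for illustration only) that the
    frame height and width are multiples of block_size.
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top // block_size, left // block_size)] = [
                frame[y][x]
                for y in range(top, top + block_size)
                for x in range(left, left + block_size)
            ]
    return blocks


# A frame 16 pixels high and 24 pixels wide gives 2 x 3 blocks of 64 pixels.
frame = [[(x + y) % 256 for x in range(24)] for y in range(16)]
blocks = split_into_blocks(frame)
```

Later processing then only needs one value per block rather than one per pixel, which is where the savings in computation and memory come from.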

The input buffer 70 is connected to the camera 200 and stores the image data supplied from the camera 200. When the input buffer 70 detects that one frame of image data has been supplied from the camera 200, it sends an interrupt signal to the CPU 20 via the bus 10.

The output buffer 80 is connected to the projector 300 and supplies the image data it stores to the projector 300. The image data stored in the output buffer 80 is supplied from the hard disk 60 under the control of the CPU 20.

The timer counter 90 is composed of, for example, a circuit having a crystal oscillator. The timer counter 90 internally includes registers that store the values of the post-detection timers, the standard timer, and the post-stop timer, and counts up the value of each timer. A post-detection timer is provided for each block and indicates the time elapsed since that block was detected as a motion block (that is, the length of time for which a motion amount equal to or greater than the threshold has been continuously detected for that block). The standard timer indicates the time elapsed since the gesture recognition processing was started. The post-stop timer indicates the time elapsed since motion blocks ceased to be detected at all.

The camera 200 images an area that includes the screen 400 and the hand of the imaged person 500, and supplies the captured image data to the input buffer 70.

The projector 300 projects the image corresponding to the image data supplied from the output buffer 80 onto the screen 400.

The image corresponding to the image data supplied from the projector 300 is projected onto the screen 400.

The imaged person 500 stands at the edge of the screen 400 and gestures as if operating the image projected on the screen 400.

Next, the functional configuration realized by the above physical configuration will be described.

FIG. 2 is a functional block diagram of the gesture recognition device 100 according to the embodiment of the present invention.

The gesture recognition device 100 is composed of the following functional blocks: an image input unit 110, a time acquisition unit 120, a recognition condition setting unit 130, a motion amount detection unit 140, a motion block detection unit 150, a specific block detection unit 160, a gesture recognition unit 170, a command processing unit 180, and an image output unit 190.

The image input unit 110 receives the image data supplied by the imaging unit 210.

The time acquisition unit 120 acquires the time. The time acquisition unit 120 supplies data corresponding to the acquired time to the continuous noise removal unit 161 and the gesture recognition unit 170. The time acquisition unit 120 includes an imaging time acquisition unit 121. The imaging time acquisition unit 121 acquires the imaging time, which is the time at which image data was supplied to the image input unit 110, and supplies data corresponding to the imaging time to the pattern matching unit 174.

The recognition condition setting unit 130 sets the gesture recognition conditions. The recognition condition setting unit 130 includes a threshold setting unit 131. The threshold setting unit 131 sets the threshold for detecting motion blocks based on the image data supplied from the image input unit 110.

The motion amount detection unit 140 detects the motion amount of each block based on the image data supplied from the image input unit 110.

The motion block detection unit 150 detects motion blocks based on the threshold set by the threshold setting unit 131 and the per-block motion amounts detected by the motion amount detection unit 140.
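As a rough illustration of how the motion amount detection and motion block detection fit together, the sketch below computes each block's motion amount as the sum of squared luminance changes (the option named in the summary above) and keeps the blocks whose motion amount is equal to or greater than the threshold. The function names and the dict-of-blocks data layout are assumptions made for this example only.

```python
def motion_amount(prev_block, curr_block):
    """Sum of squared luminance changes between two frames for one block."""
    return sum((c - p) ** 2 for p, c in zip(prev_block, curr_block))


def detect_motion_blocks(prev_blocks, curr_blocks, threshold):
    """Return the keys of blocks whose motion amount is >= threshold.

    prev_blocks / curr_blocks map a block key, e.g. (row, col), to a flat
    list of luminance values (a hypothetical layout for this sketch).
    """
    return {
        key
        for key in curr_blocks
        if motion_amount(prev_blocks[key], curr_blocks[key]) >= threshold
    }


# Tiny 4-pixel blocks for readability; real blocks would hold e.g. 64 pixels.
prev_blocks = {(0, 0): [10, 10, 10, 10], (0, 1): [10, 10, 10, 10]}
curr_blocks = {(0, 0): [10, 10, 10, 11], (0, 1): [20, 10, 10, 10]}
motion_blocks = detect_motion_blocks(prev_blocks, curr_blocks, threshold=50)
```

Here block (0, 0) changes by a single luminance step (motion amount 1) and is ignored, while block (0, 1) has a motion amount of 100 and is reported as a motion block.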

The specific block detection unit 160 detects specific blocks, described later, from among the motion blocks detected by the motion block detection unit 150. The specific block detection unit 160 includes a continuous noise removal unit 161 and a minute noise removal unit 162.

The continuous noise removal unit 161 excludes, from the specific block candidates, those motion blocks detected by the motion block detection unit 150 that have been detected as motion blocks continuously. This is because a block that is detected as a motion block continuously is considered to be merely a block erroneously detected because of noise, not one attributable to a gesture.
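One possible way to express the continuous noise removal is sketched below. It assumes that a per-block start time plays the role of the post-detection timer and that a block is dropped once it has been a motion block for `max_duration` or longer; the function name, the argument names, and the time units are hypothetical.

```python
def remove_continuous_noise(motion_blocks, detected_since, now, max_duration):
    """Drop blocks that have been motion blocks for max_duration or longer.

    detected_since maps a block key to the time at which its current run of
    detections began (standing in for the per-block post-detection timer).
    It is updated in place on every call.
    """
    # Clear the timer of every block that is no longer a motion block.
    for key in list(detected_since):
        if key not in motion_blocks:
            del detected_since[key]

    candidates = set()
    for key in motion_blocks:
        start = detected_since.setdefault(key, now)  # start timer if new
        if now - start < max_duration:
            candidates.add(key)
    return candidates


timers = {}
first = remove_continuous_noise({(0, 0), (1, 1)}, timers, now=0, max_duration=2)
second = remove_continuous_noise({(0, 0)}, timers, now=1, max_duration=2)
third = remove_continuous_noise({(0, 0), (1, 1)}, timers, now=2, max_duration=2)
```

In this run block (0, 0) moves in every frame and is dropped at time 2, whereas block (1, 1) stopped moving at time 1, had its timer cleared, and is accepted again at time 2.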

The minute noise removal unit 162 excludes, from the specific block candidates, those motion blocks detected by the motion block detection unit 150 that lie outside a predetermined range containing a plurality of motion blocks. Blocks detected as motion blocks because of a gesture are detected at a high rate within a predetermined range, so a motion block detected outside that range is considered to be merely a block erroneously detected as a motion block because of noise.
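The passage above does not specify how the predetermined range is chosen, so the following is only a simplified stand-in for the minute noise removal: it assumes that a motion block is kept when at least one other motion block lies within `radius` blocks of it, and is otherwise treated as isolated noise. The function name and the `radius` parameter are assumptions for illustration.

```python
def remove_isolated_blocks(motion_blocks, radius=1):
    """Keep only motion blocks that have another motion block nearby.

    motion_blocks is a set of (row, col) block coordinates. A block is kept
    if some other motion block lies within `radius` blocks in both the row
    and column directions (an assumed criterion, not the patented rule).
    """
    kept = set()
    for (row, col) in motion_blocks:
        has_neighbour = any(
            (other_row, other_col) != (row, col)
            and abs(other_row - row) <= radius
            and abs(other_col - col) <= radius
            for (other_row, other_col) in motion_blocks
        )
        if has_neighbour:
            kept.add((row, col))
    return kept


detected = {(2, 2), (2, 3), (3, 3), (9, 0)}
candidates = remove_isolated_blocks(detected)
```

The three adjacent blocks survive as specific block candidates, while the lone block at (9, 0) is discarded.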

The gesture recognition unit 170 recognizes a gesture based on the specific blocks detected by the specific block detection unit 160. The gesture recognition unit 170 includes a center position detection unit 171, a tip position detection unit 172, a locus detection unit 173, and a pattern matching unit 174.

The center position detection unit 171 detects the center position of the specific blocks from the specific blocks detected by the specific block detection unit 160.

The tip position detection unit 172 detects the tip position of the specific blocks from the specific blocks detected by the specific block detection unit 160.
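The center and tip positions can be illustrated with the sketch below. It assumes that the center is the mean of the specific blocks' coordinates, that the direction of movement is taken as the displacement of that center between two frames, and that the tip is the specific block lying farthest from the center along that direction. These choices and the function names are assumptions for the example, not details confirmed by the text.

```python
def center_position(blocks):
    """Mean (row, col) of a non-empty set of block coordinates."""
    count = len(blocks)
    return (
        sum(row for row, _ in blocks) / count,
        sum(col for _, col in blocks) / count,
    )


def tip_position(blocks, center, direction):
    """Block lying farthest from `center` along `direction`.

    The distance along the direction is measured with a dot product of the
    block's offset from the center and the direction vector.
    """
    d_row, d_col = direction
    return max(
        blocks,
        key=lambda b: (b[0] - center[0]) * d_row + (b[1] - center[1]) * d_col,
    )


previous_blocks = {(4, 1), (4, 2)}
current_blocks = {(4, 3), (4, 4), (4, 5)}

previous_center = center_position(previous_blocks)
current_center = center_position(current_blocks)
direction = (
    current_center[0] - previous_center[0],
    current_center[1] - previous_center[1],
)
tip = tip_position(current_blocks, current_center, direction)
```

With the hand sweeping to the right, the center moves from column 1.5 to column 4 and the tip is the rightmost specific block, (4, 5).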

The locus detection unit 173 detects the locus of the center position detected by the center position detection unit 171 and the locus of the tip position detected by the tip position detection unit 172.

The pattern matching unit 174 matches the locus of the center position detected by the locus detection unit 173 against the center position locus corresponding to each preset gesture pattern. Even when the same pattern is matched, the pattern matching unit 174 may issue different commands depending on the duration of the gesture, obtained from the imaging times acquired by the imaging time acquisition unit 121, or on the locus of the tip position detected by the locus detection unit 173.

The command processing unit 180 executes the command corresponding to the gesture pattern recognized by the gesture recognition unit 170.

The image output unit 190 supplies image data that depends on the command to the projection unit 310 under the control of the command processing unit 180.

Next, the operation of the gesture recognition device 100 will be described. The functional blocks shown in FIG. 2 are realized by the units shown in FIG. 1 executing the operations described below. To keep the description easy to follow, the functional blocks are not referred to at every step.

When a start signal for the gesture recognition processing is received from the external device via the I/O unit 50 while the camera 200 and the projector 300 are powered on, the CPU 20 executes the gesture recognition processing shown in the flowchart of FIG. 3.

First, the CPU 20 performs initial setting (step S10). In the initial setting, the CPU 20 initializes the various variables stored in the RAM 40, among other things. Specifically, the CPU 20 stops and clears all post-detection timers. The CPU 20 also clears and starts the standard timer. In addition, the CPU 20 receives image data, data indicating the gesture recognition conditions, and the like from the external device via the I/O unit 50, and stores the received data on the hard disk 60.

When the initial setting is completed, the CPU 20 starts image display (step S20). The CPU 20 supplies the output buffer 80 with the image data, among the image data stored on the hard disk 60, that corresponds to the initial image to be displayed on the screen 400. The image data supplied to the output buffer 80 is immediately supplied to the projector 300, and the projector 300 projects the image corresponding to the supplied image data onto the screen 400. These operations realize the image output unit 190 and the projection unit 310.

Next, the CPU 20 acquires one frame of image data (step S30). When one frame of image data is supplied from the camera 200, the input buffer 70 sends an interrupt signal to the CPU 20. On receiving this interrupt signal, the CPU 20 transfers the one frame of image data supplied to the input buffer 70 to the RAM 40. The CPU 20 also stores the value indicated by the standard timer in the RAM 40 together with data identifying the image data of the current frame. These operations realize the image input unit 110.

When acquisition of the one frame of image data is completed, the CPU 20 executes the threshold setting process (step S40). In the threshold setting process, the CPU 20 sets, based on the image data of the current frame and the image data of the previous frame, the threshold used to determine whether or not the image of each block has changed from the previous frame. These operations realize the threshold setting unit 131. Details will be described later.

When the threshold setting process is completed, the CPU 20 executes the continuous noise removal process (step S50). The CPU 20 detects the motion amounts, detects the motion blocks, and then removes continuous noise. In the continuous noise removal process, the CPU 20 excludes from the specific block candidates any block whose image has been changing continuously for a certain time or longer. These operations realize the motion amount detection unit 140, the motion block detection unit 150, and the continuous noise removal unit 161. Details will be described later.

When the continuous noise removal process is completed, the CPU 20 executes the minute noise removal process (step S60). In the minute noise removal process, the CPU 20 assumes that the image of the gesturing object has at least a certain size, and excludes from the specific block candidates any block whose image is changing at a location apart from the other blocks whose images are changing. These operations realize the minute noise removal unit 162. Details will be described later.

When the minute noise removal process is completed, the CPU 20 executes the pattern detection process (step S70). In the pattern detection process, the CPU 20 compares the detected image change with each preset gesture pattern and, if a pattern matches, executes the command corresponding to the matched pattern. These operations realize the gesture recognition unit 170 and the command processing unit 180. Details will be described later.

The gesture recognition device 100 can improve gesture recognition accuracy by executing the threshold setting process (step S40), the continuous noise removal process (step S50), the minute noise removal process (step S60), or the pattern detection process (step S70), each of which is described in detail later.

Next, the threshold setting process (step S40) will be described in detail with reference to the flowchart shown in FIG. 4.

First, the CPU 20 detects the luminance difference for each pixel (step S41). Specifically, from the luminance of each pixel of the image corresponding to the current frame's image data stored in the RAM 40, the CPU 20 subtracts the luminance of the corresponding pixel of the image corresponding to the previous frame's image data. When each pixel is composed of the three primary colors RGB, the luminance difference is obtained for each primary color. The CPU 20 stores the obtained luminance differences in the RAM 40.

Next, the CPU 20 determines whether or not detection of the luminance difference has been completed for all pixels (step S42). If the CPU 20 determines that detection has not been completed for all pixels (step S42: NO), it detects the luminance difference for the pixels for which detection has not yet been completed (step S41). If, on the other hand, the CPU 20 determines that detection has been completed for all pixels (step S42: YES), it calculates the mean of the squared luminance differences (step S43).

具体的には、輝度の差分の自乗の平均値算出（ステップＳ４３）では、ＣＰＵ２０は、全画素について輝度の差分（ΔＢ_１、ΔＢ_２、・・・、ΔＢ_Ｎ）を求め、各画素の輝度の差分を自乗した値（（ΔＢ_１）^２、（ΔＢ_２）^２、・・・、（ΔＢ_Ｎ）^２）を全画素分加算した値（Σ（ΔＢ_ｎ）^２）を全画素数で割ることにより、輝度の差分を自乗した値の平均値（Σ（ΔＢ_ｎ）^２／Ｎ）を算出する。なお、ｉ番目の画素の輝度をＢ_ｉ、ｉ番目の画素の輝度の差分をΔＢ_ｉ、全画素数をＮとした。各画素がＲＧＢの３原色から構成される場合は、原色毎に輝度の差分を自乗した値の平均値を算出する。 Specifically, in calculating the mean of the squared luminance differences (step S43), the CPU 20 obtains the luminance differences for all pixels (ΔB_1, ΔB_2, ..., ΔB_N), sums the squared differences ((ΔB_1)^2, (ΔB_2)^2, ..., (ΔB_N)^2) over all pixels to obtain Σ(ΔB_n)^2, and divides that sum by the total number of pixels to obtain the mean squared luminance difference (Σ(ΔB_n)^2 / N). Here, B_i is the luminance of the i-th pixel, ΔB_i is the luminance difference of the i-th pixel, and N is the total number of pixels. When each pixel is composed of the three primary colors RGB, the mean squared luminance difference is calculated for each primary color.

次に、ＣＰＵ２０は、閾値の設定を更新する（ステップＳ４４）。具体的には、例えば、ＣＰＵ２０は、輝度の差分の自乗の平均値算出（ステップＳ４３）で算出した輝度の差分を自乗した値の平均値（Σ（ΔＢ_ｎ）^２／Ｎ）に、１ブロック当たりの画素数Ｍを乗じた値（Σ（ΔＢ_ｎ）^２／Ｎ×Ｍ）を閾値としてＲＡＭ４０に記憶する。ＣＰＵ２０は、閾値設定更新（ステップＳ４４）を完了すると、閾値設定処理（ステップＳ４０）を終了する。 Next, the CPU 20 updates the threshold setting (step S44). Specifically, for example, the CPU 20 multiplies the mean squared luminance difference (Σ(ΔB_n)^2 / N) calculated in step S43 by the number of pixels per block M, and stores the result ((Σ(ΔB_n)^2 / N) × M) in the RAM 40 as the threshold. When the threshold setting update (step S44) is completed, the CPU 20 ends the threshold setting process (step S40).

このように、輝度の差分を自乗した値の平均値に基づいて閾値を設定すると、画面全体の動きが激しい場合は閾値が高く設定され、動きが少ない場合は閾値が低く設定される。このような閾値を用いる場合、画面全体の動きの激しさに応じて自動で最適な閾値を設定することが可能となり、誤検出を減らすことができる。 As described above, when the threshold value is set based on the average value of the squares of the luminance differences, the threshold value is set high when the movement of the entire screen is intense, and the threshold value is set low when the movement is small. When such a threshold value is used, it is possible to automatically set an optimum threshold value according to the intensity of movement of the entire screen, thereby reducing false detection.
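The threshold calculation of steps S41 to S44 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes single-channel frames flattened into Python lists, and the function names `luminance_differences` and `block_threshold` are hypothetical.

```python
def luminance_differences(current, previous):
    """Per-pixel luminance difference between the current and previous frame."""
    return [c - p for c, p in zip(current, previous)]


def block_threshold(differences, pixels_per_block):
    """Mean squared difference over all N pixels, multiplied by M pixels per block."""
    mean_square = sum(d * d for d in differences) / len(differences)
    return mean_square * pixels_per_block


# Hypothetical 8-pixel frames (one luminance channel each).
previous = [10, 10, 10, 10, 10, 10, 10, 10]
current = [10, 12, 10, 8, 10, 10, 14, 10]

diffs = luminance_differences(current, previous)
threshold = block_threshold(diffs, pixels_per_block=4)
print(threshold)  # 12.0
```

Here the squared differences sum to 24 over N = 8 pixels, giving a mean of 3.0, and with M = 4 pixels per block the threshold is 12.0. A frame with more overall change would raise the mean and therefore the threshold.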

次に、図５に示すフローチャートを用いて、連続ノイズ除去処理（ステップＳ５０）を詳細に説明する。連続ノイズ除去処理は、一定時間以上継続して画像が変化しているブロックを、ノイズにより画像が変化しているブロックとみなし、特定ブロックの候補から除外する処理である。 Next, the continuous noise removal process (step S50) will be described in detail using the flowchart shown in FIG. The continuous noise removal process is a process in which a block whose image has been changed continuously for a certain time or more is regarded as a block whose image has been changed due to noise, and is excluded from specific block candidates.

まず、ＣＰＵ２０は、ブロック毎に動き量Ｘを検出する（ステップＳ５１）。具体的には、ＣＰＵ２０は、ブロック毎に、ブロック内の各画素の輝度の差分を自乗した値の総和（Σ（ΔＢ_ｍ）^２）を動き量Ｘとして求める。 First, the CPU 20 detects the motion amount X for each block (step S51). Specifically, for each block, the CPU 20 obtains the sum of the squared luminance differences of the pixels in the block (Σ(ΔB_m)^2) as the motion amount X.

次に、ＣＰＵ２０は、動き量Ｘが閾値以上であるか否かの判別を行う（ステップＳ５２）。具体的には、ＣＰＵ２０は、動き量検出（ステップＳ５１）で検出された各ブロックの動き量Ｘが、閾値設定更新（ステップＳ４４）で更新された閾値（Σ（ΔＢ_ｎ）^２／Ｎ×Ｍ）以上であるか否かを判別する。 Next, the CPU 20 determines whether the motion amount X is equal to or greater than the threshold (step S52). Specifically, the CPU 20 determines whether the motion amount X of each block detected in motion amount detection (step S51) is equal to or greater than the threshold ((Σ(ΔB_n)^2 / N) × M) updated in the threshold setting update (step S44).

ＣＰＵ２０は、動き量Ｘが閾値以上ではないと判別した場合（ステップＳ５２：ＮＯ）、対象のブロックを動きのないブロックとみなして非動きブロックに設定する（ステップＳ５３）。具体的には、ＣＰＵ２０は、対象ブロックの属性情報を「非動きブロック」としてＲＡＭ４０に記憶する。ＣＰＵ２０は、ステップＳ５３が完了すると、検出後タイマを停止、クリアする（ステップＳ５４）。 If the CPU 20 determines that the motion amount X is not greater than or equal to the threshold (step S52: NO), the CPU 20 regards the target block as a non-motion block and sets it as a non-motion block (step S53). Specifically, the CPU 20 stores the attribute information of the target block in the RAM 40 as “non-motion block”. When step S53 is completed, the CPU 20 stops and clears the post-detection timer (step S54).

一方、ＣＰＵ２０は、動き量Ｘが閾値以上であると判別した場合（ステップＳ５２：ＹＥＳ）、対象のブロックを動きのあるブロックとみなして動きブロックに設定する（ステップＳ５５）。具体的には、ＣＰＵ２０は、対象ブロックの属性情報を「動きブロック」としてＲＡＭ４０に記憶する。 On the other hand, if the CPU 20 determines that the motion amount X is equal to or greater than the threshold (step S52: YES), it regards the target block as a block with motion and sets it as a motion block (step S55). Specifically, the CPU 20 stores the attribute information of the target block in the RAM 40 as "motion block".

ＣＰＵ２０は、検出後タイマが２秒以上か否かを判別し（ステップＳ５６）、検出後タイマが２秒以上であると判別した場合（ステップＳ５６：ＹＥＳ）、対象のブロックをジェスチャーの認識に採用しないブロックに設定する（ステップＳ５７）。すなわち、動き量Ｘが閾値以上であるブロックであっても、継続して２秒以上、閾値以上の動き量が検出されたブロックは、ノイズによって誤って動きブロックとして検出されたブロックとして扱う。具体的には、ＣＰＵ２０は、ＲＡＭ４０に記憶された対象ブロックの属性情報を「ジェスチャーの認識に採用しないブロック」に書き換える。 The CPU 20 determines whether the post-detection timer has reached 2 seconds or more (step S56). If it has (step S56: YES), the CPU 20 sets the target block as a block not adopted for gesture recognition (step S57). That is, even if the motion amount X of a block is equal to or greater than the threshold, a block in which a motion amount at or above the threshold has been detected continuously for 2 seconds or more is treated as a block erroneously detected as a motion block because of noise. Specifically, the CPU 20 rewrites the attribute information of the target block stored in the RAM 40 to "block not adopted for gesture recognition".

ＣＰＵ２０は、検出後タイマが２秒以上ではないと判別した場合（ステップＳ５６：ＮＯ）、又は、ジェスチャーの認識に採用しないブロックへの設定（ステップＳ５７）が完了した後、検出後タイマが停止中か否かを判別する（ステップＳ５８）。ＣＰＵ２０は、検出後タイマが停止中であると判別した場合（ステップＳ５８：ＹＥＳ）、検出後タイマを起動する（ステップＳ５９）。 When the CPU 20 determines that the post-detection timer has not reached 2 seconds (step S56: NO), or after the block has been set as not adopted for gesture recognition (step S57), the CPU 20 determines whether the post-detection timer is stopped (step S58). If the CPU 20 determines that the post-detection timer is stopped (step S58: YES), it starts the post-detection timer (step S59).

ＣＰＵ２０は、検出後タイマが停止中ではないと判別した場合（ステップＳ５８：ＮＯ）、又は、検出後タイマの起動（ステップＳ５９）、又は、検出後タイマの停止、クリア（ステップＳ５４）、が完了した後、全ブロック分の処理が完了しているかを判別する（ステップＳ５１０）。 When the CPU 20 determines that the post-detection timer is not stopped (step S58: NO), or after starting the post-detection timer (step S59) or stopping and clearing it (step S54), the CPU 20 determines whether processing has been completed for all blocks (step S510).

ＣＰＵ２０は、全ブロック分の設定が完了していると判別した場合（ステップＳ５１０：ＹＥＳ）、連続ノイズ除去処理を終了する。一方、ＣＰＵ２０は、全ブロック分の設定が完了していないと判別した場合（ステップＳ５１０：ＮＯ）、設定が完了していないブロックの動き量検出を行う（ステップＳ５１）。 If the CPU 20 determines that the settings for all blocks have been completed (step S510: YES), the CPU 20 ends the continuous noise removal process. On the other hand, if the CPU 20 determines that the setting for all blocks has not been completed (step S510: NO), the CPU 20 detects the motion amount of the block for which setting has not been completed (step S51).

以上のように、連続ノイズ除去処理では、各ブロックが、動きブロック、非動きブロック、又は、ジェスチャーの認識に採用しないブロック、のうちいずれのブロックに該当するかを設定する。 As described above, in the continuous noise removal process, it is set which block corresponds to each block among a motion block, a non-motion block, or a block that is not used for gesture recognition.
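The per-block decision in the continuous noise removal process can be sketched as a small state machine. This is a hedged illustration only: the class `BlockState`, the function `classify_block`, the attribute labels, and the use of a caller-supplied time in seconds are all assumptions made for the example; the 2-second limit is the value given in the description.

```python
MOTION = "motion block"
NON_MOTION = "non-motion block"
EXCLUDED = "block not adopted for gesture recognition"

NOISE_SECONDS = 2.0  # continuous motion for this long is treated as noise


class BlockState:
    """Attribute and post-detection timer of one block (hypothetical layout)."""

    def __init__(self):
        self.attribute = NON_MOTION
        self.timer_start = None  # None means the post-detection timer is stopped


def classify_block(state, motion_amount, threshold, now):
    """Update one block's attribute from its motion amount at time `now` (seconds)."""
    if motion_amount < threshold:
        # Below the threshold: non-motion block, stop and clear the timer.
        state.attribute = NON_MOTION
        state.timer_start = None
        return state.attribute

    # At or above the threshold: provisionally a motion block.
    state.attribute = MOTION
    running = state.timer_start is not None
    if running and now - state.timer_start >= NOISE_SECONDS:
        # Motion has continued for 2 seconds or more: treat it as noise.
        state.attribute = EXCLUDED
    if not running:
        # Timer was stopped: start it at the first detection of motion.
        state.timer_start = now
    return state.attribute


state = BlockState()
for t, amount in [(0.0, 100), (1.0, 100), (2.0, 100), (2.5, 10), (3.0, 100)]:
    print(t, classify_block(state, amount, threshold=50, now=t))
```

In this run the block is a motion block at 0.0 s and 1.0 s, becomes excluded at 2.0 s, returns to non-motion when the motion amount drops at 2.5 s, and is a fresh motion block again at 3.0 s because the timer was cleared.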

通常、テレビジョン等に投影されている動きそのものをジェスチャーとして認識しようとすることは稀である。従来のジェスチャー認識装置では、テレビジョン等とジェスチャーを行う者とを含む領域を撮像した画像データから、ジェスチャーを認識しようとする場合、テレビジョン等に投影されている動きをジェスチャーとして誤って認識してしまう可能性がある。 Normally, one rarely wants to recognize the motion shown on a television or the like as a gesture. When a conventional gesture recognition device tries to recognize a gesture from image data covering both a television or the like and the person performing the gesture, it may erroneously recognize the motion shown on the television as a gesture.

しかしながら、本発明の実施の形態に係るジェスチャー認識装置１００では、一定時間以上連続して動きブロックとして検出されたブロックは、ジェスチャーの認識に採用するブロックから除外される。このため、ジェスチャーの誤検出を減らすことができる。 However, in gesture recognition apparatus 100 according to the embodiment of the present invention, blocks detected as motion blocks continuously for a certain time or longer are excluded from blocks used for gesture recognition. For this reason, erroneous detection of gestures can be reduced.

また、本発明の実施の形態に係るジェスチャー認識装置１００では、各ブロックの動き量Ｘを、各画素の輝度の差分を自乗した値の総和（Σ（ΔＢ_ｍ）^２）としている。このため、特許文献１に記載された装置のように、各ブロックの動き量Ｘを、各画素の輝度の差分の総和（Σ｜ΔＢ_ｍ｜）とする場合に比べ、より俊敏な動きの認識が可能となり、ジェスチャーの認識精度の向上が期待できる。以下に理由を説明する。 In the gesture recognition apparatus 100 according to the embodiment of the present invention, the motion amount X of each block is the sum of the squared luminance differences of its pixels (Σ(ΔB_m)^2). Compared with using the sum of the absolute luminance differences (Σ|ΔB_m|) as the motion amount X, as in the apparatus described in Patent Document 1, this allows quicker movements to be recognized and can be expected to improve gesture recognition accuracy. The reason is explained below.

ジェスチャーによって画像が変化する場合、輝度の差分が非常に大きな少数の画素が発生すると予想される。一方、照明装置の光量の変化や、撮像装置のブレなどジェスチャーとは無関係なものによって画像の変化する場合、輝度の差分が比較的小さい多数の画素が発生すると予想される。 When an image changes because of a gesture, a small number of pixels with a very large luminance difference are expected to appear. When the image changes because of something unrelated to the gesture, such as a change in the light output of a lighting device or shake of the imaging device, a large number of pixels with a relatively small luminance difference are expected to appear.

ここで、ブロック内の各画素の輝度の差分を自乗した値の総和を動き量とする場合、輝度の差分が大きい画素が少しでも存在すれば動き量は比較的大きな値となり、輝度の差分が小さい画素が多く存在しても動き量は比較的小さな値となる。 Here, when the sum of the squared luminance differences of the pixels in a block is used as the motion amount, the motion amount becomes relatively large if even a few pixels have a large luminance difference, and stays relatively small even if many pixels have a small luminance difference.

このため、ブロック内の各画素の輝度の差分を自乗した値の総和を動き量とすることにより、ジェスチャーに起因する画像の変化分を動き量の主成分として検出することが可能となり、ジェスチャーの認識精度の向上が期待できる。 For this reason, it is possible to detect the change in the image caused by the gesture as the main component of the motion amount by using the sum of the squares of the luminance differences of the pixels in the block as the motion amount. Improvement in recognition accuracy can be expected.
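A small worked comparison makes the argument concrete. The pixel values below are invented for illustration (a 64-pixel block is an assumption, as are the difference magnitudes); the two functions simply compute the two motion measures discussed in the text.

```python
def abs_sum(differences):
    """Motion amount as the sum of absolute luminance differences."""
    return sum(abs(d) for d in differences)


def square_sum(differences):
    """Motion amount as the sum of squared luminance differences."""
    return sum(d * d for d in differences)


# Hypothetical 64-pixel blocks.
gesture_block = [100] * 4 + [0] * 60  # a few pixels change a lot
lighting_block = [5] * 64             # every pixel changes a little

print(abs_sum(gesture_block), abs_sum(lighting_block))        # 400 320
print(square_sum(gesture_block), square_sum(lighting_block))  # 40000 1600
```

With the absolute sum, the two blocks are hard to tell apart (400 versus 320, a ratio of 1.25). With the squared sum, the gesture-like block is 25 times larger (40000 versus 1600), so a single threshold separates them much more easily.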

次に、図６と図７を用いて、微少ノイズ除去処理（ステップＳ６０）を詳細に説明する。図６は、微少ノイズの除去処理を示すフローチャートであり、図７は、微少ノイズの除去処理を説明するための図である。微少ノイズ除去処理は、ジェスチャーの対象となる画像は一定以上の大きさを有するとみなして、画像が変化している他のブロックから離れて画像が変化しているブロックを、ノイズにより画像が変化しているブロックとみなして特定ブロックの候補から除外する処理である。 Next, the minute noise removal process (step S60) will be described in detail with reference to FIGS. 6 and 7. FIG. 6 is a flowchart showing the minute noise removal process, and FIG. 7 is a diagram for explaining it. The minute noise removal process assumes that the image of the gesturing object has at least a certain size. A block whose image changes at a location away from the other blocks whose images are changing is regarded as changing because of noise and is excluded from the specific block candidates.

図７には、横軸をＸ軸、縦軸をＹ軸として、２００ブロック分（２０（Ｘ軸方向）×１０（Ｙ軸方向））の画像が示されている。なお、図７に示す画像は、プロジェクタ３００からスクリーン４００に投影された富士山と雲と飛行機の画像と、スクリーン４００の右側から被撮像者５００の手とを含む領域をカメラ２００で撮像した画像である。 FIG. 7 shows an image of 200 blocks (20 in the X-axis direction × 10 in the Y-axis direction), with the horizontal axis as the X axis and the vertical axis as the Y axis. The image in FIG. 7 was captured by the camera 200 and covers a region containing the picture of Mt. Fuji, clouds, and an airplane projected from the projector 300 onto the screen 400, together with the hand of the person 500 to be imaged coming from the right side of the screen 400.

また、図７に示す斜線が引かれたブロックは、動きブロックに設定されたブロックである。プロジェクタ３００から投影される画像にほとんど変化がない場合、動きブロックとして検出されるブロックの多くは、被撮像者５００の手の部分に該当するブロックである。 The hatched blocks in FIG. 7 are the blocks that have been set as motion blocks. When the image projected from the projector 300 hardly changes, most of the blocks detected as motion blocks are blocks corresponding to the hand of the person 500 to be imaged.

まず、ＣＰＵ２０は、ＲＡＭ４０に記憶されたブロックの属性情報から、列毎に動きブロック数をカウントし、各列の動きブロック数を求める（ステップＳ６１）。 First, the CPU 20 counts the number of motion blocks for each column from the block attribute information stored in the RAM 40, and obtains the number of motion blocks in each column (step S61).

同様にして、ＣＰＵ２０は、行毎に動きブロック数をカウントし、各行の動きブロック数を求める（ステップＳ６２）。 Similarly, the CPU 20 counts the number of motion blocks for each row, and obtains the number of motion blocks for each row (step S62).

次に、ＣＰＵ２０は、ステップＳ６１で求めた各列の動きブロック数に基づいて、動きブロック数の多い列の範囲を設定する（ステップＳ６３）。 Next, the CPU 20 sets a range of columns having a large number of motion blocks based on the number of motion blocks of each column obtained in step S61 (step S63).

動きブロック数の多い列の範囲を設定する方法には様々な方法が考えられる。例えば、動きブロック数が１以上の列が２列以上連続する列の範囲を設定する。図７では、第１２列〜第２０列の範囲で連続して動きブロック数が１以上であるので、ＣＰＵ２０は、第１２列〜第２０列の範囲を動きブロック数の多い列の範囲として設定する。 There are various possible ways to set the range of columns with many motion blocks. For example, the range may be set to a run of two or more consecutive columns that each contain one or more motion blocks. In FIG. 7, every column from the 12th to the 20th contains one or more motion blocks, so the CPU 20 sets the 12th to 20th columns as the range of columns with many motion blocks.

続いて、ＣＰＵ２０は、ステップＳ６２で求めた各行の動きブロック数に基づいて、動きブロック数の多い行の範囲を設定する（ステップＳ６４）。ＣＰＵ２０は、ステップＳ６３と同様の基準により、第６行〜第８行の範囲を動きブロック数の多い行の範囲として設定する。 Subsequently, the CPU 20 sets a range of rows having a large number of motion blocks based on the number of motion blocks of each row obtained in step S62 (step S64). The CPU 20 sets the range of the sixth row to the eighth row as a range of rows with a large number of motion blocks based on the same criteria as in step S63.

次に、ＣＰＵ２０は、動きブロック数の多い矩形範囲を設定する（ステップＳ６５）。ＣＰＵ２０は、ステップＳ６３で設定された動きブロック数の多い列の範囲と、ステップＳ６４で設定された動きブロック数の多い行の範囲との双方に属する範囲を、動きブロック数の多い矩形範囲として設定する。図７において、太線で囲まれた範囲が動きブロック数の多い矩形範囲となる。 Next, the CPU 20 sets a rectangular range with many motion blocks (step S65). The CPU 20 sets, as this rectangular range, the range that belongs to both the column range set in step S63 and the row range set in step S64. In FIG. 7, the range surrounded by the thick line is the rectangular range with many motion blocks.

ＣＰＵ２０は、動きブロック数の多い矩形範囲の設定（ステップＳ６５）を完了すると、各ブロックが設定された矩形範囲内であるか否かを判別する（ステップＳ６６）。ＣＰＵ２０は、各ブロックが設定された矩形範囲内であると判別した場合（ステップＳ６６：ＹＥＳ）、全ブロック分の判別が完了したか否かを判別する（ステップＳ６８）。 When completing the setting of the rectangular range having a large number of motion blocks (step S65), the CPU 20 determines whether or not each block is within the set rectangular range (step S66). When determining that each block is within the set rectangular range (step S66: YES), the CPU 20 determines whether or not the determination for all the blocks has been completed (step S68).

一方、ＣＰＵ２０は、各ブロックが設定された矩形範囲内ではないと判別した場合（ステップＳ６６：ＮＯ）、対象ブロックをジェスチャーの認識に採用しないブロックに設定（ステップＳ６７）する。具体的には、ＣＰＵ２０は、ＲＡＭ４０に記憶された対象ブロックの属性情報を、「ジェスチャーの認識に採用しないブロック」に書き換える。ＣＰＵ２０は、ステップＳ６７を完了すると、全ブロック分の判別が完了したか否かを判別する（ステップＳ６８）。 On the other hand, if the CPU 20 determines that each block is not within the set rectangular range (step S66: NO), the CPU 20 sets the target block as a block that is not adopted for gesture recognition (step S67). Specifically, the CPU 20 rewrites the attribute information of the target block stored in the RAM 40 to “a block not adopted for gesture recognition”. When completing step S67, the CPU 20 determines whether or not the determination for all blocks is completed (step S68).

ＣＰＵ２０は、全ブロック分の判別が完了したと判別すると（ステップＳ６８：ＹＥＳ）、微少ノイズ除去処理を終了する。一方、ＣＰＵ２０は、全ブロック分の判別が完了していないと判別すると（ステップＳ６８：ＮＯ）、判別が完了していないブロックについて矩形範囲内か否かを判別する（ステップＳ６６）。 If the CPU 20 determines that the determination for all blocks has been completed (step S68: YES), it ends the minute noise removal process. On the other hand, if the CPU 20 determines that the determination for all blocks has not been completed (step S68: NO), the CPU 20 determines whether or not the block for which determination has not been completed is within the rectangular range (step S66).

本発明の実施の形態に係るジェスチャー認識装置１００では、動きブロック数の多い矩形範囲内にないブロックは、ジェスチャーの認識に採用するブロックから除外される。このため、ジェスチャーとは無関係の領域であって、ノイズにより輝度の変化が大きい領域を、ジェスチャー認識の対象から外すことができ、ジェスチャーの誤検出を減らすことができる。 In gesture recognition apparatus 100 according to the embodiment of the present invention, blocks that are not within a rectangular range with a large number of motion blocks are excluded from the blocks used for gesture recognition. For this reason, it is possible to exclude a region that is unrelated to the gesture and has a large change in luminance due to noise from the target of gesture recognition, thereby reducing erroneous detection of the gesture.
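The rectangle-based filtering of steps S61 to S68 can be sketched as below. This is an illustrative sketch under assumptions: the grid is a list of rows of booleans, and when several qualifying runs exist the helper `dense_range` picks the longest one, a choice the description does not specify. Both function names are hypothetical.

```python
def dense_range(counts, min_run=2):
    """Longest run of at least `min_run` consecutive entries with one or more motion blocks.

    Returns (first_index, last_index), or None if no run qualifies.
    Picking the longest run is an assumption made for this sketch.
    """
    best = None
    start = None
    for i, count in enumerate(list(counts) + [0]):  # trailing 0 closes the last run
        if count >= 1 and start is None:
            start = i
        elif count < 1 and start is not None:
            length = i - start
            if length >= min_run and (best is None or length > best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best


def remove_isolated_blocks(grid):
    """Keep only motion blocks inside the rectangle of dense columns and dense rows."""
    width = len(grid[0])
    column_counts = [sum(row[x] for row in grid) for x in range(width)]
    row_counts = [sum(row) for row in grid]
    columns = dense_range(column_counts)
    rows = dense_range(row_counts)

    kept = [[False] * width for _ in grid]
    if columns is None or rows is None:
        return kept
    for y in range(rows[0], rows[1] + 1):
        for x in range(columns[0], columns[1] + 1):
            kept[y][x] = grid[y][x]
    return kept


# Hypothetical 6x4 grid: a 2x2 cluster plus one isolated block at the top left.
grid = [
    [True, False, False, False, False, False],
    [False, False, False, True, True, False],
    [False, False, False, True, True, False],
    [False, False, False, False, False, False],
]
filtered = remove_isolated_blocks(grid)
```

In this example the dense column range is columns 3 to 4 and the dense row range is rows 0 to 2 (zero-based), so the isolated block at the top left falls outside the rectangle and is dropped while the 2x2 cluster is kept.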

次に、図８に示すフローチャートを用いて、パターン検出処理（ステップＳ７０）を詳細に説明する。 Next, the pattern detection process (step S70) will be described in detail using the flowchart shown in FIG.

まず、ＣＰＵ２０は、動きブロックが有るか否かを判別する（ステップＳ７１）。ＣＰＵ２０は、いずれかのブロックが動きブロックであると判別すると（ステップＳ７１：ＹＥＳ）、中心位置検出を行う（ステップＳ７２）。一方、ＣＰＵ２０は、全てのブロックが動きブロックではないと判別すると（ステップＳ７１：ＮＯ）、停止後タイマが１秒以上か否かを判別する（ステップＳ７５）。 First, the CPU 20 determines whether or not there is a motion block (step S71). When determining that any of the blocks is a motion block (step S71: YES), the CPU 20 performs center position detection (step S72). On the other hand, when determining that all the blocks are not motion blocks (step S71: NO), the CPU 20 determines whether the post-stop timer is 1 second or longer (step S75).

中心位置検出（ステップＳ７２）では、ＣＰＵ２０は、動きブロックの中心位置を検出する。図９Ａを用いて、動きブロックの中心位置の検出方法について詳細に説明する。 In the center position detection (step S72), the CPU 20 detects the center position of the motion block. The method for detecting the center position of the motion block will be described in detail with reference to FIG. 9A.

図９Ａには、横軸をＸ軸、縦軸をＹ軸として、８０ブロック分（１０（Ｘ軸方向）×８（Ｙ軸方向））の画像が示されている。図９Ａに示す斜線が引かれたブロックは、動きブロックである。 FIG. 9A shows an image of 80 blocks (10 (X axis direction) × 8 (Y axis direction)) with the horizontal axis as the X axis and the vertical axis as the Y axis. The shaded block shown in FIG. 9A is a motion block.

図９Ａに示す画像では、動きブロックは９個存在する。ＣＰＵ２０は、９つのブロックのＸ座標の総和を９で除算することにより中心位置のＸ座標を求め、９つのブロックのＹ座標の総和を９で除算することにより中心位置のＹ座標を求める。 In the image shown in FIG. 9A, there are nine motion blocks. The CPU 20 obtains the X coordinate of the center position by dividing the sum of the X coordinates of the nine blocks by 9, and obtains the Y coordinate of the center position by dividing the sum of the Y coordinates of the nine blocks by 9.

ただし、理解を容易にするため、中心位置の座標に最も近いブロックの座標を中心位置の座標として近似した場合を例にとり以下に説明する。図９Ａにおいて、太線の“○”が付されたブロックの座標（Ｘ座標＝９、Ｙ座標＝７）が中心位置の座標となる。ＣＰＵ２０は、求めた中心位置の座標をＲＡＭ４０に記憶する。 However, in order to facilitate understanding, a case where the coordinates of the block closest to the coordinates of the center position are approximated as the coordinates of the center position will be described below as an example. In FIG. 9A, the coordinates of the block (X coordinate = 9, Y coordinate = 7) marked with a bold line “◯” are the coordinates of the center position. The CPU 20 stores the obtained coordinates of the center position in the RAM 40.
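The center position calculation can be sketched as follows. The block coordinates are hypothetical (the actual layout of FIG. 9A is not reproduced here) and were chosen only so that nine blocks average to the center (9, 7) mentioned in the text; `center_position` and `nearest_block` are illustrative names.

```python
def center_position(motion_blocks):
    """Mean X and mean Y of the motion-block coordinates, or None if there are none."""
    if not motion_blocks:
        return None
    count = len(motion_blocks)
    center_x = sum(x for x, _ in motion_blocks) / count
    center_y = sum(y for _, y in motion_blocks) / count
    return center_x, center_y


def nearest_block(point):
    """Approximate a center position by the nearest block coordinates.

    Uses Python's built-in round(), which rounds exact halves to the even integer.
    """
    return round(point[0]), round(point[1])


# Hypothetical 3x3 cluster of nine motion blocks.
blocks = [(x, y) for y in (6, 7, 8) for x in (8, 9, 10)]
center = center_position(blocks)
print(center, nearest_block(center))  # (9.0, 7.0) (9, 7)
```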

ＣＰＵ２０は、中心位置検出（ステップＳ７２）が完了すると、先端位置検出（ステップＳ７３）を実行する。ＣＰＵ２０は、先端位置検出では、中心位置検出（ステップＳ７２）で求めた中心位置からジェスチャーの動きの方向を求め、さらに中心位置を基準としてジェスチャーの動きの方向の最も離れた位置に存在する動きブロックを先端位置として検出する。 When the center position detection (step S72) is completed, the CPU 20 executes tip position detection (step S73). In tip position detection, the CPU 20 obtains the direction of the gesture movement from the center positions obtained in center position detection (step S72), and detects as the tip position the motion block that lies farthest from the center position in that direction.

図９Ｂを用いて、動きブロックの先端位置の検出方法について詳細に説明する。なお、図９Ｂに示す画像は、図９Ａに示す画像の次のフレームの画像である。 A method for detecting the tip position of the motion block will be described in detail with reference to FIG. 9B. Note that the image shown in FIG. 9B is an image of the next frame of the image shown in FIG. 9A.

図９Ｂに示す画像では、動きブロックは１９個存在する。上述した方法で中心位置の座標を求めると、太線の“○”が付されたブロックの座標（Ｘ座標＝７、Ｙ座標＝６）が中心位置の座標となる。なお、図９Ｂにおいて、破線の“○”が付されたブロックの座標（Ｘ座標＝９、Ｙ座標＝７）が１フレーム前の中心位置の座標である。 In the image shown in FIG. 9B, there are 19 motion blocks. When the coordinates of the center position are obtained by the method described above, the coordinates of the block (X coordinate = 7, Y coordinate = 6) marked with a bold line “◯” become the coordinates of the center position. In FIG. 9B, the coordinates (X coordinate = 9, Y coordinate = 7) of the block marked with a broken line “◯” are the coordinates of the center position one frame before.

ここで、１フレーム前の中心位置の座標から本フレームの中心位置の座標までを結んだときにできる差分ベクトルで示される方向をジェスチャーの対象物が進行する方向と考える。そして、差分ベクトルの延長線上の動きブロックであって、中心位置から最も離れた位置にあるブロックの座標を先端位置の座標とする。 Here, the direction of the difference vector drawn from the center position of the previous frame to the center position of the current frame is taken to be the direction in which the gesturing object is moving. The coordinates of the motion block that lies on the extension of the difference vector and is farthest from the center position are then taken as the coordinates of the tip position.

図９Ｂにおいて、太線の“△”が付されたブロックの座標（Ｘ座標＝３、Ｙ座標＝４）が先端位置の座標となる。ＣＰＵ２０は、求めた先端位置の座標をＲＡＭ４０に記憶する。 In FIG. 9B, the coordinates of the block (X coordinate = 3, Y coordinate = 4) marked with a thick line “Δ” are the coordinates of the tip position. The CPU 20 stores the obtained coordinates of the tip position in the RAM 40.
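The tip position search can be sketched as below, using the coordinates quoted for FIG. 9B (previous center (9, 7), current center (7, 6), tip (3, 4)). The list of motion blocks is invented for the example, and the `tolerance` parameter, which decides how close to the extension line a block must be, is an assumption; the description only says the block lies on the extension of the difference vector.

```python
def tip_position(motion_blocks, previous_center, current_center, tolerance=0.5):
    """Farthest motion block along the direction of center movement, or None.

    A block qualifies if its perpendicular distance from the line through the
    current center along the difference vector is at most `tolerance` blocks.
    """
    dx = current_center[0] - previous_center[0]
    dy = current_center[1] - previous_center[1]
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0:
        return None  # the center did not move, so no direction is defined

    best = None
    best_along = 0.0
    for x, y in motion_blocks:
        rx = x - current_center[0]
        ry = y - current_center[1]
        along = (rx * dx + ry * dy) / length        # distance ahead of the center
        across = abs(rx * dy - ry * dx) / length    # distance off the line
        if along > best_along and across <= tolerance:
            best = (x, y)
            best_along = along
    return best


# Hypothetical subset of motion blocks around the centers quoted for FIG. 9B.
blocks = [(9, 7), (7, 6), (5, 5), (4, 6), (3, 4), (2, 5)]
tip = tip_position(blocks, previous_center=(9, 7), current_center=(7, 6))
print(tip)  # (3, 4)
```

The block at (2, 5) lies farther ahead of the center than (3, 4) but is off the extension line, so it is rejected; (3, 4) is exactly two difference vectors beyond the current center.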

ＣＰＵ２０は、先端位置検出（ステップＳ７３）が完了すると、停止後タイマを再起動する（ステップＳ７４）。具体的には、ＣＰＵ２０は、停止後タイマの値をクリアした上で停止後タイマを起動する。 When the tip position detection (step S73) is completed, the CPU 20 restarts the post-stop timer (step S74). Specifically, the CPU 20 clears the value of the post-stop timer and then starts it.

ＣＰＵ２０は、停止後タイマ再起動（ステップＳ７４）を完了、又は、動きブロック有り？（ステップＳ７１）でＮＯと判別すると、停止後タイマの値が1秒以上であるか否かを判別する（ステップＳ７５）。ＣＰＵ２０は、停止後タイマの値が1秒以上であると判別すると（ステップＳ７５：ＹＥＳ）、軌跡検出を行う（ステップＳ７６）。一方、ＣＰＵ２０は、停止後タイマの値が1秒以上ではないと判別すると（ステップＳ７５：ＮＯ）、パターン検出処理（ステップＳ７０）を終了する。なお、ステップＳ７５は、動きブロックが検出されなくなってから１秒経過する前は、コマンドを発行しないようにするための処理である。 When the CPU 20 completes the post-stop timer restart (step S74), or determines NO in the "motion block present?" check (step S71), it determines whether the value of the post-stop timer is 1 second or more (step S75). If the value is 1 second or more (step S75: YES), the CPU 20 performs locus detection (step S76). If the value is less than 1 second (step S75: NO), the CPU 20 ends the pattern detection process (step S70). Step S75 ensures that no command is issued until 1 second has elapsed since motion blocks stopped being detected.

軌跡検出（ステップＳ７６）では、ＣＰＵ２０は、中心位置検出（ステップＳ７２）においてＲＡＭ４０に記憶された中心位置の座標に基づいて、中心位置の軌跡を検出する。また、ＣＰＵ２０は、先端位置検出（ステップＳ７３）においてＲＡＭ４０に記憶された先端位置の座標に基づいて、先端位置の軌跡を検出する。 In the locus detection (step S76), the CPU 20 detects the locus of the center position based on the coordinates of the center position stored in the RAM 40 in the center position detection (step S72). Further, the CPU 20 detects the locus of the tip position based on the coordinates of the tip position stored in the RAM 40 in the tip position detection (step S73).

ＣＰＵ２０は、軌跡検出（ステップＳ７６）が完了すると、ジェスチャーのパターンが所定のパターンとマッチングしているかを判別する（ステップＳ７７）。ＣＰＵ２０は、軌跡検出（ステップＳ７６）で検出した中心位置の軌跡と、先端位置の軌跡と、あらかじめハードディスク６０に記憶されているジェスチャーのパターンを示すデータとに基づいて、パターンのマッチングを行う。 When the locus detection (step S76) is completed, the CPU 20 determines whether or not the gesture pattern matches a predetermined pattern (step S77). The CPU 20 performs pattern matching based on the locus of the center position detected in the locus detection (step S76), the locus of the tip position, and the data indicating the gesture pattern stored in the hard disk 60 in advance.

図１０に、ジェスチャーのパターンの一例を示す。「パターン」は、ジェスチャーのパターンの通し番号を示す。「中心位置の軌跡」は、軌跡検出（ステップＳ７６）で求めた中心位置の軌跡を示す。「移動距離」は、中心位置の移動距離を示す。「移動時間」は、動きブロック有り（ステップＳ７１）において、ＹＥＳと判別されてから、ＮＯと判別されるまでの時間、すなわち、ジェスチャーの候補としての動きが認識されてからジェスチャーの候補としての動きがなくなるまでの時間を示す。 FIG. 10 shows an example of a gesture pattern. “Pattern” indicates a serial number of a gesture pattern. “The locus of the center position” indicates the locus of the center position obtained in the locus detection (step S76). “Movement distance” indicates the movement distance of the center position. “Movement time” is the time from when YES is determined to when NO is determined in the presence of a motion block (step S71), that is, the motion as a gesture candidate after the motion as a gesture candidate is recognized. Indicates the time until disappears.

「コマンドの表示」は、パターンがマッチングしたとき、すなわち、上述の「中心位置の軌跡」、「移動距離」及び「移動時間」の全ての条件を満たしたときに実行するコマンド処理の内容を示す。 “Command display” indicates the contents of command processing executed when a pattern is matched, that is, when all of the above-mentioned conditions of “center position locus”, “movement distance”, and “movement time” are satisfied. .

ＣＰＵ２０は、検出したジェスチャーのパターンがハードディスク６０に記憶されたパターンを示すデータのうちいずれかのパターンとマッチングしたと判別すると（ステップＳ７７：ＹＥＳ）、コマンド処理を行う（ステップＳ７８）。一方、ＣＰＵ２０は、検出したジェスチャーのパターンがいずれのパターンともマッチングしていないと判別すると（ステップＳ７７：ＮＯ）、パターン検出初期化（ステップＳ７９）を行う。 If the CPU 20 determines that the detected gesture pattern matches any of the patterns indicating the patterns stored in the hard disk 60 (step S77: YES), the CPU 20 performs command processing (step S78). On the other hand, if the CPU 20 determines that the detected gesture pattern does not match any pattern (step S77: NO), the CPU 20 performs pattern detection initialization (step S79).

ＣＰＵ２０は、コマンド処理（ステップＳ７８）が完了、又は、検出したジェスチャーのパターンがいずれのパターンともマッチングしていないと判別すると（ステップＳ７７：ＮＯ）、パターン検出初期化（ステップＳ７９）を行う。パターン検出初期化（ステップＳ７９）では、ＣＰＵ２０は、パターン検出処理（ステップＳ７０）で使用した各種変数の初期化を行う。具体的には、ＲＡＭ４０に記憶されている中心位置、先端位置の履歴を消去する。 If the CPU 20 determines that the command processing (step S78) is completed or that the detected gesture pattern does not match any pattern (step S77: NO), the CPU 20 performs pattern detection initialization (step S79). In pattern detection initialization (step S79), the CPU 20 initializes various variables used in the pattern detection process (step S70). Specifically, the history of the center position and the tip position stored in the RAM 40 is deleted.

ＣＰＵ２０は、パターン検出初期化（ステップＳ７９）を完了すると、停止後タイマを停止し、クリアする（ステップＳ７１０）。ＣＰＵ２０は、停止後タイマの停止、クリア（ステップＳ７１０）を完了すると、パターン検出処理（ステップＳ７０）を終了し、フレームの画像取得（ステップＳ３０）に処理を戻す。 When the pattern detection initialization (step S79) is completed, the CPU 20 stops and clears the post-stop timer (step S710). When the stopping and clearing of the post-stop timer (step S710) is completed, the CPU 20 ends the pattern detection process (step S70) and returns to frame image acquisition (step S30).

ここで、図１１Ａ〜図１１Ｆ及び図１２Ａ〜図１２Ｄを用いて、パターンマッチング（ステップＳ７７）とコマンド処理（ステップＳ７８）の例について詳細に説明する。 Here, an example of pattern matching (step S77) and command processing (step S78) will be described in detail with reference to FIGS. 11A to 11F and FIGS. 12A to 12D.

始めに、ジェスチャーの「パターン」が「１」である場合の、パターンマッチングとコマンド処理について説明する。 First, pattern matching and command processing when the “pattern” of the gesture is “1” will be described.

図１１Ａは、動きブロックの中心位置が右下の特定領域内で検出された直後の画像を示す図である。なお、図１１Ａ〜図１１Ｆにおいて、四隅に示した太線で囲まれた３×３ブロックの領域は特定領域である。また、太線の“○”が付されたブロックの座標が動きブロックの中心位置の座標である。図１１Ａに示すように、被撮像者５００の手が画角外から画角内の右下の領域に進入すると、動きブロックの中心位置が右下の特定領域内に検出される。 FIG. 11A is a diagram illustrating an image immediately after the center position of the motion block is detected in the lower right specific region. In FIGS. 11A to 11F, the 3 × 3 block area surrounded by the bold lines shown at the four corners is a specific area. Also, the coordinates of the block with the bold line “◯” are the coordinates of the center position of the motion block. As shown in FIG. 11A, when the hand of the person to be imaged 500 enters the lower right region within the angle of view from the outside of the angle of view, the center position of the motion block is detected in the lower right specific region.

図１１Ｂは、動きブロックの中心位置が右下の特定領域内に検出されているときの画像を示す図である。図１１Ｂに示すように、被撮像者５００の手が画角内の右下の領域で移動している間は、動きブロックの中心位置は右下の特定領域内で検出され続ける。 FIG. 11B is a diagram illustrating an image when the center position of the motion block is detected in the lower right specific region. As shown in FIG. 11B, while the hand of the person 500 to be imaged is moving in the lower right area within the angle of view, the center position of the motion block is continuously detected in the lower right specific area.

図１１Ｃは、動きブロックの中心位置が右下の特定領域内から検出されなくなった直後の画像を示す図である。図１１Ｃにおいて、破線の“○”が付されたブロックの座標が最後に検出された中心位置の座標である。図１１Ｃに示すように、被撮像者５００の手が画角内の右下の領域から画角外に外れると、動きブロックの中心位置が右下の特定領域内から検出されなくなる。 FIG. 11C is a diagram illustrating an image immediately after the center position of the motion block is no longer detected from within the lower right specific region. In FIG. 11C, the coordinates of the block with the broken line “◯” are the coordinates of the center position detected last. As illustrated in FIG. 11C, when the hand of the person to be imaged 500 moves out of the angle of view from the lower right region within the angle of view, the center position of the motion block is not detected from within the lower right specific region.

動きブロックの中心位置が検出されなくなってから１秒が経過し、停止後タイマが１秒以上になると、ＣＰＵ２０は、軌跡検出（ステップＳ７６）後、パターンのマッチングを行う（ステップＳ７７）。前述のように、中心位置の軌跡は画角外から画角の右下の特定領域に進入した後に画角外に外れる軌跡を示しているため、パターン１の「中心位置の軌跡」の条件を満たす。 When 1 second has elapsed since the center position of the motion blocks was last detected and the post-stop timer reaches 1 second or more, the CPU 20 performs locus detection (step S76) and then pattern matching (step S77). As described above, the locus of the center position enters the specific region at the lower right of the angle of view from outside the angle of view and then leaves the angle of view, so it satisfies the "locus of the center position" condition of pattern 1.

ただし、「移動時間」、ここでは、動きブロックの中心位置が画角の右下の特定領域内に滞在した時間によって、ＣＰＵ２０が実行するコマンドの内容が異なる。なお、ＣＰＵ２０は、フレーム画像取得（ステップＳ３０）においてＲＡＭ４０に記憶した画像データの取得時刻をもとに移動時間を求める。 However, the content of the command executed by the CPU 20 differs depending on the “movement time”, here, the time at which the center position of the motion block stays in the specific area at the lower right of the angle of view. Note that the CPU 20 obtains the movement time based on the acquisition time of the image data stored in the RAM 40 in the frame image acquisition (step S30).

「移動時間」が２秒以上である場合は、ＣＰＵ２０は、画像を切り替えない。具体的には、ＣＰＵ２０は、図１１Ｄに示すように、それまで表示していたページの画像を、スクリーン４００に投影させ続ける。 When the “movement time” is 2 seconds or more, the CPU 20 does not switch the image. Specifically, as illustrated in FIG. 11D, the CPU 20 continues to project the image of the page that has been displayed so far onto the screen 400.

「移動時間」が１秒以上２秒未満である場合は、ＣＰＵ２０は、次ページの画像を表示する。具体的には、ＣＰＵ２０は、ハードディスク６０に記憶されている次ページの画像の画像データを、出力バッファ８０に転送することにより、プロジェクタ３００がスクリーン４００に投影する画像を、図１１Ｅに示すような次ページの画像に切り替える。 When the “movement time” is 1 second or more and less than 2 seconds, the CPU 20 displays the image of the next page. Specifically, the CPU 20 transfers the image data of the image of the next page stored in the hard disk 60 to the output buffer 80, whereby the image projected on the screen 400 by the projector 300 is as shown in FIG. 11E. Switch to the next page image.

「移動時間」が１秒未満である場合は、ＣＰＵ２０は、次々ページの画像を表示する。具体的には、ＣＰＵ２０は、次ページの画像とは異なる次々ページの画像の画像データを、ハードディスク６０から出力バッファ８０に転送することにより、プロジェクタ３００がスクリーン４００に投影する画像を、図１１Ｆに示すような次々ページの画像に切り替える。 When the "movement time" is less than 1 second, the CPU 20 displays the image of the page after next. Specifically, the CPU 20 transfers the image data of the page after next, which differs from the next page, from the hard disk 60 to the output buffer 80, thereby switching the image that the projector 300 projects onto the screen 400 to the image of the page after next, as shown in FIG. 11F.
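The three movement-time cases for pattern 1 amount to a simple lookup, sketched below. The function name and the returned command labels are hypothetical; only the time boundaries (1 second and 2 seconds) come from the description.

```python
def pattern1_command(movement_seconds):
    """Select the command for pattern 1 from the time spent in the lower-right region."""
    if movement_seconds >= 2.0:
        return "keep_current_page"      # 2 seconds or more: do not switch
    if movement_seconds >= 1.0:
        return "show_next_page"         # 1 second or more, under 2 seconds
    return "show_page_after_next"       # under 1 second


for seconds in (2.5, 1.5, 0.4):
    print(seconds, pattern1_command(seconds))
```

A quicker gesture therefore skips further ahead, while a slow pass through the region leaves the displayed page unchanged.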

As described above, in the gesture recognition device 100 according to the embodiment of the present invention, the CPU 20 stores in the RAM 40 the time at which each piece of image data was acquired, using the standard timer counted up by the timer counter 90. The speed of a gesture can therefore be measured accurately. Consequently, even when the trajectory of the center position and the movement distance are the same, different commands can be provided according to the speed of the gesture.

In addition, detecting the center position of the gesture makes it easy to capture the gesture motion.

Next, pattern matching and command processing for the case where the gesture pattern is "pattern 5" will be described.

FIG. 12A shows an image immediately after the center position of the motion blocks is detected. In FIGS. 12A and 12B, the block marked with a bold "○" gives the coordinates of the center position of the motion blocks. As shown in FIG. 12A, when the gesturing object enters the angle of view from outside, the center position of the motion blocks is detected.

FIG. 12B shows an image at the moment the tip position of the motion blocks is farthest to the left. In FIG. 12B, the block marked with a bold "△" gives the coordinates of the tip position of the motion blocks. As shown in FIG. 12B, when the hand of the imaged person 500 moves to its leftmost position within the angle of view, the tip position of the motion blocks is at its leftmost.

FIG. 12C shows an image immediately after the center position of the motion blocks is no longer detected. In FIG. 12C, the block marked with a dashed "○" gives the coordinates of the last detected center position, and the block marked with a dashed "△" gives the coordinates of the tip position at the moment it was farthest to the left (hereinafter, the peak coordinates of the tip position). As shown in FIG. 12C, even if the hand of the imaged person 500 remains within the angle of view, the center position of the motion blocks is no longer detected once the hand stops moving, just as when the hand leaves the angle of view.

When one second has elapsed since the center position of the motion blocks was last detected and the post-stop timer reaches one second or more, the CPU 20 performs trajectory detection (step S76) and then pattern matching (step S77). As described above, the "trajectory of the center position" is right → left → right and the "movement distance of the center position" is at least 1/20 of the angle of view, so the conditions of pattern 5 are satisfied.

Here, when the "movement time" is less than 3 seconds, the CPU 20 enlarges the currently displayed image. Specifically, based on the image data for the currently displayed image stored on the hard disk 60, the CPU 20 creates image data for an image enlarged around the peak coordinates of the tip position. The CPU 20 then transfers the created image data to the output buffer 80, thereby switching the image that the projector 300 projects onto the screen 400 to the enlarged image shown in FIG. 12D.

On the other hand, when the "movement time" is 3 seconds or more, the CPU 20 does not switch the image.
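One way to picture the enlargement around the peak coordinates of the tip position is to compute the source rectangle that will be scaled up to fill the screen. The sketch below is an assumption-laden illustration: the description does not state the magnification factor or how the region is handled near the image border, so the `zoom` parameter, the clamping behavior, and the function name `zoom_crop_box` are all hypothetical.

```python
def zoom_crop_box(width, height, peak_x, peak_y, zoom=2.0):
    """Return (left, top, right, bottom) of the region to enlarge.

    Hypothetical sketch: `zoom` and the border clamping are assumptions;
    the description only says the image is enlarged around the peak
    coordinates of the tip position.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be at least 1.0")
    crop_w = width / zoom
    crop_h = height / zoom
    # Center the crop on the peak coordinates, then clamp it so the
    # rectangle stays entirely inside the image.
    left = min(max(peak_x - crop_w / 2, 0), width - crop_w)
    top = min(max(peak_y - crop_h / 2, 0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    print(zoom_crop_box(640, 480, 320, 240))  # centered peak
    print(zoom_crop_box(640, 480, 0, 0))      # peak at the top-left corner
```

The returned rectangle would then be scaled to the full output size before being transferred to the output buffer.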

As described above, by detecting the tip position of a gesture, the gesture recognition device 100 according to the embodiment of the present invention allows a wider variety of command patterns.

The present invention is not limited to the above embodiment, and various modifications and applications are possible.

In the above embodiment, the number of pixels M per block is 8 pixels vertically × 8 pixels horizontally = 64, but the block size is arbitrary. For example, to detect smaller motions, the block size may be reduced (for example, 4 pixels vertically × 4 pixels horizontally), and to detect faster motions, the block size may be increased (for example, 16 pixels vertically × 16 pixels horizontally).
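The effect of the block size can be illustrated with a small sketch that divides two consecutive luminance frames into square blocks and computes a motion amount for each block as the sum of squared per-pixel luminance changes (the measure recited in the claims). The frame representation (nested lists of luminance values), the function name `block_motion_amounts`, and the choice to ignore partial blocks at the edges are assumptions made for the example.

```python
def block_motion_amounts(prev_frame, curr_frame, block_size=8):
    """Return a 2-D list of motion amounts, one per block.

    Hypothetical sketch: frames are nested lists of luminance values with
    identical dimensions; partial blocks at the right and bottom edges
    are ignored.
    """
    height = len(curr_frame)
    width = len(curr_frame[0]) if height else 0
    amounts = []
    for by in range(0, height - block_size + 1, block_size):
        row = []
        for bx in range(0, width - block_size + 1, block_size):
            total = 0
            for y in range(by, by + block_size):
                for x in range(bx, bx + block_size):
                    diff = curr_frame[y][x] - prev_frame[y][x]
                    total += diff * diff  # squared luminance change
            row.append(total)
        amounts.append(row)
    return amounts


if __name__ == "__main__":
    prev = [[0] * 4 for _ in range(4)]
    curr = [[0] * 4 for _ in range(4)]
    curr[0][0] = 3
    print(block_motion_amounts(prev, curr, block_size=2))
```

With a smaller `block_size`, a change confined to a few pixels makes up a larger share of its block, which is why small blocks suit small motions.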

In the above embodiment, the threshold for the motion amount X is the mean of the squared luminance differences, Σ(ΔB_n)^2 / N, obtained in the mean-square luminance difference calculation (step S43), multiplied by the number of pixels M per block, that is, Σ(ΔB_n)^2 / N × M. However, the threshold for the motion amount X may be another value determined from the amount of luminance change over the entire screen, such as the above threshold further multiplied by a constant K (Σ(ΔB_n)^2 / N × M × K), or the mean of the absolute luminance differences multiplied by the number of pixels M per block (Σ|ΔB_n| / N × M).
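The threshold variants above can be sketched in a few lines. This is a hedged illustration, not the embodiment's implementation: it assumes that ΔB_n is the luminance difference of pixel n between two consecutive frames and that N is the number of pixels compared; the function name `motion_threshold` and the flattened-list frame representation are hypothetical.

```python
def motion_threshold(prev_lum, curr_lum, pixels_per_block=64, k=1.0, use_abs=False):
    """Compute a motion-amount threshold from whole-frame luminance change.

    Hypothetical sketch. `prev_lum` and `curr_lum` are flat sequences of
    per-pixel luminance for two consecutive frames (an assumption).
      use_abs=False: sum((dB_n)^2) / N * M * K
      use_abs=True:  sum(|dB_n|)   / N * M * K
    With k=1.0 these reduce to the two formulas without K.
    """
    diffs = [c - p for p, c in zip(prev_lum, curr_lum)]
    n = len(diffs)
    if n == 0:
        raise ValueError("frames must contain at least one pixel")
    if use_abs:
        mean_change = sum(abs(d) for d in diffs) / n
    else:
        mean_change = sum(d * d for d in diffs) / n
    return mean_change * pixels_per_block * k


if __name__ == "__main__":
    prev = [0, 0, 0, 0]
    curr = [2, 0, 0, 2]
    print(motion_threshold(prev, curr))                # mean square 2.0 * 64
    print(motion_threshold(prev, curr, k=1.5))         # scaled by K
    print(motion_threshold(prev, curr, use_abs=True))  # mean absolute 1.0 * 64
```

Because the threshold scales with the average change across the frame, a block is treated as moving only when it changes more than a typical block-sized patch of the frame does.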

In the above embodiment, the threshold for the motion amount X is obtained by calculation, but an appropriate threshold may instead be selected according to the amount of luminance change over the entire screen. In this case, a plurality of thresholds may be stored in advance as candidate values on the hard disk 60 or the like.

In the above embodiment, the threshold for the motion amount X is determined according to the amount of luminance change over the entire screen, but when the intensity of motion over the entire screen can be predicted, the threshold may be a fixed value. In this case, the threshold may be stored in advance on the hard disk 60 or the like.

In the above embodiment, only one rectangular range containing many motion blocks is set, but two or more rectangular ranges may be set. The shape of the range containing many motion blocks is also not limited to a rectangle and may be any shape suited to the gesturing object, such as a circle, an ellipse, or a polygon other than a quadrilateral.

The above embodiment shows an example in which every frame image supplied from the camera 200 is processed. However, when the imaging frame rate is high relative to the processing speed of the gesture recognition device, processing may be performed on thinned-out frame images.
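Frame thinning of this kind can be sketched as keeping every `step`-th frame from the incoming sequence. The function name `thin_frames` and the fixed-step policy are assumptions for illustration; the description does not specify how frames are thinned.

```python
def thin_frames(frames, step):
    """Yield every `step`-th frame, starting with the first.

    Hypothetical sketch: a fixed step is assumed; the description only
    says that thinned-out frame images may be processed.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame


if __name__ == "__main__":
    # Stand-in for a camera stream: integers instead of frame images.
    print(list(thin_frames(range(10), 3)))
```

If frames are thinned, movement times should still be derived from the stored acquisition times rather than from frame counts, so that gesture speed remains comparable.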

In addition, the gesture patterns and commands of the present invention are not limited to the examples above and can be set arbitrarily.

Furthermore, the present invention is of course not limited to the configuration examples and flowchart procedures described above.

Configuration diagram of a system to which the gesture recognition device according to the embodiment of the present invention is applied.
Functional block diagram of the gesture recognition device according to the embodiment of the present invention.
Flowchart showing an example of the gesture recognition process of the gesture recognition device shown in FIG. 1.
Flowchart showing an example of the threshold setting process shown in the flowchart of FIG. 3.
Flowchart showing an example of the continuous noise removal process shown in the flowchart of FIG. 3.
Flowchart showing an example of the minute noise removal process shown in the flowchart of FIG. 3.
Diagram for explaining the minute noise removal process.
Flowchart showing an example of the pattern detection process shown in the flowchart of FIG. 3.
Diagram for explaining the method of detecting the center position of the motion blocks.
Diagram for explaining the method of detecting the tip position of the motion blocks.
Diagram showing an example of gesture patterns.
Diagram for explaining the matching process with pattern 1.
Diagram for explaining the matching process with pattern 1.
Diagram for explaining the matching process with pattern 1.
Diagram for explaining the command processing of pattern 1.
Diagram for explaining the command processing of pattern 1.
Diagram for explaining the command processing of pattern 1.
Diagram for explaining the matching process with pattern 5.
Diagram for explaining the matching process with pattern 5.
Diagram for explaining the matching process with pattern 5.
Diagram for explaining the command processing of pattern 5.

Explanation of symbols

10 Bus
20 CPU (Central Processing Unit)
30 ROM (Read Only Memory)
40 RAM (Random Access Memory)
50 I/O (Input Output) unit
60 Hard disk
70 Input buffer
80 Output buffer
90 Timer counter
100 Gesture recognition device
110 Image input unit
120 Time acquisition unit
121 Imaging time acquisition unit
130 Recognition condition setting unit
131 Threshold setting unit
140 Motion amount detection unit
150 Motion block detection unit
160 Specific block detection unit
161 Continuous noise removal unit
162 Minute noise removal unit
170 Gesture recognition unit
171 Center position detection unit
172 Tip position detection unit
173 Trajectory detection unit
174 Pattern matching unit
180 Command processing unit
190 Image output unit
200 Camera
210 Imaging unit
300 Projector
310 Projection unit
400 Screen
500 Imaged person

Claims

A gesture recognition device for recognizing a gesture based on motion of an imaged subject, comprising:
recognition condition setting means for setting gesture recognition conditions;
motion amount detection means for detecting, for each of a plurality of divided regions obtained by dividing a captured image, a motion amount based on the amount of change in luminance of each pixel in that divided region;
motion region detection means for detecting, as a motion region, a divided region whose motion amount detected by the motion amount detection means is equal to or greater than a predetermined threshold;
specific region determination means for determining whether a motion region detected by the motion region detection means is a specific region to be used for gesture recognition; and
gesture recognition means for recognizing a gesture based on at least the motion regions determined to be specific regions by the specific region determination means and the gesture recognition conditions set by the recognition condition setting means.

The gesture recognition device according to claim 1, wherein
the recognition condition setting means includes threshold setting means for setting the predetermined threshold, and
the motion region detection means detects, as a motion region, a divided region whose motion amount detected by the motion amount detection means is equal to or greater than the threshold set by the threshold setting means.

The gesture recognition device according to claim 2, wherein
the threshold setting means sets the threshold based on the amount of change in luminance of each pixel between sequentially captured images.

The gesture recognition device according to claim 1, wherein
the motion amount detection means detects, as the motion amount, the sum of the squares of the amounts of change in luminance of the pixels in the divided region.

The gesture recognition device according to claim 1, wherein
the specific region determination means does not determine a motion region to be a specific region when that motion region has been detected continuously by the motion region detection means for a predetermined time or longer.

The gesture recognition device according to claim 1, wherein,
when a plurality of motion regions are detected by the motion region detection means, the specific region determination means determines that, among the detected motion regions, the motion regions included in a predetermined range containing a plurality of motion regions are specific regions.

The gesture recognition device according to claim 1, further comprising
center position detection means for detecting, when there are a plurality of motion regions determined to be specific regions by the specific region determination means, a center position of the motion of the gesture in the image from the positions in the image of those motion regions, wherein
the gesture recognition means recognizes a gesture based on the center position detected by the center position detection means.

The gesture recognition device according to claim 7, further comprising:
movement direction determination means for determining the direction of movement of a gesture from a plurality of captured images; and
tip position detection means for detecting, when there are a plurality of motion regions determined to be specific regions by the specific region determination means, the motion region that, among those motion regions, is located farthest in the direction of movement of the gesture determined by the movement direction determination means, with reference to the center position detected by the center position detection means, wherein
the gesture recognition means recognizes a gesture based on the position in the image of the motion region at the farthest position detected by the tip position detection means.

The gesture recognition device according to claim 1, further comprising
imaging time acquisition means for acquiring the imaging times of sequentially captured images, wherein
the gesture recognition means obtains, based on the imaging times acquired by the imaging time acquisition means, the length of time for which the specific region determination means continuously determines motion regions to be specific regions, and recognizes the gesture according to the obtained time.

A gesture recognition method for recognizing a gesture based on motion of an imaged subject, comprising:
a recognition condition setting step of setting gesture recognition conditions;
a motion amount detection step of detecting, for each of a plurality of divided regions obtained by dividing a captured image, a motion amount based on the amount of change in luminance of each pixel in that divided region;
a motion region detection step of detecting, as a motion region, a divided region whose motion amount detected in the motion amount detection step is equal to or greater than a predetermined threshold;
a specific region determination step of determining whether a motion region detected in the motion region detection step is a specific region to be used for gesture recognition; and
a gesture recognition step of recognizing a gesture based on at least the motion regions determined to be specific regions in the specific region determination step and the gesture recognition conditions set in the recognition condition setting step.