JP5788853B2 - System and method for a gesture-based control system

Info

Publication number: JP5788853B2
Application number: JP2012242791A
Authority: JP
Inventors: John S. Underkoffler; Kevin T. Parent
Original assignee: Oblong Industries, Inc.
Priority date: 2005-02-08
Filing date: 2012-11-02
Publication date: 2015-10-07
Anticipated expiration: 2026-02-08
Also published as: JP2013047978A

Description

This patent application claims priority to US Provisional Patent Application No. 60/651,290, entitled "Gesture-Based Control System," filed February 8, 2005, which is incorporated herein by reference in its entirety.

The present invention relates generally to the field of computer systems, and more particularly to a system and method for a gesture-based control system.

A user may enter commands and interact with a computer system by using an input device such as a mouse, keyboard, joystick, or cross-key to manipulate data or images in a window on a display, or to select operations from a menu associated with the window or from an associated program. Such input devices may also operate as position-translating devices, which can be used to position a graphical on-screen pointer such as a cursor. A cursor functions, for example, to indicate a character to be revised or to indicate a position where data is to be entered or an operation is to be performed. A cursor, in some form or appearance, is typically present on the computer display. Manipulation of an input device by the user results in a corresponding movement of the cursor. Thus, for example, movement of a mouse or other input device results in movement of the cursor in the same direction.

A cursor may have different appearances depending on its function and the state of the computer system. For example, when positioned in a text field on a computer display, the cursor may have the appearance of an "I-beam" or a blinking vertical line. The position of the cursor in a text field indicates the location of the next character that will be entered by the user, typically via a keyboard. The cursor may have other appearances depending on its function. In a drawing or painting program, the cursor may be represented as a paint brush, pencil, eraser, bucket, or other graphic form.

The cursor may also take the shape of an arrow or pointer when positioned over user-selectable operations or when used to select graphical elements such as windows. To select and activate a desired operation with the cursor, the cursor may be positioned over a graphical or text representation of the operation. A button located on the mouse input device may be depressed and/or released to carry out the operation. The user is notified of the acceptance of the operation for execution by visual feedback, usually in the form of some change in an image on the computer's display. One or more of the programs in use typically generates this visual response. These programs generate drawing commands to update the display images in response to the selected operations.

A disadvantage of prior art systems is that the input device is often just that: a device. The user is required to have a wired or wireless mouse or other input device and to use that device to manage selection, position translation, activation, and other input functions. The use of these physical devices is often not natural or intuitive. Another disadvantage is the need to go through certain steps to change the context of the input device so that different functions may be performed.

With the spread of very large displays, further disadvantages of prior art input devices and systems become apparent. When using a mouse, for example, to translate the position of a cursor across a large display, the user must often lift the mouse and replace it on the mouse surface to drag the cursor across even a portion of the display. This is a wasted, unnatural motion.

Several prior art attempts have been made to provide solutions to these problems. One prior art solution is the use of gloves on the user's hand. These gloves are intended to turn the user's hand or hands into input devices. In one embodiment, an input glove is hard-wired to a computer system. This solution has the disadvantage of literally tying the user to the spot, requiring proximity to the computer system and restricting the range of motion. In other cases, the gloves are wireless. However, such wireless implementations require an independent power supply for the glove. When the power supply needs to be recharged, the gloves cannot be used.

The system provides a gestural interface to various visually presented elements, presented on a display screen or screens.

In one embodiment, the operator of the system navigates and manipulates these elements by issuing a continuous stream of "gestural commands" using his hands. In other embodiments, the user's head, feet, arms, legs, or whole body may be used to provide navigation and control. The gestural vocabulary includes "instantaneous" commands, in which forming one or both hands into the appropriate "pose" results in an immediate, one-time action, and "spatial" commands, in which the operator either refers directly to elements on the screen by way of literal "pointing" gestures or performs navigational maneuvers by way of relative or "offset" gestures. In addition to pointing gestures, which are used for absolute or direct spatial gesturing, the invention may also recognize another category of relative spatial navigation gestures in an XYZ space. This category of actions is sometimes referred to as XYZ techniques. By maintaining a high frame rate, by guaranteeing a nearly imperceptible lag in the interpretation of operator gestures, and by employing both carefully designed spatial metaphors and readily evident "direct manipulation" mechanisms, the system provides a vivid "cognitive coupling" between the operator and the information and processes being represented.

The system contemplates the ability to identify the user's hands. This system of identification may take the form of a glove or gloves with certain indicia provided thereon, or any suitable means for providing recognizable indicia on a user's hands. A system of cameras can detect the position, orientation, and movement of the user's hands and translate that information into executable commands.

A system and method for a gesture-based control system is described. In the following description, a number of features are described in detail in order to provide a more thorough understanding of the invention. It will be apparent that the invention may be practiced without these specific details. In other cases, well-known features have not been described in detail.

System

A block diagram of one embodiment of the invention is shown in FIG. 1. A user positions his hands 101 and 102 in the viewing area of an array of cameras 104A-104D. The cameras detect the position, orientation, and movement of the fingers and hands 101 and 102 and generate output signals to a preprocessor 105. The preprocessor 105 translates the camera output into a gesture signal, which is provided to the computer processing unit 107 of the system. The computer 107 uses the input information to generate a command to control one or more on-screen cursors and provides video output to a display 103.

Although the system is shown with a single user's hands as input, the invention may also be implemented using multiple users. In addition, instead of or in addition to hands, the system may track any part or parts of a user's body, including the head, feet, legs, arms, elbows, knees, and the like.

In the embodiment shown, four cameras are used to detect the position, orientation, and movement of the user's hands 101 and 102. It should be understood that the invention may be used with more or fewer cameras without departing from the scope or spirit of the invention. In addition, although the cameras are arranged symmetrically in this example embodiment, the invention imposes no such requirement of symmetry. Any number or positioning of cameras that permits detection of the position, orientation, and movement of the user's hands may be used.

In one embodiment of the invention, the cameras used are motion capture cameras capable of capturing grayscale images. In one embodiment, the cameras are those manufactured by Vicon, such as the Vicon MX40 camera. This camera includes on-camera processing and is capable of image capture at 1000 frames per second. A motion capture camera is capable of detecting and locating markers.

In the embodiment described, the cameras are used for optical detection. In other embodiments, the cameras or other detectors may be used for electromagnetic, magnetostatic, RFID, or any other suitable type of detection.

The preprocessor 105 is used to generate three-dimensional space point reconstruction and skeletal point labeling. The gesture translator 106 is used to convert the 3D spatial information and marker motion information into a command language that can be interpreted by a computer processor to update the location, shape, and action of a cursor on a display. In an alternative embodiment of the invention, the preprocessor 105 and gesture translator 106 can be combined into a single device.

The computer 107 may be any general-purpose computer, such as one manufactured by Apple, Dell, or any other suitable manufacturer. The computer 107 runs applications and provides display output. Cursor information that would otherwise come from a mouse or other prior art input device now comes from the gesture system.

Marker Tags

The invention contemplates the use of marker tags on one or more fingers of the user so that the system can locate the hands of the user, identify whether it is viewing a left or right hand, and determine which fingers are visible. This permits the system to detect the position, orientation, and movement of the user's hands. This information allows a number of gestures to be recognized by the system and used as commands by the user.

The marker tags in one embodiment are physical tags comprising a substrate (appropriate in the present embodiment for affixing to various locations on a human hand) and discrete markers arranged on the substrate's surface in unique identifying patterns.

The markers and the associated external sensing system may operate in any domain (optical, electromagnetic, magnetostatic, etc.) that allows the accurate, precise, rapid, and continuous acquisition of their three-space position. The markers themselves may operate either actively (e.g. by emitting structured electromagnetic pulses) or passively (e.g. by being optically retroreflective, as in the present embodiment).

At each frame of acquisition, the detection system receives the aggregate "cloud" of recovered three-space locations comprising all markers from tags presently in the instrumented workspace volume (within the visible range of the cameras or other detectors). The markers on each tag are of sufficient number and are arranged in unique patterns such that the detection system can perform the following tasks: (1) segmentation, in which each recovered marker position is assigned to one and only one subcollection of points that form a single tag; (2) labeling, in which each segmented subcollection of points is identified as a particular tag; (3) location, in which the three-space position of the identified tag is recovered; and (4) orientation, in which the three-space orientation of the identified tag is recovered. Tasks (1) and (2) are made possible by the specific nature of the marker patterns, as described below and as illustrated in one embodiment in FIG. 2.

The markers on the tags in one embodiment are affixed at a subset of regular grid locations. This underlying grid may, as in the present embodiment, be of the traditional Cartesian sort, or may instead be some other regular plane tessellation (a triangular or hexagonal tiling arrangement, for example). The scale and spacing of the grid are established with respect to the known spatial resolution of the marker-sensing system, so that adjacent grid locations are not likely to be confused. The selection of marker patterns for all tags should satisfy the following constraint: no tag's pattern shall coincide with that of any other tag's pattern through any combination of rotation, translation, or mirroring. The multiplicity and arrangement of markers may further be chosen so that the loss (or occlusion) of some specified number of component markers is tolerated. After any arbitrary transformation, it should still be unlikely that the compromised tag is confused with any other.
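The uniqueness constraint on marker patterns can be checked mechanically. The following is a minimal sketch, not the method used in the patent: it assumes each tag pattern is given as a set of hypothetical `(row, col)` positions on a Cartesian grid, and it considers only 90-degree rotations, mirroring, and translation, which is a simplification of the arbitrary rigid transformations a real sensing system would see. The function names are illustrative.

```python
from itertools import combinations


def normalize(points):
    """Remove translation: shift points so the smallest row and column are 0."""
    min_r = min(r for r, _ in points)
    min_c = min(c for _, c in points)
    return frozenset((r - min_r, c - min_c) for r, c in points)


def variants(points):
    """All images of a pattern under 90-degree rotations and mirroring."""
    out = set()
    current = set(points)
    for _ in range(4):
        # Rotate the pattern by 90 degrees about the origin.
        current = {(c, -r) for r, c in current}
        out.add(normalize(current))
        # Also record the mirrored copy of this rotation.
        out.add(normalize({(r, -c) for r, c in current}))
    return out


def patterns_are_distinct(patterns):
    """True if no two patterns coincide under rotation, translation or mirroring."""
    for a, b in combinations(patterns, 2):
        if normalize(a) in variants(b):
            return False
    return True


# Illustrative patterns only; these are not the patterns of FIG. 2.
corner = {(0, 0), (0, 1), (1, 0)}
rotated_corner = {(0, 0), (0, 1), (1, 1)}
line = {(0, 0), (0, 1), (0, 2)}
print(patterns_are_distinct([corner, line]))
print(patterns_are_distinct([corner, rotated_corner]))
```

Tolerance to occluded markers could be checked in the same spirit by testing every pattern with one marker removed, at the cost of more comparisons.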

Referring now to FIG. 2, a number of tags 201A-201E (left hand) and 202A-202E (right hand) are shown. Each tag is rectangular and consists, in this embodiment, of a 5×7 grid array. The rectangular shape is chosen as an aid in determining the orientation of the tag and to reduce the likelihood of mirror duplicates. In the embodiment shown, there is a tag for each finger on each hand. In some embodiments, it may be adequate to use one, two, three, or four tags per hand. Each tag has a border of a different grayscale or color shade. Within this border is a 3×5 grid array. Markers (represented by the black dots in FIG. 2) are disposed at certain points in the grid array to provide information.

Qualifying information may be encoded in the tags' marker patterns through segmentation of each pattern into "common" and "unique" subpatterns. For example, the present embodiment specifies two possible "border patterns," distributions of markers about a rectangular boundary. A "family" of tags is thus established: the tags intended for the left hand might all use the same border pattern, as shown in tags 201A-201E, while those attached to the fingers of the right hand could be assigned a different pattern, as shown in tags 202A-202E. This subpattern is chosen so that the left pattern can be distinguished from the right pattern in all orientations of the tags. In the example illustrated, the left-hand pattern includes a marker in each corner and one marker in a second-from-corner grid location. The right-hand pattern has markers in only two corners and two markers in non-corner grid locations. Inspection of the patterns reveals that, as long as any three of the four markers are visible, the left-hand pattern can be positively distinguished from the right-hand pattern. In one embodiment, the color or shade of the border can also be used as an indicator of handedness.

Each tag must of course still employ a unique interior pattern, with the markers distributed within its family's common border. In the embodiment shown, it has been found that two markers in the interior grid array are sufficient to uniquely identify each of the ten fingers with no duplication due to rotation or orientation of the fingers. Even if one of the markers is occluded, the combination of the pattern and the handedness of the tag yields a unique identifier.

In the present embodiment, the grid locations are visually present on the rigid substrate as an aid to the (manual) task of affixing each retroreflective marker at its intended location. These grids and the intended marker locations are literally printed via color inkjet printer onto the substrate, which here is a sheet of (initially) flexible "shrink film." Each module is cut from the sheet and then oven-baked, during which thermal treatment each module undergoes a precise and repeatable shrinkage. For a brief interval following this procedure, the cooling tag may be shaped slightly, to follow the longitudinal curve of a finger, for example; thereafter, the substrate is suitably rigid, and markers may be affixed at the indicated grid points.

In one embodiment, the markers themselves are three-dimensional, such as small reflective spheres affixed to the substrate via adhesive or some other appropriate means. The three-dimensionality of the markers can aid detection and location compared with two-dimensional markers. However, either can be used without departing from the spirit and scope of the present invention.

At present, tags are affixed via Velcro or other appropriate means to a glove worn by the operator, or are alternatively affixed directly to the operator's fingers using a mild double-stick tape. In a third embodiment, it is possible to dispense altogether with the rigid substrate and affix, or "paint," individual markers directly onto the operator's fingers and hands.

Gesture Vocabulary

The invention contemplates a gesture vocabulary consisting of hand poses, orientation, hand combinations, and orientation blends. A notation language is also implemented for designing and communicating poses and gestures in the gesture vocabulary of the invention. The gesture vocabulary is a system for representing instantaneous "pose states" of kinematic linkages in compact textual form. The linkages in question may be biological (a human hand, for example, or an entire human body, a grasshopper leg, or the articulated spine of a lemur) or may instead be nonbiological (e.g. a robotic arm). In either case, the linkage may be simple (the spine) or branching (the hand). The gesture vocabulary system of the invention establishes for any specific linkage a constant-length string; the aggregate of the specific ASCII characters occupying the string's "character locations" is then a unique description of the instantaneous state, or "pose," of the linkage.

Hand Poses

FIG. 3 illustrates hand poses in an embodiment of a gesture vocabulary using the invention. The invention supposes that each of the five fingers on a hand is used. These fingers are coded as p-pinkie, r-ring finger, m-middle finger, i-index finger, and t-thumb. A number of poses for the fingers and thumb are defined and illustrated in FIG. 3. A gesture vocabulary string establishes a single character position for each expressible degree of freedom of the linkage (in this case, a finger). Further, each such degree of freedom is understood to be discretized (or "quantized"), so that its full range of motion can be expressed through assignment of one of a finite number of standard ASCII characters at that string position. These degrees of freedom are expressed with respect to a body-specific origin and coordinate system (the back of the hand, the center of the grasshopper's body, the base of the robotic arm, etc.). A small number of additional gesture vocabulary character positions are therefore used to express the position and orientation of the linkage "as a whole" in the more global coordinate system.

Still referring to FIG. 3, a number of poses are defined and identified using ASCII characters. Some of the poses are divided between thumb and non-thumb. The invention in this embodiment uses a coding such that the ASCII character itself is suggestive of the pose. However, any character may be used to represent a pose, whether suggestive or not. In addition, the invention does not require the use of ASCII characters for the notation strings. Any suitable symbol, numeral, or other representation may be used without departing from the scope and spirit of the invention. For example, the notation may use two bits per finger if desired, or some other number of bits as desired.

A curled finger is represented by the character "^", while a curled thumb is represented by ">". A straight finger or a thumb pointing up is indicated by "1", and one at an angle by "\" or "/". "-" represents a thumb pointing straight sideways, and "x" represents a thumb pointing into the plane.

Using these individual finger and thumb descriptions, a robust number of hand poses can be defined and written using the scheme of the invention. Each pose is represented by five characters in the order p-r-m-i-t, as described above. FIG. 3 illustrates a number of poses, and a few are described here by way of illustration and example. The hand held flat and parallel to the ground is represented by "11111". A fist is represented by "^^^^>". An "OK" sign is represented by "111^>".
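The five-character pose strings can be produced from named finger states. The following is a minimal sketch under stated assumptions: the state names (`"curled"`, `"straight"`, `"up"`, and so on) and the function `encode_pose` are hypothetical labels introduced here for illustration, and the angled glyphs "\" and "/" are omitted because the text does not say which direction each denotes.

```python
# Character order follows the p-r-m-i-t convention described in the text.
FINGER_ORDER = ("pinkie", "ring", "middle", "index", "thumb")

# Glyphs taken from the text; the state names are assumptions for this sketch.
FINGER_GLYPHS = {"curled": "^", "straight": "1"}
THUMB_GLYPHS = {"curled": ">", "up": "1", "sideways": "-", "into_plane": "x"}


def encode_pose(states):
    """Build a five-character pose string from a dict of finger states."""
    chars = []
    for name in FINGER_ORDER:
        table = THUMB_GLYPHS if name == "thumb" else FINGER_GLYPHS
        chars.append(table[states[name]])
    return "".join(chars)


fist = encode_pose({
    "pinkie": "curled", "ring": "curled", "middle": "curled",
    "index": "curled", "thumb": "curled",
})
ok_sign = encode_pose({
    "pinkie": "straight", "ring": "straight", "middle": "straight",
    "index": "curled", "thumb": "curled",
})
print(fist, ok_sign)
```

Keeping the thumb and non-thumb tables separate mirrors the way the notation divides some poses between thumb and non-thumb.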

When suggestive characters are used, the character strings offer the opportunity of being straightforwardly "human readable." The set of possible characters that describe each degree of freedom may generally be chosen with an eye to quick recognition and evident analogy. For example, a vertical bar ("|") would likely mean that a linkage element is "straight," an ell ("L") might mean a ninety-degree bend, and a circumflex ("^") could indicate a sharp bend. As noted above, any characters or coding may be used as desired.

Any system employing gesture vocabulary strings such as those described herein enjoys the benefit of the high computational efficiency of string comparison: identification of, or search for, any specified pose literally becomes a "string compare" (e.g. UNIX's "strcmp()" function) between the desired pose string and the instantaneous actual string. Furthermore, the use of "wildcard characters" provides the programmer or system designer with additional familiar efficiency and efficacy: degrees of freedom whose instantaneous state is irrelevant for a match may be specified as a question mark ("?"), and additional wildcard meanings may be assigned.
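Pose identification by string comparison with a "?" wildcard can be sketched in a few lines. This is an illustrative sketch only: the function name `pose_matches` is an assumption, and only the single-character "?" wildcard mentioned in the text is handled.

```python
def pose_matches(template, observed, wildcard="?"):
    """Compare a desired pose string with the instantaneous observed string.

    A wildcard character in the template matches any character at that
    position, so irrelevant degrees of freedom are ignored.
    """
    if len(template) != len(observed):
        return False
    return all(t == wildcard or t == o for t, o in zip(template, observed))


# The template "111??" ignores the index finger and thumb positions.
print(pose_matches("111??", "111^>"))
print(pose_matches("111??", "^^^^>"))
```

Because the strings have constant length for a given linkage, each comparison costs a fixed, small number of character tests.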

Orientation

In addition to the pose of the fingers and thumb, the orientation of the hand can represent information. Characters describing global-space orientations can also be chosen transparently: the characters "<", ">", "^", and "v" may be used to indicate, when encountered in an orientation character position, the ideas of left, right, up, and down. FIG. 4 illustrates hand orientation descriptors and examples of coding that combines pose and orientation. In an embodiment of the invention, two character positions specify first the direction of the palm and then the direction of the fingers (as if they were straight, irrespective of the actual bends of the fingers). The possible characters for these two positions express a "body-centric" notion of orientation: "-", "+", "x", "*", "^", and "v" describe medial, lateral, anterior (forward, away from the body), posterior (backward, away from the body), cranial (upward), and caudal (downward).

In the notation scheme of an embodiment of the invention, the five finger-pose characters are followed by a colon and then by two orientation characters to define a complete command pose. In one embodiment, a start position is referred to as an "xyz" pose, in which the thumb points straight up, the index finger points forward, and the middle finger is perpendicular to the index finger, pointing to the left when the pose is made with the right hand. This is represented by the string "^^x1-:-x".
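A complete command pose, with five finger characters, a colon, and two orientation characters, can be split and interpreted as follows. This is a minimal sketch: the function `parse_command_pose`, the returned dictionary keys, and the example string "11111:vx" (a flat hand, palm down, fingers forward) are assumptions made for illustration; the orientation meanings follow the body-centric characters listed in the text.

```python
# Body-centric orientation characters as listed in the text.
ORIENTATION_MEANING = {
    "-": "medial",
    "+": "lateral",
    "x": "anterior",
    "*": "posterior",
    "^": "cranial",
    "v": "caudal",
}


def parse_command_pose(text):
    """Split 'ppppp:oo' into finger pose, palm direction and finger direction."""
    fingers, sep, orientation = text.partition(":")
    if sep != ":" or len(fingers) != 5 or len(orientation) != 2:
        raise ValueError(f"not a command pose: {text!r}")
    palm, finger_dir = orientation
    if palm not in ORIENTATION_MEANING or finger_dir not in ORIENTATION_MEANING:
        raise ValueError(f"unknown orientation characters: {orientation!r}")
    return {
        "fingers": fingers,
        "palm": ORIENTATION_MEANING[palm],
        "finger_direction": ORIENTATION_MEANING[finger_dir],
    }


# Hypothetical example: flat hand, palm facing down, fingers pointing forward.
parsed = parse_command_pose("11111:vx")
print(parsed)
```

The first orientation character is read as the palm direction and the second as the finger direction, matching the order given in the description.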

"XYZ-hand" is a technique for exploiting the geometry of the human hand to allow full six-degree-of-freedom navigation of visually presented three-dimensional structure. The technique depends only on the bulk translation and rotation of the operator's hand, so that the fingers may in principle be held in any pose desired; in this embodiment, however, a static configuration is preferred in which the index finger points away from the body, the thumb points toward the ceiling, and the middle finger points left-right. The three fingers thus describe (roughly, but with clearly evident intent) the three mutually orthogonal axes of a three-space coordinate system: hence "XYZ-hand".

XYZ-hand navigation then proceeds with the hand, fingers in the pose described above, held before the operator's body at a predetermined "neutral location". Access to the three translational and three rotational degrees of freedom of a three-space object (or camera) is effected in the following natural way: left-right movement of the hand (with respect to the body's natural coordinate system) results in movement along the x-axis of the computational context; up-down movement of the hand results in movement along the y-axis of the controlled context; and forward-back movement of the hand (toward or away from the operator's body) results in z-axis motion within the context. Similarly, rotation of the operator's hand about the index finger leads to a "roll" change in the orientation of the computational context; "pitch" and "yaw" changes are effected analogously, through rotation of the operator's hand about the middle finger and the thumb, respectively.

Although "computational context" is used here to refer to the entity controlled by the XYZ-hand method, and seems to suggest either a synthetic three-space object or a camera, it should be understood that the technique is equally useful for controlling the various degrees of freedom of real-world objects: for example, the pan/tilt/roll controls of a video or motion picture camera equipped with appropriate rotational actuators. Further, the physical degrees of freedom afforded by the XYZ-hand pose may be mapped somewhat less literally, even in a virtual domain. In this embodiment, the XYZ-hand is also used to provide navigational access to large panoramic display images, so that left-right and up-down motions of the operator's hand lead to the expected left-right or up-down "panning" about the image, while forward-back motion of the operator's hand maps to "zooming" control.

In every case, the coupling between the motion of the hand and the induced computational translation/rotation may be either direct (that is, a positional or rotational offset of the operator's hand maps one-to-one, via some linear or nonlinear function, to a positional or rotational offset of the object or camera in the computational context) or indirect (that is, a positional or rotational offset of the operator's hand maps one-to-one, via some linear or nonlinear function, to a first or higher-order derivative of position/orientation in the computational context; ongoing integration then effects a non-static change in the actual zeroth-order position/orientation of the computational context). This latter means of control is analogous to the use of an automobile's accelerator pedal, in which a constant offset of the pedal leads, more or less, to a constant vehicle speed.
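The difference between the two couplings can be illustrated with a one-axis sketch. The function names and the linear `gain` are assumptions chosen for the example; the text allows any linear or nonlinear mapping function.

```python
def direct_coupling(hand_offset: float, gain: float = 1.0) -> float:
    """Direct coupling: the hand offset sets the context offset itself."""
    return gain * hand_offset


def indirect_coupling(position: float, hand_offset: float, dt: float,
                      gain: float = 1.0) -> float:
    """Indirect (first-order) coupling: the hand offset sets a velocity,
    which is integrated over the time step `dt` to update the position."""
    velocity = gain * hand_offset
    return position + velocity * dt


# Holding the hand at a constant offset, accelerator-pedal style:
position = 0.0
for _ in range(10):
    position = indirect_coupling(position, hand_offset=0.5, dt=0.1)
```

With direct coupling a constant hand offset yields a constant context offset; with indirect coupling the same constant offset makes the context keep moving at a constant rate until the hand returns to the neutral location.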

The "neutral location" that serves as the origin of the real-world XYZ-hand's local six-degree-of-freedom coordinate system may be established (1) as an absolute position and orientation in space (relative, say, to the enclosing room); (2) as a fixed position and orientation relative to the operator (for example, 20.32 cm (8 inches) in front of the body, 25.4 cm (10 inches) below the chin, and laterally in line with the shoulder plane), irrespective of the overall position and "heading" of the operator; or (3) interactively, through a deliberate secondary action of the operator (for example, using a gestural command enacted by the operator's "other" hand, the command indicating that the XYZ-hand's present position and orientation should henceforth be used as the translational and rotational origin).

It is further convenient to provide a "detent" region (or "dead zone") about the XYZ-hand's neutral location, such that movements within this volume do not map to movements in the controlled context.
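One possible form of such a dead zone is sketched below. The spherical shape, the `radius` parameter, and the choice to subtract the radius so that motion ramps up from zero at the boundary are all assumptions of this example, not details given in the description.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def apply_dead_zone(offset: Vec3, radius: float) -> Vec3:
    """Suppress hand offsets that fall inside a sphere around the neutral
    location; outside it, shorten the offset by `radius` so the controlled
    motion starts smoothly from zero at the edge of the zone."""
    dist = math.sqrt(sum(c * c for c in offset))
    if dist <= radius:
        return (0.0, 0.0, 0.0)
    scale = (dist - radius) / dist
    return (offset[0] * scale, offset[1] * scale, offset[2] * scale)
```

The filtered offset can then be fed to either a direct or an indirect coupling; small tremors of a hand resting near the neutral location produce no motion in the controlled context.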

Other poses include:
[|||||:vx] is a flat hand (thumb parallel to the fingers) with the palm facing down and the fingers forward.
[|||||:x^] is a flat hand with the palm facing forward and the fingers toward the ceiling.
[|||||:-x] is a flat hand with the palm facing toward the center of the body (right for the left hand, left for the right hand) and the fingers forward.
[^^^^-:-x] is a single-hand thumbs-up (with the thumb pointing toward the ceiling).
[^^^|-:-x] is a mime gun pointing forward.

Two-Hand Combinations
The present invention contemplates single-hand commands and poses as well as two-handed commands and poses. FIG. 5 illustrates examples of two-hand combinations and the associated notation in an embodiment of the invention. Reviewing the notation of the first example, "full stop" reveals that it comprises two closed fists. In the "snapshot" example, the thumb and index finger of each hand are extended, with the thumbs pointing toward each other, defining a goalpost-shaped frame. In the "rudder and throttle start position", the fingers and thumbs point up and the palms face the screen.

Orientation Blends
FIG. 6 illustrates an example of an orientation blend in an embodiment of the invention. In the example shown, the blend is represented by enclosing pairs of orientation notations in parentheses after the finger pose string. For example, the first command shows finger positions that all point straight. The first pair of orientation commands results in the palms being flat toward the display, and the second pair has the hands rotating to a 45-degree pitch toward the screen. Although pairs of blends are shown in this example, any number of blends is contemplated by the invention.

Exemplary Commands
FIG. 8 illustrates a number of possible commands that may be used with the present invention. Although some of the discussion here has concerned controlling a cursor on a display, the invention is not limited to that activity. In fact, the invention has great application in manipulating any and all data, and portions of data, on a screen, as well as the state of the display. For example, the commands may take the place of video controls during playback of video media. The commands may be used to pause, fast forward, rewind, and the like. In addition, commands may be implemented to zoom in or out of an image, to change the orientation of an image, to pan in any direction, and the like. The invention may also be used in lieu of menu commands such as open, close, and save. In other words, any command or activity that can be imagined can be implemented with hand gestures.

Operation
FIG. 7 is a flow diagram illustrating the operation of the invention in one embodiment. At step 701, the detection system detects the markers and tags. At decision block 702, it is determined whether the tags and markers have been detected. If not, the system returns to step 701. If the tags and markers are detected at step 702, the system proceeds to step 703. At step 703, the system identifies the hand, fingers, and pose from the detected tags and markers. At step 704, the system identifies the orientation of the pose. At step 705, the system identifies the three-dimensional spatial location of the hand or hands detected. (Note that any or all of steps 703, 704, and 705 may be combined as a single step.)
At step 706, the information is translated to the gesture notation described above. At decision block 707, it is determined whether the pose is valid. This may be accomplished via a simple string comparison using the generated notation string. If the pose is not valid, the system returns to step 701. If the pose is valid, the system sends the notation and position information to the computer at step 708. At step 709, the computer determines the appropriate action to take in response to the gesture, and at step 710 it updates the display accordingly.
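The control flow of steps 701 to 708 can be summarized in a short sketch. Everything here is hypothetical scaffolding: `process_frame`, the `to_notation` callback (standing in for steps 703 to 706), and the dictionary payload are assumptions used only to show where the two decision blocks sit.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

Position = Tuple[float, float, float]


def process_frame(
    markers: Sequence[object],
    to_notation: Callable[[Sequence[object]], Tuple[str, Position]],
    valid_poses: Set[str],
) -> Optional[dict]:
    """Handle one detection cycle; None means 'return to step 701'."""
    if not markers:                                # decision block 702
        return None
    notation, position = to_notation(markers)      # steps 703 to 706
    if notation not in valid_poses:                # decision block 707
        return None
    # Payload that would be sent to the computer at step 708.
    return {"notation": notation, "position": position}
```

Here validity is checked by exact set membership, which is one simple form of the string comparison mentioned for decision block 707; a wildcard-aware comparison could be substituted.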

In one embodiment of the invention, steps 701 to 705 are accomplished by the on-camera processor. In other embodiments, the processing can be accomplished by the system computer if desired.

Parsing and Translation
The system is able to "parse" and "translate" a stream of low-level gestures recovered by an underlying system, and to turn those parsed and translated gestures into a stream of command or event data that can be used to control a broad range of computer applications and systems. These techniques and algorithms may be embodied in a system consisting of computer code that provides both an engine implementing these techniques and a platform for building computer applications that make use of the engine's capabilities.

Although one embodiment focuses on enabling rich gestural use of human hands in computer interfaces, the system can also recognize gestures made by other body parts (including, but not limited to, arms, torso, legs, and the head), as well as by non-hand physical tools of various kinds, both static and articulating, including but not limited to calipers, compasses, flexible curve approximators, and pointing devices of various shapes. The markers and tags may be applied to items and tools that can be carried and used by the operator as desired.

The system described herein incorporates a number of innovations that make it possible to build gestural systems that are rich in the range of gestures that can be recognized and acted upon, while at the same time allowing easy integration into applications.

The gestural parsing and translation system of one embodiment consists of the following.
1) A compact and efficient way to specify (encode for use in computer programs) gestures at several different levels of aggregation:
a. a single hand's "pose" (the configuration and orientation of the parts of the hand relative to one another);
b. a single hand's orientation and position in three-dimensional space;
c. two-handed combinations, for either hand, taking into account pose, position, or both;
d. multi-person combinations; the system can track more than two hands, so more than one person can cooperatively (or competitively, in the case of game applications) control the target system;
e. sequential gestures, in which poses are combined in a series; these are called "animating" gestures;
f. "grapheme" gestures, in which the operator traces shapes in space.
2) A programmatic technique for registering specific gestures from each of the above categories that are relevant to a given application context.
3) Algorithms for parsing the gesture stream so that registered gestures can be identified and events encapsulating those gestures can be delivered to the relevant application contexts.

The specification system (1), with constituent elements (1a) to (1f), provides the basis for making use of the gestural parsing and translation capabilities of the system described herein.
A single hand's "pose" is:
i) represented as a string of relative orientations between the fingers and the back of the hand, and
ii) quantized into a small number of discrete states.
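To make the quantization step concrete, the sketch below snaps a measured finger angle to the nearest of a small set of canonical states. The two states, their characters ("|" for straight, "^" for curled), and the canonical angles are assumptions for illustration; the actual state set of the vocabulary is defined by the pose figure, which is not reproduced in this section.

```python
from typing import Iterable

# Assumed canonical bend angles, in degrees, for each discrete state.
CANONICAL_STATES = {"|": 0.0, "^": 90.0}


def quantize_bend(angle_deg: float) -> str:
    """Return the character of the canonical state nearest to the angle."""
    return min(CANONICAL_STATES,
               key=lambda ch: abs(CANONICAL_STATES[ch] - angle_deg))


def fingers_to_string(angles_deg: Iterable[float]) -> str:
    """Quantize one relative bend angle per digit into a pose substring."""
    return "".join(quantize_bend(a) for a in angles_deg)
```

Because only relative angles are quantized, two operators with very different hand sizes produce the same string for the same pose, which is why no per-operator calibration is needed.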

By using relative joint orientations, the system described herein avoids the problems associated with differing hand sizes and geometries. No "operator calibration" is required with this system. In addition, specifying poses as a string or collection of relative orientations allows more complex gesture specifications to be created easily by combining pose representations with further filters and specifications.

Using a small number of discrete states for pose specification makes it possible to specify poses compactly and to ensure accurate pose recognition using a variety of underlying tracking technologies (for example, passive optical tracking using cameras, active optical tracking using lighted dots and cameras, electromagnetic field tracking, and so on).

Gestures in every category (1a) to (1f) may be partially (or minimally) specified, so that non-critical data is ignored. For example, a gesture in which the position of two fingers is definitive, and the positions of the other fingers are unimportant, may be represented by a single specification in which the operative positions of the two relevant fingers are given and, within the same string, "wildcards" or generic "ignore these" indicators are listed for the other fingers.

All of the innovations described herein for gesture recognition, including but not limited to the multi-layered specification technique, the use of relative orientations, the quantization of data, and the allowance for partial or minimal specification at every level, generalize beyond the specification of hand gestures to the specification of gestures using other body parts and "manufactured" tools and objects.

The programmatic technique for "registering gestures" (2) consists of a defined set of application programming interface (API) calls that allow a programmer to define which gestures the engine should make available to other parts of the running system.

These API routines may be used at application set-up time, creating a static interface definition that is used throughout the lifetime of the running application. They may also be used during the course of the run, allowing the interface characteristics to change on the fly. This real-time alteration of the interface makes it possible to:
i) build complex contextual and conditional control states;
ii) dynamically add hysteresis to the control environment; and
iii) create applications in which the user is able to alter or extend the interface vocabulary of the running system itself.
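A registration interface of this kind might look like the following sketch. The class `GestureRegistry` and its methods are hypothetical and are not the API of the described system; they only illustrate registering gestures at set-up time and changing the set while the application runs.

```python
from typing import Callable, Dict


class GestureRegistry:
    """Hypothetical registry mapping pose specifications to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], None]] = {}

    def register(self, spec: str, handler: Callable[[str], None]) -> None:
        """Make a gesture available; may be called at any time."""
        self._handlers[spec] = handler

    def unregister(self, spec: str) -> None:
        """Withdraw a gesture from the active vocabulary."""
        self._handlers.pop(spec, None)

    def dispatch(self, actual: str) -> bool:
        """Deliver `actual` to the first matching handler, if any."""
        for spec, handler in list(self._handlers.items()):
            if len(spec) == len(actual) and all(
                s in ("?", a) for s, a in zip(spec, actual)
            ):
                handler(actual)
                return True
        return False
```

Iterating over a copy of the handler table lets a handler register or unregister gestures from inside its own callback, which is one way an application could alter its interface vocabulary on the fly.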

The algorithms for parsing the gesture stream (3) compare gestures specified as in (1) and registered as in (2) against incoming low-level gesture data. When a match for a registered gesture is recognized, event data representing the matched gesture is delivered up the stack to the running applications.

Efficient real-time matching is desired in the design of this system, and specified gestures are treated as a tree of possibilities that are processed as quickly as possible.
In addition, the primitive comparison operators used internally to recognize specified gestures are also exposed for the application programmer to use, so that further comparison (for example, flexible state inspection in complex or compound gestures) can happen even from within application contexts.

Recognition "locking" semantics are an innovation of the system described herein. These semantics are implied by the registration API (2) (and, to a lesser extent, embedded within the specification vocabulary (1)). Registration API calls include:
i) "entry" state notifiers and "continuation" state notifiers; and
ii) gesture priority specifiers.

If a gesture has been recognized, its "continuation" conditions take precedence over all "entry" conditions for gestures of the same or lower priority. This distinction between entry and continuation states adds significantly to perceived system usability.
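The precedence rule can be sketched as a small state machine. The types and names (`Registered`, `LockingRecognizer`) and the example predicates are assumptions; only the rule itself, that an active gesture's continuation condition beats entry conditions of equal or lower priority, is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Registered:
    name: str
    priority: int
    entry: Callable[[str], bool]         # condition to start the gesture
    continuation: Callable[[str], bool]  # looser condition to keep it


class LockingRecognizer:
    def __init__(self, gestures: List[Registered]) -> None:
        self.gestures = gestures
        self.active: Optional[Registered] = None

    def update(self, pose: str) -> Optional[str]:
        active = self.active
        holding = active is not None and active.continuation(pose)
        # While the active gesture continues, only a strictly higher
        # priority gesture is allowed to enter.
        candidates = [
            g for g in self.gestures
            if g.entry(pose) and (not holding or g.priority > active.priority)
        ]
        if candidates:
            self.active = max(candidates, key=lambda g: g.priority)
        elif not holding:
            self.active = None
        return self.active.name if self.active else None
```

Giving the continuation predicate a looser test than the entry predicate is also a simple way to add hysteresis: a gesture is hard to enter by accident but does not drop out on small deviations.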

The system described herein includes algorithms for robust operation in the face of real-world data error and uncertainty. Data from low-level tracking systems may be incomplete (for a variety of reasons, including occlusion of markers in optical tracking, network drop-out, or processing lag).

Missing data is marked by the parsing system and interpolated into either a "last known" or a "most likely" state, depending on the amount and context of the missing data.
If data about a particular gesture component (for example, the orientation of a particular joint) is missing, but the "last known" state of that particular component can be analyzed as physically possible, the system uses this last known state in its real-time matching.

Conversely, if the "last known" state is analyzed as physically impossible, the system falls back to a "best guess range" for the component and uses this synthetic data in its real-time matching.
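A minimal sketch of this fallback logic follows. The function `fill_missing`, the use of `None` to mark a missing component, the `is_plausible` predicate, and the use of a single best-guess value per component (rather than a range) are simplifying assumptions for illustration.

```python
from typing import Callable, Dict, Optional, Set, Tuple


def fill_missing(
    sample: Dict[str, Optional[float]],
    last_known: Dict[str, float],
    is_plausible: Callable[[str, float], bool],
    best_guess: Dict[str, float],
) -> Tuple[Dict[str, float], Set[str]]:
    """Replace missing components and report which ones are synthetic.

    A missing component takes its last known value when that value is
    still physically plausible; otherwise it takes a best-guess value.
    """
    filled: Dict[str, float] = {}
    synthetic: Set[str] = set()
    for name, value in sample.items():
        if value is None:
            previous = last_known.get(name)
            if previous is not None and is_plausible(name, previous):
                value = previous
            else:
                value = best_guess[name]
            synthetic.add(name)
        filled[name] = value
    return filled, synthetic
```

Returning the set of synthetic components keeps the "marked as missing" information available to later stages, which can then weigh a match that relies on filled-in data differently from one based on measured data.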

The specification and parsing systems described herein have been carefully designed to support "handedness agnosticism", so that for multi-hand gestures either hand is permitted to satisfy the pose requirements.
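One simple reading of handedness agnosticism for a two-hand gesture is to try both assignments of hands to pose requirements, as in this sketch. The helper names are hypothetical, and the pose strings are examples in the document's notation.

```python
def matches(spec: str, actual: str) -> bool:
    """Wildcard-aware string comparison ('?' matches any state)."""
    return len(spec) == len(actual) and all(
        s in ("?", a) for s, a in zip(spec, actual)
    )


def two_hand_match(spec_a: str, spec_b: str, left: str, right: str) -> bool:
    """True if the two hands satisfy the two specs in either assignment."""
    return (matches(spec_a, left) and matches(spec_b, right)) or (
        matches(spec_a, right) and matches(spec_b, left)
    )
```

A gesture specified as "one hand pointing, the other hand flat" is then recognized whichever hand does the pointing.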

Coincident Virtual/Display and Physical Spaces
The system can provide an environment in which the virtual space depicted on one or more display devices ("screens") is treated as coincident with the physical space inhabited by the operator or operators of the system. An embodiment of such an environment is described here. This current embodiment includes three projector-driven screens at fixed locations, is driven by a single desktop computer, and is controlled using the gesture vocabulary and interface system described herein. Note, however, that any number of screens is supported by the techniques being described; that those screens may be mobile (rather than fixed); that the screens may be driven by many independent computers simultaneously; and that the overall system can be controlled by any input device or technique.

The interface system described in this disclosure should have a means of determining the dimensions, orientations, and positions of the screens in physical space. Given this information, the system is able to dynamically map the physical space in which these screens are located (and which the operators of the system inhabit) as a projection into the virtual space of the computer applications running on the system. As part of this automatic mapping, the system also translates the scale, angles, depth, dimensions, and other spatial characteristics of the two spaces in a variety of ways, according to the needs of the applications hosted by the system.

This continuous translation between physical and virtual space makes possible the consistent and pervasive use of a number of interface techniques that are difficult to achieve on existing application platforms, or that must be implemented piecemeal for each application running on existing platforms. These techniques include (but are not limited to):
1) Use of "literal pointing" (using the hands in a gestural interface environment, or using physical pointing tools or devices) as a pervasive and natural interface technique.
2) Automatic compensation for movement or repositioning of screens.
3) Graphics rendering that changes depending on operator position, for example simulating parallax shifts to enhance depth perception.
4) Inclusion of physical objects in the on-screen display, taking into account real-world position, orientation, state, and so on. For example, an operator standing in front of a large, translucent screen could see both application graphics and a representation of the true position of a scale model that is behind the screen (and is, perhaps, moving or changing orientation).

It is important to note that literal pointing differs from the abstract pointing used in mouse-based windowing interfaces and most other contemporary systems. In those systems, the operator must learn to manage a translation between a virtual pointer and a physical pointing device, and must map between the two cognitively.

By contrast, in the systems described in this disclosure there is no difference between virtual and physical space (except that virtual space is more amenable to mathematical manipulation), from either an application or a user perspective, so no cognitive translation is required of the operator.

The closest analogy to the literal pointing provided by the embodiment described here is the touch-sensitive screen (as found, for example, on many ATM machines). A touch-sensitive screen provides a one-to-one mapping between the two-dimensional display space on the screen and the two-dimensional input space of the screen surface. In an analogous fashion, the systems described here provide a flexible mapping (possibly, but not necessarily, one-to-one) between a virtual space displayed on one or more screens and the physical space inhabited by the operator. Despite the usefulness of the analogy, it is worth understanding that extending this "mapping approach" to three dimensions, to arbitrarily large architectural environments, and to multiple screens is non-trivial.
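To illustrate what a physical-to-display mapping for literal pointing involves, the sketch below intersects a pointing ray with a flat screen whose physical placement is known and converts the hit point to pixel coordinates. The `Screen` fields, the function `point_at`, the metric units, and the bottom-left pixel origin are all assumptions of this example, not details of the described system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class Screen:
    origin: Vec3      # physical position of the lower-left corner (metres)
    right: Vec3       # unit vector along the screen's width
    up: Vec3          # unit vector along the screen's height
    width_m: float
    height_m: float
    width_px: int
    height_px: int


def point_at(screen: Screen, hand: Vec3,
             direction: Vec3) -> Optional[Tuple[float, float]]:
    """Pixel hit by a ray from `hand` along `direction`, or None."""
    normal = cross(screen.right, screen.up)
    denom = dot(direction, normal)
    if abs(denom) < 1e-9:          # ray parallel to the screen plane
        return None
    t = dot(sub(screen.origin, hand), normal) / denom
    if t <= 0:                     # screen is behind the hand
        return None
    hit = (hand[0] + t * direction[0],
           hand[1] + t * direction[1],
           hand[2] + t * direction[2])
    rel = sub(hit, screen.origin)
    u = dot(rel, screen.right) / screen.width_m
    v = dot(rel, screen.up) / screen.height_m
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None                # ray misses the screen rectangle
    return (u * screen.width_px, v * screen.height_px)
```

Because the screen's placement is an input rather than a constant, moving or re-measuring a screen only changes the `Screen` record; the pointing computation itself is unchanged, which is the sense in which compensation for screen repositioning can be automatic.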

In addition to the components described herein, the system may also implement algorithms that carry out a continuous, system-level mapping (possibly modified by rotation, translation, scaling, or other geometric transformations) between the physical space of the environment and the display space on each screen.

A rendering stack takes the computational objects and the mapping and outputs a graphical representation of the virtual space.
An input event processing stack takes event data from a control system (in the current embodiment, both gestural and pointing data from the system and mouse input) and maps the spatial data from the input events to coordinates in virtual space. The translated events are then delivered to the running applications.

A "glue layer" allows the system to host applications running across several computers on a local area network.
Thus, a gesture-based control system has been described.

FIG. 1 is a diagram showing one embodiment of the system of the present invention.
FIG. 2 is a diagram showing one embodiment of the marking tags of the present invention.
FIG. 3 is a diagram showing poses in the gesture vocabulary of one embodiment of the present invention.
FIG. 4 is a diagram showing orientations in the gesture vocabulary of one embodiment of the present invention.
FIG. 5 is a diagram showing two-hand combinations in the gesture vocabulary of one embodiment of the present invention.
FIG. 6 is a diagram showing orientation blends in the gesture vocabulary of one embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of one embodiment of the system of the present invention.
FIG. 8 is a diagram showing example commands in one embodiment of the present invention.

Claims

A method comprising:
automatically detecting a gesture of a body from gesture data received via an optical detector, wherein the gesture data is absolute position data of an instantaneous state of the body at a point in time and space, the detecting comprising receiving three-space positions of markers from tags on the body, recovering a three-space position and orientation for each of the markers, and identifying the gesture using only the three-space positions and orientations of the markers, without using reference data;
translating the gesture to a gesture signal, the translating comprising converting information of the gesture into a gesture notation, wherein the gesture notation represents a gesture vocabulary, the gesture vocabulary represents each gesture using a character string that describes the state of at least one kinematic linkage of the body, each character of the character string is assigned to a respective one of a plurality of positions in the at least one kinematic linkage, and the gesture signal comprises a communication of the gesture vocabulary; and
controlling a component coupled to a computer in response to the gesture signal.

The method of claim 1, wherein the detecting step includes detecting the position of the body.

The method of claim 1, wherein the detecting step includes detecting the orientation of the body.

The method of claim 1, wherein the detecting step includes detecting the movement of the body.

The method of claim 1, wherein the detecting step includes identifying the gesture, wherein the identifying includes identifying a pose and an orientation of a part of the body.

The method of claim 1, wherein the detecting step includes dynamically detecting a position of at least one tag.

The method of claim 6, wherein the detecting step includes detecting a position of a tag set coupled to a part of the body.

The method of claim 7, wherein each tag in the tag set comprises a pattern, and the pattern of each tag in the tag set is different from the pattern of any remaining tag in the tag set.

The method of claim 8, wherein each tag includes a first pattern and a second pattern, the first pattern is common to all tags in the tag set, and the second pattern differs between at least two tags of the tag set.

The method of claim 7, wherein the tag set forms a plurality of patterns on the body.

The method of claim 6, wherein the detecting step includes detecting the position of each of a plurality of appendages of the body using a tag set coupled to each of the appendages.

The method of claim 11, wherein a first tag set is coupled to a first appendage, the first tag set comprises a first plurality of tags, and each tag comprises a first pattern common to the tags in the first plurality of tags and a second pattern that is unique to each tag in the first plurality of tags.

The method of claim 12, wherein a second tag set is coupled to a second appendage, the second tag set comprises a second plurality of tags, and each tag comprises a third pattern common to the tags in the second plurality of tags and a fourth pattern that is unique to each tag in the second plurality of tags.
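
The tag-set claims above can be pictured with a minimal sketch in which each tag carries one pattern shared by its whole tag set and one pattern of its own. The class `Tag`, the function `identify`, and the pattern strings are hypothetical illustrations; the claims do not specify any encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical tag: a pattern shared by its tag set plus a per-tag pattern."""
    name: str
    common_pattern: str  # identical for every tag in the same tag set
    unique_pattern: str  # differs between tags of the same tag set


# Assumed example data: one tag set per appendage.
FIRST_SET = [Tag("first-a", "L", "01"), Tag("first-b", "L", "10")]
SECOND_SET = [Tag("second-a", "R", "01"), Tag("second-b", "R", "10")]


def identify(common: str, unique: str, tag_sets: List[List[Tag]]) -> Optional[Tag]:
    """Return the tag whose two patterns match the observation, or None."""
    for tag_set in tag_sets:
        for tag in tag_set:
            if tag.common_pattern == common and tag.unique_pattern == unique:
                return tag
    return None
```

In this sketch the common pattern selects the appendage and the unique pattern selects the tag within that appendage's set.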

The method of claim 6, wherein the at least one tag comprises a tag set, the tag set comprising at least one tag type selected from the group consisting of active tags and passive tags.

The method of claim 1, wherein the detecting step comprises dynamically detecting and locating a marker on the body.

The method of claim 15, wherein the detecting step includes detecting a position of a marker set coupled to a part of the body.

The method of claim 16, wherein the marker set forms a plurality of patterns on the body.

The method of claim 15, wherein the detecting step includes assigning the position of each marker to a subset of markers that form a tag.

The method of claim 1, wherein the gesture vocabulary represents an instantaneous pose state of the kinematic coupling of the body in text form.

The method of claim 1, wherein the gesture vocabulary represents an orientation of the kinematic coupling of the body in text form.

The method of claim 1, wherein the gesture vocabulary represents a combination of orientations of kinematic couplings of the body in text form.

The method of claim 1, wherein the kinematic coupling is at least one first appendage of the body.

The method of claim 22, including assigning each position in the character string to a second appendage, the second appendage being coupled to the first appendage.

The method of claim 23, including assigning characters of a plurality of characters to each of a plurality of positions of the second appendage.

The method of claim 24, wherein the plurality of positions are defined with respect to a coordinate origin.

The method of claim 25, including establishing the coordinate origin using a position selected from the group consisting of an absolute position and orientation in space, a fixed position and orientation relative to the body irrespective of the overall position and orientation of the body, and a position established interactively in response to a movement of the body.

The method of claim 24, including assigning characters of the plurality of characters to each of a plurality of orientations of the first appendage.
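
The character-string claims above can be illustrated with a minimal sketch: each string position stands for one finger of a hand, each character for that finger's state, and two further characters give the hand's orientation. The symbol tables, the `:` separator, and the function `encode_pose` are hypothetical; the claims do not specify which characters are used.

```python
# Hypothetical symbol tables; the claims do not specify the characters used.
FINGER_ORDER = ("pinky", "ring", "middle", "index", "thumb")
FINGER_SYMBOLS = {"curled": "^", "straight": "|", "sideways": "-"}
ORIENTATION_SYMBOLS = {
    "up": "u", "down": "d", "left": "l",
    "right": "r", "toward": "t", "away": "a",
}


def encode_pose(fingers: dict, palm: str, fingertips: str) -> str:
    """Build a gesture string: one character per finger, ':' and two orientation characters."""
    finger_part = "".join(FINGER_SYMBOLS[fingers[name]] for name in FINGER_ORDER)
    orientation_part = ORIENTATION_SYMBOLS[palm] + ORIENTATION_SYMBOLS[fingertips]
    return f"{finger_part}:{orientation_part}"


# A pointing hand: three fingers curled, index straight, thumb sideways,
# palm facing left and fingertips pointing away from the user.
pointing = encode_pose(
    {"pinky": "curled", "ring": "curled", "middle": "curled",
     "index": "straight", "thumb": "sideways"},
    palm="left",
    fingertips="away",
)
print(pointing)  # ^^^|-:la
```

Because every position has a fixed meaning, two poses can be compared position by position with ordinary string operations.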

The method of claim 22, wherein controlling the component comprises simultaneously controlling a three-dimensional space object in six degrees of freedom by mapping the gesture of the first appendage to the three-dimensional space object.

The method of claim 22, wherein controlling the component comprises controlling a three-dimensional space object through three translational degrees of freedom and three rotational degrees of freedom.

The method of claim 29, comprising controlling movement of the three-dimensional space object by mapping a plurality of gestures of the first appendage to a plurality of object transformations of the three-dimensional space object.

The method of claim 30, wherein the mapping includes a direct mapping between the plurality of gestures and the plurality of object transformations.

The method of claim 30, wherein the mapping includes an indirect mapping between the plurality of gestures and the plurality of object transformations.

The method of claim 30, wherein the mapping includes associating a position offset of the plurality of gestures with a position offset of the object transformations of the three-dimensional space object.

The method of claim 30, wherein the mapping includes associating a position offset of the first appendage with a transformation velocity of the object transformations of the three-dimensional space object.
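
The two offset mappings claimed above differ in what the appendage's displacement drives: the object's displacement, or the object's velocity. A minimal sketch follows, assuming a simple linear gain and a fixed time step; the function names, the `gain` parameter, and the `dt` parameter are hypothetical and not taken from the claims.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def offset_to_position(start: Vec3, hand_offset: Vec3, gain: float = 1.0) -> Vec3:
    """Position mapping: the object's displacement is proportional to the hand's displacement."""
    return tuple(s + gain * h for s, h in zip(start, hand_offset))


def offset_to_velocity(position: Vec3, hand_offset: Vec3, dt: float, gain: float = 1.0) -> Vec3:
    """Velocity mapping: the hand's displacement sets the object's velocity, applied over one step dt."""
    return tuple(p + gain * h * dt for p, h in zip(position, hand_offset))


# Position mapping: the object follows the hand and stops when the hand stops.
moved = offset_to_position((0.0, 0.0, 0.0), (1.0, 2.0, 3.0))

# Velocity mapping: holding the hand at a constant offset keeps the object moving.
pos = (0.0, 0.0, 0.0)
for _ in range(2):
    pos = offset_to_velocity(pos, (2.0, 0.0, 0.0), dt=0.5)
```

Under the velocity mapping, a small sustained offset can move the object over an arbitrarily large distance, which the position mapping cannot do without a large gain.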

The method of claim 29, wherein the detecting step includes detecting when an estimated position of the three-dimensional space object intersects a virtual space, the virtual space comprising a space drawn on a display device coupled to the computer.

The method of claim 35, wherein controlling the component comprises controlling a virtual object in the virtual space when the estimated position intersects the virtual object.

The method of claim 36, wherein controlling the component comprises controlling the position of the virtual object in the virtual space in response to the estimated position in the virtual space.

The method of claim 36, wherein controlling the component comprises controlling a posture of the virtual object in the virtual space in response to the gesture.
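
The intersection test in the virtual-space claims above can be sketched as a point-in-box check. This is a minimal illustration that assumes the virtual object is bounded by an axis-aligned box; the claims do not say how the intersection is computed, and the names `inside_box` and `update_virtual_object` are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def inside_box(point: Vec3, box_min: Vec3, box_max: Vec3) -> bool:
    """True when the point lies within the axis-aligned box on every axis."""
    return all(lo <= p <= hi for p, lo, hi in zip(point, box_min, box_max))


def update_virtual_object(estimated: Vec3, object_min: Vec3, object_max: Vec3) -> Optional[Vec3]:
    """Return the new centre for the virtual object, or None when it is not hit."""
    if inside_box(estimated, object_min, object_max):
        return estimated  # the object follows the estimated position
    return None
```

Here the virtual object is moved only while the estimated position falls inside its bounds; otherwise it is left where it is.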

The method of claim 1, comprising identifying the gesture at a plurality of levels.

The method of claim 39, wherein the plurality of levels comprises a first level comprising a pose of a first appendage of the body.

The method of claim 39, wherein the plurality of levels includes a second level comprising a combination of poses, the combination of poses comprising a first pose of a first appendage and a second pose of a second appendage of the body.

The method of claim 39, wherein the plurality of levels includes a third level comprising a combination of poses and positions, the combination of poses and positions comprising a third pose of at least one appendage of the body and a fourth pose of at least one appendage of the body.

The method of claim 39, wherein the plurality of levels comprises a fourth level comprising at least one series of gestures.

The method of claim 39, wherein the plurality of levels comprises a fifth level comprising grapheme gestures, the grapheme gestures comprising tracing shapes in free space.

The method of claim 39, comprising: generating a registered gesture by registering a gesture as associated with at least one application; and determining a priority of the registered gesture.
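
The registration step in the last claim can be pictured with a minimal sketch, assuming that each registration carries a numeric priority and that the highest priority wins when several applications register the same gesture. The class `GestureRegistry`, its methods, and the integer priorities are hypothetical; the claim does not say how priority is determined.

```python
from typing import Dict, List, Optional, Tuple


class GestureRegistry:
    """Hypothetical registry that maps gesture strings to interested applications."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Tuple[int, str]]] = {}

    def register(self, gesture: str, application: str, priority: int) -> None:
        """Record that an application is interested in a gesture."""
        self._entries.setdefault(gesture, []).append((priority, application))

    def resolve(self, gesture: str) -> Optional[str]:
        """Return the application with the highest priority for the gesture, or None."""
        candidates = self._entries.get(gesture)
        if not candidates:
            return None
        return max(candidates, key=lambda entry: entry[0])[1]


registry = GestureRegistry()
registry.register("^^^|-:la", "viewer", priority=1)
registry.register("^^^|-:la", "editor", priority=5)
```

With these registrations, `registry.resolve("^^^|-:la")` selects the application registered with the higher priority, and an unregistered gesture resolves to `None`.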