JP2022092745A

JP2022092745A - Operation method using gesture in extended reality and head-mounted display system

Info

Publication number: JP2022092745A
Application number: JP2020205628A
Authority: JP
Inventors: 勝修郭; Sheng-Hsiu Kuo
Original assignee: XRspace Co Ltd
Current assignee: XRspace Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-06-23

Abstract

To provide an operation method using gestures in extended reality (XR) and a head-mounted display (HMD) system. SOLUTION: In the method, a first gesture is identified in a first image. The first gesture corresponds to a hand of a user. A virtual hand and a first interactive object located in an interactive area are displayed in response to a result of identifying the first gesture. The virtual hand performs the first gesture. A second gesture is identified in a second image. The second gesture corresponds to the hand of the user and is different from the first gesture. The second gesture interacts with the first interactive object in the interactive area. The virtual hand and a second interactive object are displayed on a display in response to a result of identifying the second gesture. The virtual hand performs the second gesture. Accordingly, intuitive gesture control is provided. SELECTED DRAWING: Figure 2

Description

The present disclosure generally relates to virtual simulation, and in particular to an operation method using gestures in extended reality (XR) and a head-mounted display system.

Extended reality (XR) technologies for simulating senses, perception, and/or environments, such as virtual reality (VR), augmented reality (AR), and mixed reality (MR), have recently become popular. These technologies can be applied in multiple fields, such as gaming, military training, healthcare, and remote work. In XR, when a user wears a head-mounted display (HMD), the user can make gestures with his or her hands to trigger specific functions. The functions may relate to hardware or software control. It is therefore easy for the user to control the head-mounted display system with his or her own hands.

Some gestures may not trigger a function in an intuitive way. Accordingly, the present disclosure is directed to an operation method using gestures in XR and a head-mounted display system that provide intuitive gesture control.

In one exemplary embodiment, the operation method using gestures in XR includes, but is not limited to, the following steps. A first gesture is identified in a first image. The first gesture corresponds to a hand of a user. A virtual hand and a first interactive object located in an interactive area are displayed in response to the identified result of the first gesture. The virtual hand performs the first gesture. A second gesture is identified in a second image. The second gesture corresponds to the hand of the user and is different from the first gesture. The second gesture interacts with the first interactive object in the interactive area. The virtual hand and a second interactive object are displayed on a display in response to the identified results of the first gesture and the second gesture. The virtual hand performs the second gesture. The number of virtual hands may be one or two. The virtual hand may be a hand of a full-body or half-body avatar in XR.

In one exemplary embodiment, the head-mounted display system includes, but is not limited to, an image capture device, a display, and a processor. The image capture device captures images. The processor is coupled to the image capture device and the display. The processor is configured to perform the following steps. The processor identifies a first gesture in a first image captured by the image capture device. The first gesture corresponds to a hand of a user. The processor displays a virtual hand and a first interactive object located in an interactive area on the display in response to the identified result of the first gesture. The virtual hand performs the first gesture. The processor identifies a second gesture in a second image captured by the image capture device. The second gesture corresponds to the hand of the user and is different from the first gesture, and the second gesture interacts with the first interactive object in the interactive area. The processor displays the virtual hand and a second interactive object on the display in response to the identified results of the first gesture and the second gesture. The virtual hand performs the second gesture. The number of virtual hands may be one or two. The virtual hand may be a hand of a full-body or half-body avatar in XR.

In light of the above, according to the operation method and the head-mounted display system, two consecutive gestures are identified in two images, and the combination of gestures can trigger the display to present different interactive objects. In addition, one interactive object is provided for further interaction with the virtual hand. A convenient and engaging way to control the head-mounted display system is thereby provided.

It should be understood that this summary may not contain all aspects and embodiments of the present disclosure and is not meant to be limiting or restrictive in any manner, and that the invention disclosed herein is understood by those skilled in the art to encompass obvious improvements and modifications thereto.

FIG. 1 is a block diagram illustrating a head-mounted display system according to an exemplary embodiment of the present disclosure.

FIG. 2 is a flowchart illustrating an operation method using gestures in extended reality (XR) according to an exemplary embodiment of the present disclosure.

FIG. 3 is a schematic diagram illustrating the prediction of a gesture classifier according to an exemplary embodiment of the present disclosure.

FIGS. 4A and 4B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure.

FIGS. 5A and 5B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure.

FIGS. 6A and 6B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure.

Reference will now be made in detail to preferred embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or similar parts.

FIG. 1 is a block diagram illustrating a head-mounted display system 100 according to an exemplary embodiment of the present disclosure. Referring to FIG. 1, the head-mounted display (HMD) system 100 includes, but is not limited to, a memory 110, a display 120, an image capture device 130, and a processor 150. The HMD system 100 is applicable to XR or other reality simulation technologies.

The memory 110 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a similar device, or a combination of the above devices. The memory 110 records program code, device configurations, buffer data, or permanent data (such as images, gesture classifiers, predefined gestures, or settings), which are described later.

The display 120 may be an LCD, an LED display, or an OLED display.

The image capture device 130 may be a camera, such as a monochrome camera, a color camera, a depth camera, or a video recorder, or any other image capture device capable of capturing images.

The processor 150 is coupled to the memory 110, the display 120, and the image capture device 130. The processor 150 is configured to load the program code stored in the memory 110 and to perform the procedures of the exemplary embodiments of the present disclosure.

In some embodiments, the processor 150 may be a central processing unit (CPU), a microprocessor, a microcontroller, a graphics processing unit (GPU), a digital signal processing (DSP) chip, or a field programmable gate array (FPGA). The functions of the processor 150 may also be implemented by an independent electronic device or an integrated circuit (IC), and the operations of the processor 150 may also be implemented in software.

In one embodiment, an HMD or digital glasses includes the memory 110, the display 120, the image capture device 130, and the processor 150. In some embodiments, the processor 150 may not be located in the same apparatus as the display 120 and/or the image capture device 130. In that case, the apparatuses respectively equipped with the display 120, the image capture device 130, and the processor 150 further include communication transceivers using a compatible communication technology, such as Bluetooth (registered trademark), Wi-Fi, or IR wireless communication, or a physical transmission line, so that they can transmit data to and receive data from each other. For example, the processor 150 may be located in the HMD while the image capture device 130 is located outside the HMD. In another example, the processor 150 may be located in a computing device while the display 120 is located outside the computing device.

To better understand the operating process provided in one or more embodiments of the present disclosure, several embodiments are described below to elaborate on the head-mounted display system 100. The devices and modules of the head-mounted display system 100 are applied in the following embodiments to explain the operation method using gestures in XR provided herein. Each step of the method can be adjusted according to the actual implementation and is not limited to what is described here.

FIG. 2 is a flowchart illustrating an operation method using gestures in extended reality (XR) according to an exemplary embodiment of the present disclosure. Referring to FIG. 2, the processor 150 may identify a first gesture in a first image captured by the image capture device 130 (step S210). Specifically, the first gesture is a predefined gesture, such as a palm-up, palm-down, hand-wave, or fist gesture. The first gesture corresponds to a hand of the user. First, the processor 150 may identify the hand in the image. Then, the processor 150 may identify the gesture made by the user's hand in the first image and check whether the identified gesture is the predefined first gesture.

In one embodiment, the processor 150 may identify joints of the user's hand from the first image and predict the gesture of the user's hand through a gesture classifier based on the first image and the identified joints. Specifically, the positions of the hand joints are related to the hand gesture. In addition, the contour, size, texture, shape, and other features in the image are related to the hand gesture. A designer may prepare many images containing the predefined gestures as training samples and use them to train the gesture classifier with a machine learning algorithm configured with a gesture recognition function (such as deep learning, an artificial neural network (ANN), or a support vector machine (SVM)). Furthermore, hand joints are identified in these training samples, and the identified hand joints serve as additional training samples for training the same gesture classifier or another gesture classifier. The trained gesture classifier can then be used to determine which gesture is made in an input image.
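As a rough illustration of training a classifier on hand-joint data, the sketch below uses a nearest-centroid classifier. This is a deliberately simple stand-in for the deep learning, ANN, or SVM approaches named above, not the method of the disclosure. The joint feature vectors (for example, normalized fingertip-to-wrist distances), the gesture labels, and all function names are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def train_centroids(samples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the joint feature vectors of each gesture label into one centroid."""
    centroids = {}
    for label, vectors in samples.items():
        count = len(vectors)
        # zip(*vectors) groups the values of each feature dimension together.
        centroids[label] = [sum(column) / count for column in zip(*vectors)]
    return centroids


def predict_gesture(centroids: Dict[str, List[float]], joints: Sequence[float]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the joint vector."""
    return min(centroids, key=lambda label: dist(centroids[label], joints))


# Hypothetical training samples: two features per hand, two samples per gesture.
TRAINING_SAMPLES = {
    "palm_up": [[1.0, 1.0], [0.8, 1.2]],
    "fist": [[0.1, 0.0], [0.0, 0.2]],
}
CENTROIDS = train_centroids(TRAINING_SAMPLES)
```

A real system would use far richer joint features and a trained model; the point here is only the train-then-predict flow on labeled joint data.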

In some embodiments, the processor 150 may first predict the gesture based solely on the first image, without the identified joints of the user's hand, and then confirm the predicted gesture based on the identified joints of the user's hand. For example, FIG. 3 is a schematic diagram illustrating the prediction of a gesture classifier according to an exemplary embodiment of the present disclosure. Referring to FIG. 3, when an image OM containing a gesture is input to the gesture classifier, features are extracted from the image OM (step S301, i.e., feature extraction). For example, in step S301, the processor 150 convolves the pixel values of the image OM with a filter and outputs a feature map using the corresponding kernel. The features may be textures, corners, edges, or shapes. Next, the processor 150 may classify the features extracted in step S301, such as the feature map (step S302, i.e., classification). Note that one gesture classifier may be configured with one or more labels (i.e., one or more gestures in this embodiment). The gesture classifier can then output the determined gesture.
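The two stages of FIG. 3 can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the classifier of the disclosure: the edge kernels, the per-kernel summary feature, the label weights, and all function names are hypothetical placeholders, and a real classifier would learn its kernels and weights from training samples.

```python
from typing import Dict, List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (valid mode) to produce a feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h) for j in range(kernel_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def extract_features(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """Step S301 (sketch): one scalar per kernel, the mean absolute response."""
    features = []
    for kernel in kernels:
        feature_map = convolve2d(image, kernel)
        values = [abs(v) for row in feature_map for v in row]
        features.append(sum(values) / len(values))
    return features


def classify(features: List[float], weights: Dict[str, List[float]]) -> str:
    """Step S302 (sketch): score each label linearly and return the best one."""
    scores = {
        label: sum(w * f for w, f in zip(label_weights, features))
        for label, label_weights in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical kernels responding to vertical and horizontal edges.
VERTICAL_EDGE = [[-1, 1], [-1, 1]]
HORIZONTAL_EDGE = [[-1, -1], [1, 1]]
# Hypothetical, untrained weights; the gesture names are labels only.
WEIGHTS = {"palm_up": [1.0, 0.0], "fist": [0.0, 1.0]}
```

For a toy image with a single vertical edge, `extract_features` yields a non-zero response only for `VERTICAL_EDGE`, so `classify` returns whichever label weights that feature most heavily.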

After one or more gestures have been determined based solely on the image OM, the image OM with identified hand joints J is input to the same or another gesture classifier. Similarly, the processor 150 may perform feature extraction (step S301) and classification (step S302) on the image OM with the identified hand joints J and output a determined gesture. The gesture determined in this second pass is then used to check the correctness of the gesture determined first. For example, if both determined gestures are the same, the processor 150 may confirm the gesture. If the determined gestures differ, the processor 150 may determine the gesture in another image.
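The cross-check between the two predictions can be sketched as follows. This is an illustrative outline only: the two predictor callables stand in for the trained gesture classifier(s) and are hypothetical, as is the convention of returning `None` to signal that another image should be examined.

```python
from typing import Callable, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]
Joints = Sequence[Tuple[float, float]]


def confirm_gesture(
    image: Image,
    joints: Joints,
    predict_from_image: Callable[[Image], str],
    predict_from_joints: Callable[[Image, Joints], str],
) -> Optional[str]:
    """Return the gesture only when both predictions agree, otherwise None."""
    # First pass: image only, without the identified joints.
    first_prediction = predict_from_image(image)
    # Second pass: the same image together with the identified joints J.
    second_prediction = predict_from_joints(image, joints)
    if first_prediction == second_prediction:
        return first_prediction
    # Disagreement: the caller should retry on another captured image.
    return None
```

Returning `None` rather than raising keeps the capture loop simple: the caller just moves on to the next frame until a gesture is confirmed.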

In one embodiment, the processor 150 may further distinguish the user's right hand from the left hand. This means that the processor 150 knows which hand makes the gesture or is captured by the image capture device 130 (i.e., which hand is located within the field of view (FOV) of the image capture device 130). In some embodiments, the processor 150 may define different predefined gestures, or the same predefined gesture, for the user's right hand and left hand respectively. For example, one function is triggered by a thumbs-up gesture made with either the right hand or the left hand. In another example, another function is triggered by an index-finger-up gesture made with the right hand, and the same function is triggered by a little-finger-up gesture made with the left hand.
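One way to represent such hand-specific bindings is a lookup table keyed by hand and gesture. The sketch below mirrors the two examples in the preceding paragraph; the gesture identifiers, the function names `function_a` and `function_b`, and the table layout are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Hand(Enum):
    LEFT = "left"
    RIGHT = "right"


# (hand, gesture) -> name of the function to trigger (all names hypothetical).
GESTURE_BINDINGS: Dict[Tuple[Hand, str], str] = {
    # Same gesture on either hand triggers the same function.
    (Hand.RIGHT, "thumbs_up"): "function_a",
    (Hand.LEFT, "thumbs_up"): "function_a",
    # Different gestures per hand trigger the same function.
    (Hand.RIGHT, "index_finger_up"): "function_b",
    (Hand.LEFT, "little_finger_up"): "function_b",
}


def lookup_function(hand: Hand, gesture: str) -> Optional[str]:
    """Return the bound function name, or None if the pair is not predefined."""
    return GESTURE_BINDINGS.get((hand, gesture))
```

With this table, `lookup_function(Hand.LEFT, "index_finger_up")` returns `None`, because that gesture is bound only for the right hand.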

Note that many gesture recognition algorithms exist, such as 3D model-based algorithms, skeleton-based algorithms, appearance-based models, and electromyography-based models. Any of these algorithms may be implemented according to actual requirements.

The processor 150 may display a virtual hand and a first interactive object located in an interactive area on the display 120 in response to the identified result of the first gesture (step S230). Specifically, if the identified result is that the gesture in the first image is the same as the first gesture, the virtual hand corresponding to the user's hand performs the first gesture. The processor 150 may display the virtual hand performing the first gesture on the display 120, so that the user can tell whether he or she is making the correct gesture. However, even if the gesture identified in the first image is not the first gesture, the processor 150 may still display the identified gesture on the display 120. Furthermore, the first gesture is used to trigger the display 120 to show the first interactive object. This means that the first interactive object may not be shown on the display 120 until the user makes the first gesture. The first interactive object may be an image, a video, a virtual ball, or another virtual object. The first interactive object is located in the interactive area of the virtual hand or the avatar's hand. This means that the fingers, palm, or other parts of the virtual hand may be able to interact with any object located in the interactive area. For example, in the interactive area, a finger can touch a virtual key or a palm can hold a virtual ball. Note that the shape and position of the interactive area may be changed according to actual requirements. Furthermore, the number of virtual hands may be one or two. The virtual hand may be a hand of a full-body or half-body avatar in XR.

In one embodiment, the first interactive object signals to the user that interaction is possible and prompts the user to try another gesture. That is, the first interactive object serves as a hint for the subsequent gesture. For example, the first interactive object is a virtual ball, and the user may try to hold or grab the virtual ball.

The processor 150 may identify a second gesture in a second image (step S250). Specifically, the second gesture is another predefined gesture, such as a palm-up, palm-down, crossed-fingers, or fist gesture, that is different from the first gesture. The second gesture also corresponds to the hand of the user. The processor 150 may identify the gesture made by the user's hand in the second image and check whether the identified gesture is the predefined second gesture.

In one embodiment, as detailed for step S210, the processor 150 may identify joints of the user's hand from the second image and predict the gesture of the user's hand through the gesture classifier based on the second image and the identified joints. In some embodiments, as detailed for step S210, the processor 150 may first predict the gesture based solely on the second image, without the identified joints of the user's hand, and then confirm the predicted gesture based on the identified joints.

The processor 150 may display the virtual hand and a second interactive object on the display 120 in response to the identified result of the second gesture (step S270). Specifically, if the identified result is that the gesture in the second image is the same as the second gesture, the virtual hand corresponding to the user's hand performs the second gesture. The processor 150 may display the virtual hand performing the second gesture on the display 120, and the hand making the second gesture may interact with the first interactive object in the interactive area. For example, the virtual hand grabs the virtual ball. In some embodiments, an animation of the first interactive object deforming may be shown on the display 120. For example, the virtual ball is squeezed. However, even if the gesture identified in the second image is not the second gesture, the processor 150 may still display the identified gesture on the display 120. Furthermore, the first interactive object may be hidden because of the wrong gesture.

Furthermore, the combination of the first gesture and the second gesture is used to trigger the display 120 to show the second interactive object and hide the first interactive object. This means that the second interactive object may not be shown on the display 120 until the user has made the first gesture followed by the second gesture. If a third gesture different from the second gesture is identified in the second image after the first gesture has been identified in the first image, the first interactive object remains on the display 120 and the second interactive object is not shown. The second interactive object may be an image, a video, a menu, or another virtual object. On the other hand, once the second gesture has been identified, the first interactive object, which is a hint for the second gesture, no longer needs to be shown. The first interactive object can thus help the user combine the first gesture and the second gesture intuitively.
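The display logic of steps S210 to S270 can be pictured as a small state update driven by each identified gesture. The sketch below is one possible reading under stated assumptions, not the implementation of the disclosure: it assumes the first gesture is palm-up and the second is a fist (as in the figures), keeps the hint visible after an unrelated third gesture, and does not model the variant in which a wrong gesture hides the hint. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_GESTURE = "palm_up"   # assumed first gesture (hypothetical identifier)
SECOND_GESTURE = "fist"     # assumed second gesture (hypothetical identifier)


@dataclass(frozen=True)
class DisplayState:
    hand_gesture: Optional[str] = None   # gesture mirrored by the virtual hand
    show_first_object: bool = False      # hint object, e.g. a virtual ball
    show_second_object: bool = False     # target object, e.g. a menu


def on_gesture(state: DisplayState, gesture: str) -> DisplayState:
    """Update what is displayed after a gesture is identified in a new image."""
    if gesture == FIRST_GESTURE:
        # S230: show the virtual hand and the first interactive object.
        return DisplayState(gesture, True, state.show_second_object)
    if gesture == SECOND_GESTURE and state.show_first_object:
        # S270: combination complete; show the second object, hide the hint.
        return DisplayState(gesture, False, True)
    # Any other gesture is mirrored by the virtual hand; objects are unchanged.
    return DisplayState(gesture, state.show_first_object, state.show_second_object)
```

Because the second object appears only when the hint is already visible, a fist made without a preceding palm-up gesture changes nothing but the pose of the virtual hand.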

For example, FIGS. 4A and 4B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure. Referring to FIG. 4A, a palm-up gesture, which is defined as the first gesture, is identified in a first image containing the left hand at a first time point. A virtual left hand LH making the palm-up gesture and a virtual ball io1 (i.e., the first interactive object) are shown on the display 120. Referring to FIG. 4B, a fist gesture, which is defined as the second gesture, is identified in a second image containing the left hand at a second time point. The virtual left hand LH making the fist gesture and a main menu io2 (i.e., the second interactive object) are shown on the display 120. The main menu io2 includes multiple icons, such as icons for a friend list, a map, and an app store.

In one embodiment, the second interactive object includes a first menu and a second menu. The second menu is different from the first menu. The processor 150 displays the first menu on the display 120 if the right hand is identified, and displays the second menu on the display 120 if the left hand is identified. This means that the first menu is shown on the display 120 if the combination of the first gesture and the second gesture is made with the right hand, whereas the second menu is shown on the display 120 if the combination is made with the left hand.
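The hand-dependent menu choice amounts to a simple mapping. In the sketch below, the menu identifiers follow the examples given for FIGS. 4B and 5B (main menu for the left hand, quick settings menu for the right hand); the enum, the identifiers, and the function name are hypothetical.

```python
from enum import Enum


class Hand(Enum):
    LEFT = "left"
    RIGHT = "right"


# First menu for the right hand, second menu for the left hand.
MENU_BY_HAND = {
    Hand.RIGHT: "quick_settings_menu",  # first menu (hypothetical identifier)
    Hand.LEFT: "main_menu",             # second menu (hypothetical identifier)
}


def menu_for(hand: Hand) -> str:
    """Return the menu to display once the gesture combination is completed."""
    return MENU_BY_HAND[hand]
```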

For example, as shown in FIG. 4B, the second menu is the main menu io2. FIGS. 5A and 5B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure. Referring to FIG. 5A, a palm-up gesture, which is defined as the first gesture, is identified in a first image containing the right hand at a third time point. A virtual right hand RH making the palm-up gesture and a virtual ball io3 (i.e., the first interactive object) are shown on the display 120. Referring to FIG. 5B, a fist gesture, which is defined as the second gesture, is identified in a second image containing the right hand at a fourth time point. The virtual right hand RH making the fist gesture and a quick settings menu io4 (i.e., the second interactive object, or the first menu) are shown on the display 120. The quick settings menu io4 includes multiple icons, such as icons for turning the camera on or off, performing a specific action with the virtual hand, and messaging.

In one embodiment, if the second gesture is detected, the processor 150 may further hide the first interactive object on the display 120. This means that no further hint for a subsequent gesture is needed, so the first interactive object is no longer shown. Accordingly, only the second interactive object is shown on the display 120. Taking FIGS. 5A and 5B as an example, the virtual ball io3 is hidden after the fist gesture is identified.

In another embodiment, if the second interactive object is already shown on the display 120 and the identified results of the first gesture and the second gesture are confirmed (i.e., the user has made the combination of the first gesture and the second gesture), the processor 150 may hide both the first interactive object and the second interactive object on the display 120. The menu can thus be closed by gesture.
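Taken together with the opening behavior, this makes the gesture combination act as a toggle for the menu. The sketch below illustrates that reading under stated assumptions: the gesture identifiers, the use of a recent-gesture history, and the function names are hypothetical.

```python
from typing import Sequence

FIRST_GESTURE = "palm_up"   # assumed first gesture (hypothetical identifier)
SECOND_GESTURE = "fist"     # assumed second gesture (hypothetical identifier)


def combination_performed(recent_gestures: Sequence[str]) -> bool:
    """True when the two most recent gestures are the first, then the second."""
    return (
        len(recent_gestures) >= 2
        and tuple(recent_gestures[-2:]) == (FIRST_GESTURE, SECOND_GESTURE)
    )


def update_menu_visibility(menu_visible: bool, recent_gestures: Sequence[str]) -> bool:
    """Toggle the menu on the gesture combination; otherwise leave it unchanged."""
    if combination_performed(recent_gestures):
        # Closed menu opens; open menu closes (the hint object is hidden too).
        return not menu_visible
    return menu_visible
```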

For example, FIGS. 6A and 6B are schematic diagrams illustrating the triggering of an interactive object by gestures according to an exemplary embodiment of the present disclosure. Referring to FIG. 6A, the quick settings menu io4 is shown on the display 120. The virtual ball io3 is shown because of the palm-up gesture made with the right hand RH. Referring to FIG. 6B, both the virtual ball io3 and the quick settings menu io4 are hidden because of the fist gesture made with the right hand RH.

It should be noted that the first interactive object, the second interactive object, and the gestures in FIGS. 4A to 6B may be changed according to actual requirements, and the embodiments are not limited thereto.

In summary, the exemplary embodiments above describe an operation method using gestures in XR and a head-mounted display system. A combination of gestures is identified in two images and is used to display the second interactive object on the display. In addition, the first interactive object may be displayed after the first gesture is identified, to further prompt the user to perform the second gesture. Intuitive gesture control is thereby provided.
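The overall flow, in which a gesture combination identified across two images selects the menu to display, might be sketched as follows. The right-hand and left-hand mapping follows the quick setting menu and main menu examples of the embodiments. The function name, the menu labels, and the requirement that both gestures come from the same hand are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

# Right hand -> first menu (quick setting menu), left hand -> second menu
# (main menu). The string labels are hypothetical.
MENU_BY_HAND = {"right": "quick_setting_menu", "left": "main_menu"}


def menu_for_sequence(results: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Pick a menu from per-image (hand, gesture) identification results.

    A palm-up gesture followed by a fist gesture of the same hand triggers
    the menu mapped to that hand; anything else triggers nothing.
    """
    armed_hand = None  # hand that most recently made the first gesture
    for hand, gesture in results:
        if gesture == "palm_up":
            armed_hand = hand
        elif gesture == "fist" and hand == armed_hand:
            return MENU_BY_HAND.get(hand)
        else:
            # Any other result breaks the combination.
            armed_hand = None
    return None
```

For example, `menu_for_sequence([("right", "palm_up"), ("right", "fist")])` selects the quick setting menu, while a lone fist gesture selects nothing.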

It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present disclosure without departing from its scope or spirit. In view of the foregoing, the present disclosure is intended to cover such modifications and variations, provided that they fall within the scope of the following claims and their equivalents.

The operation method using gestures and the head-mounted display system are applicable to extended reality (XR).

100 Head-mounted display system
110 Memory
120 Display
130 Image capture device
150 Processor
S210 to S270, S301 to S302 Steps
OM Image
J Joint
LH Virtual left hand
io1 Virtual ball
io2 Main menu
RH Virtual right hand
io3 Virtual ball
io4 Quick setting menu

Claims

An operation method using gestures in extended reality (XR), comprising:
identifying, in a first image, a first gesture corresponding to a hand of a user;
displaying a virtual hand and a first interactive object located in an interactive area in response to an identified result of the first gesture, wherein the virtual hand performs the first gesture;
identifying, in a second image, a second gesture that corresponds to the hand of the user and is different from the first gesture, wherein the second gesture interacts with the first interactive object in the interactive area; and
displaying the virtual hand and a second interactive object in response to an identified result of the second gesture, wherein the virtual hand performs the second gesture.

A head-mounted display system, comprising:
an image capture device that captures images;
a display; and
a processor coupled to the image capture device and the display and configured to:
identify, in a first image captured by the image capture device, a first gesture corresponding to a hand of a user;
display, on the display, a virtual hand and a first interactive object located in an interactive area in response to an identified result of the first gesture, wherein the virtual hand performs the first gesture;
identify, in a second image captured by the image capture device, a second gesture that corresponds to the hand of the user and is different from the first gesture, wherein the second gesture interacts with the first interactive object in the interactive area; and
display, on the display, the virtual hand and a second interactive object in response to an identified result of the second gesture, wherein the virtual hand performs the second gesture.

The operation method according to claim 1, wherein the step of identifying the first gesture or the second gesture comprises:
identifying joints of the hand of the user from the first image or the second image; and
predicting, by a gesture classifier, the gesture of the hand of the user based on the first image or the second image and the identified joints of the hand of the user, wherein the gesture classifier is trained by a machine learning algorithm;
or
the head-mounted display system according to claim 2, wherein the processor is further configured to:
identify joints of the hand of the user from the first image or the second image; and
predict, by a gesture classifier, the gesture of the hand of the user based on the first image or the second image and the identified joints of the hand of the user, wherein the gesture classifier is trained by a machine learning algorithm.

The operation method according to claim 3, wherein the step of predicting the gesture of the hand of the user comprises:
predicting the gesture based solely on the first image or the second image, without the identified joints of the hand of the user; and
confirming the predicted gesture based on the identified joints of the hand of the user;
or
the head-mounted display system according to claim 3, wherein the processor is further configured to:
predict the gesture based solely on the first image or the second image, without the identified joints of the hand of the user; and
confirm the predicted gesture based on the identified joints of the hand of the user.

The operation method according to claim 1, further comprising hiding the first interactive object in response to the identified result of the second gesture;
or
the head-mounted display system according to claim 2, wherein the processor is further configured to hide the first interactive object in response to the identified result of the second gesture.

The operation method according to claim 1, further comprising hiding the first interactive object and the second interactive object in response to the identified result of the second gesture;
or
the head-mounted display system according to claim 2, wherein the processor is further configured to hide the first interactive object and the second interactive object in response to the identified result of the second gesture.

The operation method according to claim 1, wherein the step of identifying the first gesture or the second gesture comprises identifying one of a right hand and a left hand of the user, and the step of displaying the second interactive object comprises:
displaying a first menu in response to identifying the right hand; and
displaying a second menu different from the first menu in response to identifying the left hand, wherein the second interactive object includes the first menu and the second menu;
or
the head-mounted display system according to claim 2, wherein the processor is further configured to:
identify one of a right hand and a left hand of the user;
display the first menu on the display in response to identifying the right hand; and
display the second menu, different from the first menu, on the display in response to identifying the left hand, wherein the second interactive object includes the first menu and the second menu.

The operation method according to claim 7, or the head-mounted display system according to claim 7, wherein the first menu corresponds to a quick setting menu and the second menu corresponds to a main menu.

The operation method according to claim 1, or the head-mounted display system according to claim 2, wherein the first gesture is a palm-up gesture and the second gesture is a fist gesture.