JP2023103050A

JP2023103050A - VR head-mounted system

Info

Publication number: JP2023103050A
Application number: JP2022003886A
Authority: JP
Inventors: 良平上瀧; Ryohei Kamitaki; 泰雅市川; Yasumasa Ichikawa
Original assignee: World Scan Project Corp
Current assignee: World Scan Project Corp
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2023-07-26

Abstract

To provide a 6DOF-capable head-mounted system that allows a user to view omnidirectional video even when a portable device is attached to the head mount body. SOLUTION: A head-mounted system 100 is provided in which a portable device SP, including a camera CA on a first surface and a display unit DIS on a second surface, is attached to a head mount body 10, and which allows a user to view an image projected on the display unit DIS. The head mount body 10 includes a storage pocket 12 that stores the portable device SP with the first surface facing forward and that is cut out so that the camera CA can photograph the front. The portable device SP comprises: a rotation amount sensor JS that senses the amount of rotation in the three rotation directions of roll, pitch, and yaw; a translational movement amount calculation unit 32 that detects feature points from an image captured by the camera and calculates the amount of translational movement in three orthogonal axis directions; and an in-field image generation unit 36 that generates an in-field image from an omnidirectional image on the basis of the amount of rotation and the amount of translational movement. SELECTED DRAWING: Figure 3

Description

The present invention relates to a VR head-mounted system that uses a portable device such as a smartphone to display video, such as omnidirectional video, in which the image within the field of view changes according to six degrees of freedom (6DOF).

In fields such as virtual reality (VR), omnidirectional video captured by imaging devices that can capture the full 360° surroundings is increasingly used. A user can attach a portable device such as a smartphone to a head mount, display the omnidirectional video on the display unit of the portable device as a stereo image with parallax that enables stereoscopic viewing, and view the omnidirectional video. Patent Literature 1 discloses an invention for attaching a portable device to a head mount.

When the portable device is attached to the head mount, the angular velocity sensor of the portable device can sense three movements (roll, pitch, and yaw) around the X, Y, and Z axes. The rotation and tilt of the user's head can therefore be sensed, and 3DOF omnidirectional video can be viewed.

However, the head mount with an attached portable device described in Patent Literature 1 cannot detect the three translational movements along the X-axis, Y-axis, and Z-axis directions. The user therefore cannot view 6DOF omnidirectional video that includes translational movement. Furthermore, in order not only to view the omnidirectional video but also to rewind or advance it, a hand-held controller had to be prepared in addition to the portable device such as a smartphone.

Patent Literature 1: Japanese Translation of PCT International Application Publication No. 2019-510328

Therefore, an object of the present embodiment is to enable viewing of 6DOF omnidirectional video even with a head mount to which a portable device is attached. Another object is to enable processing of the omnidirectional video (stop processing, rewind processing, and so on) without preparing a controller.

The head-mounted system of the present embodiment is a head-mounted system in which a portable device including a camera on a first surface and a display unit on a second surface is attached to a head mount body, for viewing an image projected on the display unit. The head mount body has a storage pocket that stores the portable device with the first surface facing forward and that is cut out so that the camera can photograph the front. The portable device has a rotation amount sensor that senses the amount of rotation in the three rotation directions of roll, pitch, and yaw; a translational movement amount calculation unit that detects feature points from an image captured by the camera and calculates the amount of translational movement in three orthogonal axis directions; and an in-field image generation unit that generates an in-field image from an omnidirectional image based on the amount of rotation and the amount of translational movement.

The portable device preferably also has a hand tracking unit that detects and tracks the user's hand in an image captured by the camera; a hand image generation unit that generates a hand image when the hand has been in the captured image for longer than a predetermined time; a GUI image generation unit that generates a GUI image related to the in-field image when the hand has been in the captured image for longer than the predetermined time; and an image synthesis unit that combines the hand image and the GUI image with the in-field image.

The portable device may also have a GUI instruction determination unit that determines whether the hand image has been on an instruction icon of the GUI image for a predetermined time, and the GUI instruction determination unit may instruct execution of the operation of the instruction icon when the hand image is on that instruction icon.
The portable device also has a shopping store determination unit that determines whether or not the in-field image shows a shopping store where products can be purchased, and a product image storage unit that stores a plurality of images of products sold at the shopping store. When the GUI instruction determination unit determines that the hand image is on the shopping GUI image, the product image storage unit preferably supplies the product images to the image synthesis unit.

The portable device also has a product identification determination unit that determines whether the hand image has been on a product image for a predetermined time. The product identification determination unit preferably joins the hand image and the product image together, and separates the hand image from the product image at the cart image.
When the product image storage unit supplies the product images to the image synthesis unit, the image synthesis unit preferably does not synthesize the in-field image.

The method of projecting an in-field image according to the present embodiment is a method in which a portable device including a camera on a first surface and a display unit on a second surface is stored in a storage pocket of a head mount body, and the in-field image is projected on the display unit. The storage pocket is cut out so that the camera can photograph the front. The portable device senses the amount of rotation in the three rotation directions of roll, pitch, and yaw, detects feature points from an image captured by the camera, calculates the amount of translational movement in three orthogonal axis directions, generates an in-field image from an omnidirectional image based on the amount of rotation and the amount of translational movement, and projects the in-field image.

With the VR head-mounted system of the present embodiment, 6DOF omnidirectional video can be viewed. In addition, processing of the omnidirectional video can be instructed without preparing a hand controller.

FIG. 1 is a perspective view of the head-mounted system to which the portable device of the present embodiment is attached.
FIG. 2(A) is a view of the head mount (excluding the portable device) from four directions. FIG. 2(B) is an explanatory diagram of the mechanism for adjusting the interpupillary distance, and FIG. 2(C) is an explanatory diagram of the mechanism for adjusting the focal length of the lenses.
FIG. 3 is a block diagram of the portable device used in the VR head-mounted system.
FIG. 4 is flowchart 1 relating to hand tracking.
FIG. 5(A) is an example of a hand-tracked hand image. FIG. 5(B) is an example of a GUI image in which five instruction icons are displayed. FIGS. 5(C) to 5(F) are examples of in-field images displayed on the display unit of the portable device. The image or video on the display unit is drawn as the single in-field image perceived by the human brain.
FIG. 6 is flowchart 2 relating to hand tracking.
FIGS. 7(A) to 7(F) are examples of images displayed on the display unit of the portable device. The image or video on the display unit is drawn as the single in-field image perceived by the human brain.

[Overall configuration of the head-mounted system]
FIG. 1 is a perspective view showing a head-mounted system 100 in which a portable device SP according to the present embodiment is attached to a head mount body 10. As shown in FIG. 1, the head-mounted system 100 includes the head mount body 10, a strap 20 for securely attaching the head mount body 10 to the user's head, and the portable device SP.

The portable device SP is stored in the storage pocket 12 of the head mount body 10. The portable device SP disclosed in the present embodiment has one or more cameras CA on the first surface (-Y-axis side) and a display unit DIS (see FIG. 2) on the second surface (+Y-axis side). Examples of the portable device SP include, but are not limited to, smartphones, tablet terminals, and handheld visual media players.

An exemplary portable device SP includes a central processing unit (CPU) (not shown), the display unit DIS, the camera CA, and a communication unit, and can run applications for use with the system. The portable device SP incorporates a rotation angle sensor such as one or more gyro sensors, acceleration sensors, hydrometers, or magnetometers. In the present embodiment, the rotation angle sensor can sense the amount of rotation of the head-mounted system 100 in the three rotation directions of roll, pitch, and yaw.

A strap 20 is attached to the head mount body 10, and the strap 20 secures the head mount body 10 firmly to the user's head. Alternatively, for example, the head mount body 10 can be incorporated into a helmet-like device that is secured to the top of the head without a strap.

The storage pocket 12 that stores the portable device SP flexes like a leaf spring and clamps the portable device SP between the peripheral wall 11 and the storage pocket 12. The storage pocket 12 is cut short so that its width (X-axis direction) is less than the length of the portable device SP, which allows the camera CA of the portable device SP to photograph the front. If the storage pocket 12 is wide, the area around the position corresponding to the camera CA of the portable device SP may be cut out instead so that the camera CA can photograph the front.

The peripheral wall 11 is provided with a focus adjustment lever 14 for adjusting the focal length of the lenses and an interpupillary distance adjustment dial 15 for moving the lenses to match the distance between the user's pupils.

[Configuration of the head mount body]
FIG. 2 shows the head mount body 10 according to the present embodiment. FIG. 2(A) shows the head mount body 10 viewed from four directions. FIGS. 2(B) and 2(C) show the area around the lenses arranged inside the head mount body 10.

The peripheral wall 11 and the storage pocket 12 of the head mount body 10 are preferably made of a plastic material such as ethylene vinyl acetate (EVA), polyurethane (PU), or ABS resin. Each of these can be used alone or in various combinations. In a preferred embodiment, the peripheral wall 11 and the storage pocket 12 are formed by injection molding or a similar process. The front portion of the peripheral wall 11 has a generally rectangular or box-like shape.

The distance between the front surface 11a of the peripheral wall 11 and the storage pocket 12 is slightly shorter than the thickness of the portable device SP, and the storage pocket 12 is S-shaped when viewed from the side (X-axis direction). Owing to the elasticity and shape of the plastic itself, the storage pocket 12 acts like a leaf spring and can hold the portable device SP (not shown) firmly against the front surface 11a of the peripheral wall 11. Strap pins 19, to which the strap 20 (see FIG. 1) is attached, are formed on the top and side surfaces of the peripheral wall 11.

The width L1 of the storage pocket 12 can be changed according to the type and size of the portable device SP, but is typically intended to grip and hold the portable device SP. The average display size of a portable device SP is about 5 inches (12.7 cm) to 6.5 inches (16.5 cm), and the camera CA on the first surface of the portable device SP is often located at the upper left when viewed from the Y-axis direction. When the portable device SP is placed horizontally as shown in FIG. 1, the camera CA is located at the upper right. The width L1 of the storage pocket 12 is cut short so that the camera CA is not hidden.

The face edge portion 17, which contacts the user's forehead, is preferably made of a flexible material such as rubber or urethane foam. The peripheral wall 11 and the face edge portion 17 are joined by adhesive, fitting, or the like. When in contact with the user's face, the face edge portion 17 prevents ambient light from entering the hollow 13.

A pair of lenses LZ for viewing the image on the display unit of the portable device SP is arranged in the hollow 13. To move the pair of lenses LZ in the Y-axis direction and the X-axis direction, the peripheral wall 11 has the focus adjustment lever 14 and the interpupillary distance adjustment dial 15. The interpupillary distance adjustment dial 15 adjusts the lenses to the distance between the pupils, which differs from user to user. As shown in FIG. 2(B), each of the pair of lens holders 18 holding the lenses LZ has a rack 18a. A pinion 15a is formed on the adjustment dial 15, and the racks 18a mesh with the pinion 15a. When the pinion 15a is rotated, the lenses LZ held by the lens holders 18 move toward the center or away toward the outside. In FIG. 2(B), the pinion 15a of the interpupillary distance adjustment dial 15 and the racks 18a are drawn apart to aid understanding.

As shown in FIG. 2(C), the user moves the focus adjustment lever 14 back and forth (Y-axis direction) to adjust the focal length of the lenses LZ. A fitting plate 18b is formed on the lens holder 18, and the clamp portion 14a of the focus adjustment lever 14 clamps the fitting plate 18b. The lenses LZ are positioned between the user's eyes and the display unit of the portable device SP. With the user's eyes aligned with the lenses LZ, the user looks through the lenses LZ to view the display unit of the portable device SP. Each lens LZ focuses the user's vision on a separate left or right region of the in-field image shown on the display unit of the portable device SP. Proper alignment of the user's vision through the lenses is especially important in virtual reality applications.

[Configuration of the portable device]
FIG. 3 is a block diagram showing the functional configuration of the portable device SP. The portable device SP may be a smartphone, a tablet computer, a personal computer, or the like. The display unit DIS is a display device such as a liquid crystal display or an organic EL display. The display unit DIS may have separate display screens for the user's right eye and left eye, or may have only one display screen. An application has been downloaded to the portable device SP from a store such as a website. When the application is started, the portable device SP provides the functions described below. Data for a plurality of omnidirectional videos can also be downloaded through the application.

The rotation amount sensor JS detects movement of the head mount body 10. The rotation amount sensor JS can detect rotation about the three axes of yaw, roll, and pitch (3DOF; degrees of freedom). The rotation amount sensor JS may be an IMU (inertial measurement unit) or any of various combinations of sensors such as gyro sensors. When the rotation amount sensor JS is an IMU, it can also detect movement in three directions: the Y-axis direction (front-back), the X-axis direction (left-right), and the Z-axis direction (up-down).

The portable device SP further includes a communication unit WF, a video data storage unit 31, a translational movement calculation unit 32, a hand tracking unit 33, a hand image generation unit 34, a GUI image generation unit 35, an in-field image generation unit 36, an image synthesis unit 37, and a GUI instruction determination unit 38. The portable device SP also includes a shopping store determination unit 41, a product image storage unit 42, and a product identification determination unit 43.

The communication unit WF performs long-range communication such as 5G or 4G, or short-range communication such as Wi-Fi (trademark) or Bluetooth (trademark). The video data storage unit 31 acquires omnidirectional video data (hereinafter, video data) via the communication unit WF. An omnidirectional video is a 360° video covering all directions around a single point; it is either captured with an omnidirectional camera or synthesized from video captured by a plurality of cameras. The video data storage unit 31 may instead acquire video data by reading video data stored in the portable device SP. After acquiring the video data, the video data storage unit 31 decodes it and generates the omnidirectional video. The video data storage unit 31 supplies the generated omnidirectional video to the in-field image generation unit 36.

The translational movement calculation unit 32 periodically acquires images captured by the camera CA and calculates the amount of translational movement in the X-, Y-, and Z-axis directions, using characteristic points in the physical space as indices. A specific method is disclosed in W. A. Hoff and K. Nguyen, "Computer vision-based registration techniques for augmented reality", Proc. SPIE, vol. 2904, pp. 538-548, Nov. 1996.
In FIG. 3, the translational movement calculation unit 32 is depicted as calculating the translational movement only from the images captured by the camera CA. However, if the rotation amount sensor JS is an IMU (inertial measurement unit), 6DOF can be detected, so the amount of translational movement in the X-, Y-, and Z-axis directions may be calculated by combining the images captured by the camera CA with the translational signals in those directions from the IMU. A specific method for calculating the amount of translational movement from periodically captured images together with a sensor is disclosed in S. You and U. Neumann, "Fusion of vision and gyro tracking for robust augmented reality registration", Proc. IEEE Virtual Reality 2001, pp. 71-78, Mar. 2001. If low accuracy is acceptable for the amount of translational movement in the X-, Y-, and Z-axis directions, the translational movement detected by the IMU may be used, and the translational movement calculation unit 32 that uses the images captured by the camera CA may be omitted.
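As a rough illustration of the feature-point calculation described above, the following Python sketch estimates a translation amount from feature points matched between two captured images. It is not the method of the cited papers. It assumes, beyond what the source states, that each feature point already has a 3D position in the camera frame, that the feature points are stationary in the physical space, and that rotation between the two images has already been compensated using the rotation amount sensor. The name `estimate_translation` is hypothetical.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def estimate_translation(prev_pts: List[Point3], curr_pts: List[Point3]) -> Point3:
    """Estimate camera translation from matched, stationary feature points.

    Assumption: rotation is already compensated, so a camera translation t
    makes every static point appear to shift by -t in the camera frame.
    """
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("need the same non-zero number of matched points")
    n = len(prev_pts)
    # Average the apparent displacement of all features to reduce per-point noise.
    mean_shift = [
        sum(c[i] - p[i] for p, c in zip(prev_pts, curr_pts)) / n
        for i in range(3)
    ]
    # The camera moved opposite to the apparent motion of the scene.
    return (-mean_shift[0], -mean_shift[1], -mean_shift[2])
```

Averaging over all matched points is the simplest possible estimator; a practical system would also need to reject mismatched or moving features.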

The hand tracking unit 33 recognizes the user's hand and fingers in the images captured by the camera CA, and tracks the position of the hand and the movement of the fingers. The hand tracking unit 33 can use a deep learning model to determine whether or not a hand is present in an image captured by the camera CA.

The hand image generation unit 34 generates a hand image 51 to be shown in the video, based on the hand tracking unit 33 having recognized the user's hand. The hand image generation unit 34 preferably generates the hand image 51 when the user's hand has been in the captured image for longer than a predetermined time. This excludes events such as a hand briefly entering the field of view while the user walks swinging their arms. FIG. 5(A) is an example of the hand image 51 projected on the display unit DIS. The hand image 51 is preferably translucent so that it does not obstruct viewing of the in-field image when superimposed on it. The hand image generation unit 34 may prepare a plurality of characteristic hand images 51 in advance (a pointing hand, a hand holding an object between thumb and forefinger, and so on) and select the hand image 51 closest to the shape of the hand in the image captured by the camera CA. Alternatively, the hand image 51 may be generated by detecting the contour of the photographed hand through image processing. The generated hand image 51 is supplied to the image synthesis unit 37. When the hand is no longer present in the image captured by the camera CA, the hand image generation unit 34 stops generating the hand image, and the hand image is erased.
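The "longer than a predetermined time" condition can be sketched as a small gate that is fed one detection result per camera frame. This is only an illustration: the class name `HandPresenceGate` and the 0.5-second default are assumptions, since the source does not give a value for the predetermined time.

```python
from typing import Optional


class HandPresenceGate:
    """Reports True only after a hand has been seen continuously for long enough."""

    def __init__(self, min_duration_s: float = 0.5) -> None:
        # 0.5 s is an illustrative value; the source only says "predetermined time".
        self.min_duration_s = min_duration_s
        self._first_seen: Optional[float] = None

    def update(self, hand_detected: bool, timestamp_s: float) -> bool:
        """Feed one detection result per frame; returns whether to show the hand image."""
        if not hand_detected:
            # The hand left the captured image: reset the timer and hide the hand image.
            self._first_seen = None
            return False
        if self._first_seen is None:
            self._first_seen = timestamp_s
        return (timestamp_s - self._first_seen) > self.min_duration_s
```

With this gate, a hand that swings briefly through the camera view never produces a hand image, while a hand held in view does so once the threshold has passed.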

The GUI image generation unit 35 generates a GUI image 52 related to the video, based on the hand having been recognized. The GUI image generation unit 35 preferably prepares a plurality of GUI images 52 in advance. The GUI image 52 contains instruction icons such as rewind, review, stop, cue, and fast forward. FIG. 5(B) is an example of the GUI image 52 projected on the display unit DIS. As described later, the GUI image may also contain an instruction icon for purchasing a product. In FIG. 3, the arrows are drawn so that the GUI image 52 is generated after the hand image generation unit 34 generates the hand image. However, the GUI image 52 may be generated at the same time as the hand image 51 once the hand tracking unit 33 has recognized the hand. In other words, the GUI image generation unit 35 preferably generates the GUI image 52 when the user's hand has been in the captured image for longer than the predetermined time. The number and types of instruction icons in the GUI image 52 preferably change according to the in-field image.

The GUI image generation unit 35 preferably localizes the generated GUI image 52 in the display space. As a result, even if the user wearing the head mount body 10 turns or moves translationally in real space, the GUI image 52 remains substantially fixed at a predetermined position relative to the viewpoint of the display space and does not move. The generated GUI image 52 is supplied to the image synthesis unit 37. The GUI image 52 is preferably translucent so that it does not obstruct viewing of the in-field image when superimposed on it.

The in-field image generation unit 36 generates the in-field image, which is the image displayed on the display unit DIS, from the omnidirectional video supplied from the video data storage unit 31. The in-field image generation unit 36 acquires the amount of rotation from the rotation amount sensor JS and the amount of translational movement from the translational movement calculation unit 32, and generates the in-field image by extracting part of the omnidirectional video according to the orientation of the head mount body 10 and the movement of the user. When rotation about the three axes of yaw, roll, and pitch and movement in the X-, Y-, and Z-axis directions are detected, the in-field image generation unit 36 moves the range of the omnidirectional video that forms the in-field image according to the amount of rotation and movement.
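To make the idea of moving the extracted range concrete, the sketch below maps a view direction to the pixel at the center of the field of view. It assumes the omnidirectional frame is stored in equirectangular form, which the source does not specify, and it covers only the rotational part (yaw and pitch); how the translational movement shifts the range is not detailed in the source and is left out. The function name `view_center_pixel` is hypothetical.

```python
import math
from typing import Tuple


def view_center_pixel(yaw: float, pitch: float, width: int, height: int) -> Tuple[int, int]:
    """Map a view direction (radians) to the pixel at the center of the field of view.

    Assumes an equirectangular frame of size width x height.
    yaw: rotation about the vertical axis, 0 = image center, positive to the right.
    pitch: elevation, 0 = horizon, positive upward, clamped to [-pi/2, pi/2].
    """
    pitch = max(-math.pi / 2, min(math.pi / 2, pitch))
    # Longitude wraps around horizontally; latitude runs from top (+90 deg) to bottom (-90 deg).
    u = (yaw / (2 * math.pi) + 0.5) % 1.0
    v = 0.5 - pitch / math.pi
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y
```

A renderer would crop or reproject a window around this pixel for each eye; that step is omitted here.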

As a result, when the user moves their head and moves forward, backward, left, right, up, or down, the range of the omnidirectional video that forms the in-field image moves to follow the movement of the head mount body 10, and the user can view the omnidirectional video as if looking around. That is, the omnidirectional video realized through the in-field image forms a display space in which the position of the viewpoint is controlled substantially independently of changes in the position of the user's head. In-field images for the right eye and for the left eye are each projected on the display unit DIS, but FIGS. 5 and 7 depict them as the single in-field image perceived by the human brain.

The image synthesis unit 37 composites the in-field image supplied from the in-field image generation unit 36, the hand image supplied from the hand image generation unit 34, and the GUI image 52 supplied from the GUI image generation unit 35. The composited image is projected on the display unit DIS.
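The compositing step can be pictured as per-pixel alpha blending, since the hand image and the GUI image are described elsewhere in this description as translucent overlays. The sketch below is only an assumption about how such blending might be done; the function names, the RGB tuple representation, and the layer order are hypothetical.

```python
def blend_over(base, overlay, alpha):
    """Blend one RGB overlay pixel over a base pixel; alpha=0 keeps base, alpha=1 keeps overlay."""
    return tuple(int(round((1.0 - alpha) * b + alpha * o)) for b, o in zip(base, overlay))

def composite_pixel(in_field, layers):
    """Apply (color, alpha) layers, e.g. GUI image then hand image, over an in-field pixel."""
    out = in_field
    for color, alpha in layers:
        out = blend_over(out, color, alpha)
    return out
```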

The GUI instruction determination unit 38 determines whether the user's hand image has remained on the GUI image 52 projected on the display unit DIS by the image synthesis unit 37 for a predetermined time (1 to 2 seconds). If the GUI image 52 contains a plurality of instruction icons, it determines whether one of them has been specified. For example, the user moves his or her hand so that the hand image moves onto the fast-forward instruction icon of the GUI image 52, and holds the hand there for one second. The GUI instruction determination unit 38 then determines that a fast-forward instruction has been given and instructs the in-field image generation unit 36 to fast-forward the image.
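The dwell-based determination described above can be sketched as a small timer that fires once when the hand image stays on the same instruction icon for the predetermined time. The class name, the icon identifiers, and the use of caller-supplied timestamps are assumptions made for illustration.

```python
class DwellSelector:
    """Report an icon once the hand image has stayed on it for dwell_seconds."""

    def __init__(self, dwell_seconds=1.0):
        self.dwell_seconds = dwell_seconds
        self._current = None   # icon currently under the hand image, or None
        self._since = None     # time at which the hand image arrived on it
        self._fired = False    # True once the current dwell has been reported

    def update(self, icon, now):
        """icon: id of the icon under the hand image (or None); now: time in seconds."""
        if icon != self._current:
            # The hand image moved to a different icon (or off the GUI): restart the timer.
            self._current = icon
            self._since = now
            self._fired = False
            return None
        if icon is not None and not self._fired and now - self._since >= self.dwell_seconds:
            self._fired = True
            return icon
        return None
```

Called once per camera frame, `update` returns the icon identifier exactly once per dwell, so an action such as fast-forward is not repeated while the hand stays in place.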

The shopping store determination unit 41 determines whether or not a shopping store appears in the image generated by the in-field image generation unit 36. The shopping store determination unit 41 uses a deep learning model to determine whether a shopping store is present in the in-field image. Alternatively, without using deep learning, a unique signal indicating the presence of a shopping store may be embedded in the omnidirectional image in advance, and the shopping store determination unit 41 may detect that signal to determine that a shopping store is present.

The product image storage unit 42 stores images (still images and moving images) of products that can be sold at the shopping store. For example, if the products are bags, photographs of the shoulder bags, handbags, and the like that are for sale are taken in advance, and the product image storage unit 42 stores these images. The product image storage unit 42 either supplies the product images to the image synthesis unit 37 one product at a time in sequence, or supplies a plurality of products displayed as thumbnails to the image synthesis unit 37.

The product identification determination unit 43 determines whether the user's hand image has remained on a product image projected on the display unit DIS by the image synthesis unit 37 for a predetermined time (1 to 2 seconds), and thereby determines whether a product has been specified. Alternatively, when the hand tracking unit 33 tracks a motion in which the user's thumb and forefinger pinch the product image, the product identification determination unit 43 determines that the product has been specified. The process then moves to the procedure for purchasing the product.
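One simple way to recognize the pinching motion is to compare the distance between the thumb tip and the forefinger tip with the size of the hand. The sketch below assumes that a hand tracker supplies 3D fingertip positions and a hand-size estimate in the same units; the function name, the inputs, and the 0.25 ratio are hypothetical and are not taken from this description.

```python
import math

def is_pinching(thumb_tip, index_tip, hand_size, ratio=0.25):
    """Return True when the thumb and forefinger tips are closer than ratio * hand_size.

    thumb_tip, index_tip: (x, y, z) positions from a hand tracker (assumed input).
    hand_size: e.g. the wrist-to-middle-fingertip length, in the same units.
    """
    return math.dist(thumb_tip, index_tip) < ratio * hand_size
```

Scaling the threshold by the hand size keeps the test independent of how far the hand is from the camera.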

[Operation of the head-mounted system]
FIG. 4 is a flowchart showing the operation of the head mount system 100.
First, the user launches the application already downloaded to the portable device SP (S401). The user then sets the portable device SP in the storage pocket 12 of the head mount body 10 (S402) and puts on the head mount system 100.

The in-field image is projected on the display unit DIS (S403), and the user views an in-field image such as that shown in FIG. 5(C). If necessary, the user moves the focus adjustment lever 14 to adjust the focal length of the lens, and turns the interpupillary distance adjustment dial 15 to move the lenses to match the distance between the pupils. The in-field image changes based on the signals for the motion (yaw, roll, and pitch) of the head mount body 10 from the rotation amount sensor JS and the translation amount signal derived from the camera CA of the portable device SP.

At the same time that the in-field image is projected on the display unit DIS, hand tracking using the camera CA starts (S404). When the user extends a hand forward, the hand enters the imaging field of view of the camera CA of the portable device SP. The hand tracking unit 33 then recognizes the user's hand and fingers and tracks the position of the hand and the motion of the fingers. When the user's hand has remained continuously within the imaging field of view for a predetermined time (for example, 1 second) (S405 YES), the hand image 51 generated by the hand image generation unit 34 is projected on the display unit DIS (S406). As shown in FIG. 5(D), a translucent hand image 51 is superimposed on the in-field image.

At the same time as the hand image 51 is projected on the display unit DIS, or several seconds later, the GUI image 52 generated by the GUI image generation unit 35 is projected on the display unit DIS (S407). As shown in FIG. 5(E), a translucent GUI image 52 is superimposed at the center of the in-field image. In one example of the present embodiment, the GUI image 52 displays, from the left, a 10-second rewind instruction icon, a review instruction icon, a stop instruction icon, a cue instruction icon, and a 30-second fast-forward instruction icon. When the in-field image is stopped, the stop instruction icon in the GUI image 52 switches to a play instruction icon. In this embodiment the GUI image 52 is projected at the center of the display unit DIS, but it may instead be projected at the upper or lower edge of the display unit DIS. The GUI image 52 is preferably projected at a fixed position even when the user moves the head mount body 10, for example by turning the head left or right, and the in-field image changes.

The GUI image 52 is projected at a fixed position even if the in-field image changes. Therefore, as shown in FIG. 5(E), when the user moves his or her hand, the hand image 51, which follows the hand tracked by the hand tracking unit 33, moves from, for example, the position drawn with a dotted line to the position drawn with a solid line, and is held on the 10-second rewind instruction icon for the predetermined time. The GUI instruction determination unit 38 determines whether or not the hand image 51 is on an instruction icon of the GUI image 52 (S408). In FIG. 5(E), the GUI instruction determination unit 38 determines that the 10-second rewind instruction icon has been specified and instructs the in-field image generation unit 36 to rewind the image by 10 seconds (S409).

On the other hand, when the user moves the hand out of the imaging field of view of the camera CA (S405 NO), the image shown in FIG. 5(E) changes to the image shown in FIG. 5(F). That is, the hand image 51 is erased from the display unit DIS (S410), and the GUI image 52 is erased at the same time or slightly later.
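The showing and erasing of the hand image and the GUI image in steps S405 to S410 can be summarized as a small visibility rule driven by how long the hand has stayed in the camera's field of view. The following is a minimal sketch under stated assumptions: the class name and parameters are hypothetical, and for simplicity both overlays are erased at the same moment, which is one of the two options the description allows.

```python
class OverlayVisibility:
    """Decide whether the hand image and the GUI image should be shown."""

    def __init__(self, show_after=1.0, gui_delay=0.0):
        self.show_after = show_after  # seconds the hand must stay in view (S405)
        self.gui_delay = gui_delay    # extra delay before the GUI image appears (S407)
        self._first_seen = None       # time at which the hand entered the field of view

    def update(self, hand_in_view, now):
        """Return (show_hand_image, show_gui_image) for the current frame."""
        if not hand_in_view:
            # S405 NO: erase the overlays and restart the timer.
            self._first_seen = None
            return (False, False)
        if self._first_seen is None:
            self._first_seen = now
        elapsed = now - self._first_seen
        show_hand = elapsed >= self.show_after
        show_gui = elapsed >= self.show_after + self.gui_delay
        return (show_hand, show_gui)
```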

Next, the case where the user selects the video data to view from a plurality of video data, and the case where the user purchases a product when the video data is a shopping video, are described with reference to the flowchart of FIG. 6 and FIGS. 7(A) to 7(F).

When the application on the portable device SP is launched, a thumbnail display from which one of a plurality of video data categories can be selected is projected on the display unit DIS (S601). FIG. 7(A) shows an example of this thumbnail display on the display unit DIS. The video data categories include, for example, Japan travel videos, Taiwan travel videos, underwater videos, and shopping videos; by moving his or her hand and holding the hand image 51 on the horizontal arrow 71 for a predetermined time, the user can scroll to thumbnails that are not currently displayed. FIG. 7(A) shows the state in which the hand image 51 has selected the shopping video category. Then, as shown in FIG. 7(B), a thumbnail display from which one shopping video can be chosen from a plurality of shopping videos is projected on the display unit DIS. To view an underwater video instead of a shopping video, the user moves his or her hand and holds the hand image 51 on the upper category arrow 73 for a predetermined time. In the example of this embodiment, the user holds the hand image 51 on shopping video B 74 for a predetermined time and thereby selects shopping video B.

The in-field image generation unit 36 projects shopping video B (S602). FIG. 7(C) is an example of a shopping video and shows a virtual shopping mall 74. When the user moves his or her head and moves forward, backward, left, or right, the in-field image generation unit 36 acquires the rotation amount and the translation amount from the rotation amount sensor JS and the translation calculation unit 32, and projects in-field images of the shopping mall and of the stores in it. When the user moves into a virtual store in the shopping mall (for example, a bag store), the shopping store determination unit 41 determines whether or not it is a shopping store where products can be purchased. If products can be purchased at every virtual store in the shopping mall, the shopping store determination unit 41 may be omitted. If, however, products can be purchased at only some of the virtual stores, the shopping store determination unit 41 detects the unique signal included in the omnidirectional image, or recognizes the store by deep learning, in order to show the user whether the virtual store that the user has entered is a shopping store (S603).

If the virtual store in which the user is moving is a shopping store where products can be purchased (S603 YES), the shopping store determination unit 41 projects a shopping GUI image 52a on the display unit DIS (S604). FIG. 7(D) is an example in which the shopping GUI image 52a is projected at the lower edge of the display unit DIS over the in-field image of the shopping store 75. When the user moves his or her head, the in-field image of the shopping store 75 changes, but the shopping GUI image 52a remains projected at a fixed position. When the user moves forward, backward, left, or right, leaves the shopping store 75, and is in, for example, an aisle of the shopping mall 74, the shopping GUI image 52a is erased. If the virtual store in which the user is moving is not a shopping store where products can be purchased (S603 NO), the in-field image of the shopping mall continues to be projected on the display unit DIS (S403).

Next, the GUI instruction determination unit 38 determines whether or not the hand image 51 is on the shopping GUI image 52a (S605). FIG. 7(D) shows the state in which the hand image 51 is on the shopping GUI image 52a. If the hand image 51 is on the shopping GUI image 52a (S605 YES), the product image storage unit 42 supplies a plurality of product images (still images or moving images) of the shopping store 75 to the image synthesis unit 37. If the hand image 51 is not on the shopping GUI image 52a, the hand tracking unit 33 continues to track the position of the hand and the motion of the fingers based on the images captured by the camera CA (S404).

When the product image storage unit 42 supplies a plurality of product images to the image synthesis unit 37, a product image 76 is projected as shown in FIG. 7(E), and the user can view other product images 76 by using the horizontal arrows 71 to move through the thumbnail images (product images 76). Note that in FIG. 7(E) the in-field image of the shopping store 75 is not composited; only the product image 76 is projected so that the user can see it easily. In other words, the image synthesis unit 37 does not composite the in-field image supplied from the in-field image generation unit 36. However, the in-field image of the shopping store 75 may instead be projected translucently.

The user moves his or her hand, and the hand tracking unit 33 tracks the position of the hand and the motion of the fingers. The product identification determination unit 43 determines that a product image 76 has been specified when the hand image 51 is held on that product image 76 for a predetermined time (S608). Alternatively, when the hand tracking unit 33 tracks a motion in which the user's thumb and forefinger pinch the product image 76, the product identification determination unit 43 determines that the product image 76 has been specified (S608). Once the product image 76 is specified, the hand image 51 and the product image 76 are joined and can be moved together.

Next, the product identification determination unit 43 determines whether the hand image 51, joined with the product image 76, has moved to the cart 77, that is, whether the hand image 51 is on the cart image 77 (S609). If the hand image 51 is on the cart image 77 (S609 YES), the product image 76 enters the cart 77, and the hand image 51 and the product image 76 are separated (S610). Then, as shown in FIG. 7(F), a product purchase GUI image 52b is projected (S611). The product purchase GUI image 52b displays instruction icons such as "Proceed to purchase", "Return product", "Continue shopping", and "Exit shopping". The user moves his or her hand to move the hand image 51 onto one of these instruction icons. When the hand image 51 remains on the "Proceed to purchase" instruction icon for a predetermined time, the user can proceed to the screen for purchasing the product.
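Steps S608 to S611 amount to a three-state flow: browsing, holding a specified product, and showing the purchase GUI once the product is dropped into the cart. The sketch below is one possible way to model that flow; the class, state, and method names are hypothetical, and the product identifier is an illustrative stand-in for the product image 76.

```python
from enum import Enum, auto

class CartState(Enum):
    BROWSING = auto()      # product images are shown, nothing is attached to the hand image
    HOLDING = auto()       # a product image is joined to the hand image
    PURCHASE_GUI = auto()  # the product is in the cart and the purchase GUI is shown

class CartFlow:
    def __init__(self):
        self.state = CartState.BROWSING
        self.held = None   # product currently joined to the hand image
        self.cart = []     # products that have been dropped into the cart

    def product_specified(self, product_id):
        """S608: a dwell or a pinch specified a product; join it to the hand image."""
        if self.state is CartState.BROWSING:
            self.held = product_id
            self.state = CartState.HOLDING

    def hand_moved(self, over_cart):
        """S609 to S611: when the joined hand image reaches the cart, drop the product."""
        if self.state is CartState.HOLDING and over_cart:
            self.cart.append(self.held)          # S610: product enters the cart
            self.held = None                     # hand image and product image separate
            self.state = CartState.PURCHASE_GUI  # S611: show the purchase GUI
```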

This embodiment describes an example in which products can be purchased in a shopping video, but, for example, when a snorkel or fins appear partway through an underwater video, a purchase screen for the snorkel or fins may be projected.

100 … Head mount system
10 … Head mount body
11 … Peripheral wall, 12 … Storage pocket
14 … Focus adjustment lever, 15 … Interpupillary distance adjustment dial
17 … Face edge, 20 … Strap
31 … Image data storage unit, 32 … Translation calculation unit
33 … Hand tracking unit, 34 … Hand image generation unit
35 … GUI image generation unit, 36 … In-field image generation unit
37 … Image synthesis unit, 38 … GUI instruction determination unit
41 … Shopping store determination unit, 42 … Product image storage unit
43 … Product identification determination unit
51 … Hand image, 52 … GUI image
CA … Camera, DIS … Display unit, JS … Rotation amount sensor
SP … Portable device, WF … Communication unit

Claims

1. A head-mounted system in which a portable device including a camera on a first surface and a display unit on a second surface is attached to a head mount body, and with which an image projected on the display unit is viewed, wherein
the head mount body has a storage pocket that stores the portable device so that the first surface faces forward and that is notched so that the camera can capture images of the area in front, and
the portable device comprises:
a rotation amount sensor that senses the amount of rotation in the three rotation directions of roll, pitch, and yaw;
a translational movement amount calculation unit that detects feature points from the image captured by the camera and calculates the amount of translational movement in three orthogonal axis directions; and
an in-field image generation unit that generates an in-field image from an omnidirectional image based on the amount of rotation and the amount of translational movement.

2. The head-mounted system according to claim 1, wherein
the portable device has:
a hand tracking unit that detects and tracks the user's hand from the image captured by the camera;
a hand image generation unit that generates a hand image when the hand is in the captured image for longer than a predetermined time;
a GUI image generation unit that generates a GUI image related to the in-field image when the hand is in the captured image for longer than a predetermined time; and
a video synthesizing unit that synthesizes the hand image and the GUI image with the in-field image, and
the in-field image, the hand image, and the GUI image are projected on the display unit.

3. The head-mounted system according to claim 2, wherein
the portable device has a GUI instruction determination unit that determines whether the hand image has remained on an instruction icon of the GUI image for a predetermined time, and
the GUI instruction determination unit instructs execution of the operation of the instruction icon when the hand image is on the instruction icon of the GUI image.

4. The head-mounted system according to claim 3, wherein
the portable device has:
a shopping store determination unit that determines whether or not the in-field image shows a shopping store where products can be purchased; and
a product image storage unit that stores images of a plurality of products sold at the shopping store, and
the product image storage unit supplies the product images to the video synthesizing unit when the hand image is on a shopping GUI image.

5. The head-mounted system according to claim 4, wherein
the portable device has a product identification determination unit that determines whether the hand image has remained on the product image for a predetermined time, and
the product identification determination unit joins the hand image and the product image, and separates the hand image from the product image at a cart image.

6. The head-mounted system according to claim 4, wherein the video synthesizing unit does not synthesize the in-field image when the product image storage unit supplies the product image to the video synthesizing unit.

7. A method of projecting an in-field image, in which a portable device including a camera on a first surface and a display unit on a second surface is stored in a storage pocket of a head mount body and an in-field image is projected on the display unit, wherein
the storage pocket is notched so that the camera can capture images of the area in front, and
the portable device:
detects the amount of rotation in the three rotation directions of roll, pitch, and yaw;
detects feature points from the image captured by the camera and calculates the amount of translational movement in three orthogonal axis directions; and
generates an in-field image from an omnidirectional image based on the amount of rotation and the amount of translational movement.

8. The method of projecting an in-field image according to claim 7, wherein
the portable device:
detects and tracks a user's hand from the image captured by the camera;
generates a hand image when the hand is in the captured image for longer than a predetermined time;
generates a GUI image related to the in-field image when the hand is in the captured image for longer than a predetermined time; and
synthesizes the hand image and the GUI image with the in-field image, and
the in-field image, the hand image, and the GUI image are projected on the display unit.

9. The method of projecting an in-field image according to claim 8, wherein
the portable device determines whether or not the hand image has remained on an instruction icon of the GUI image for a predetermined time, and
when the hand image is on the instruction icon of the GUI image, the portable device instructs execution of the operation of the instruction icon.

10. The method of projecting an in-field image according to claim 9, wherein
the portable device:
determines whether the in-field image shows a shopping store where products can be purchased; and
stores a plurality of product images of products sold at the shopping store, and
the stored product images are supplied for video synthesis when the hand image is on a shopping GUI image.