JP2021034744A

JP2021034744A - Image presentation device and program

Info

Publication number: JP2021034744A
Application number: JP2019148569A
Authority: JP
Inventors: 数馬吉野; Kazuma Yoshino; 裕之川喜田; Hiroyuki Kawakita; 大一小出; Daiichi Koide; 健介久富; Kensuke Hisatomi
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2021-03-01
Anticipated expiration: 2039-08-13
Also published as: JP7403256B2

Abstract

PROBLEM TO BE SOLVED: To provide an image presentation device capable of displaying, in a virtual space, a human body, an object, or the like near a viewing user. SOLUTION: An image presentation device comprises: a playback video acquisition unit that acquires a playback video, which is a video for playback; a peripheral video acquisition unit that acquires a peripheral video, which is a video of the surroundings of the device itself; a recognition unit that recognizes a prescribed subject contained in the peripheral video; a distance information acquisition unit that acquires distance information corresponding to the peripheral video; a mask generation unit that generates mask information expressing, on the basis of the distance information, whether or not a region is one in which the playback video should be presented, and also expressing whether or not a region is one in which the subject recognized by the recognition unit exists; and a presentation unit that outputs at least the playback video on the basis of the mask information generated by the mask generation unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image presentation device and a program.

Research and development on virtual reality is advancing. Live-action virtual reality content shot in all directions with a 360-degree camera has also become common. Such content makes it possible to give users the experience of being in another place. Virtual reality content is provided using, for example, a head-mounted display (HMD).

Techniques have also been proposed that let multiple users experience together the virtual space provided by virtual reality content. When multiple users experience the virtual space together, a user is no longer limited to a solitary experience in which a head-mounted display covers the entire field of view, and can instead experience the virtual space while sharing the enjoyment with other users.

For example, the techniques shown in Non-Patent Document 1 and Non-Patent Document 2 display the avatars of other users in the virtual space. The technique described in Non-Patent Document 3 proposes displaying in the virtual space another user at a remote location, cut out from live-action video.

Devices have also been developed that generate and present video according to the position, posture, and the like of a user viewing virtual reality content. For example, Patent Document 1 describes a configuration example of a stereoscopic video display device. That device includes a position and posture measuring device that measures the individual position and posture of each head-mounted display device worn by a user. Binocular video generation means then generates, according to the position and posture measured by the measuring device, the left-eye video that enters each user's left eye and the right-eye video that enters the right eye when the user observes a three-dimensional original video. In this way, the stereoscopic video display device displays video according to the position and posture of the user.

Patent Document 2 describes an image processing device that determines the transparency of a generated image according to the moving speed of an object in the captured real space.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-019170
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-063567

Non-Patent Document 1: "Facebook Spaces", Beta, Facebook, Inc., updated 2019, downloaded August 8, 2019, URL https://www.facebook.com/spaces
Non-Patent Document 2: "Haven't you used it yet? The possibilities of VR opened up by 'Facebook Spaces'", posted by caug5, CAPA Co., Ltd., May 8, 2018, downloaded August 8, 2019, URL https://www.capa.co.jp/archives/22169
Non-Patent Document 3: Simon N.B. Gunkel, Marleen D.W. Dohmen, Hans Stokking, Omar Niamut, "360-degree photo-realistic VR conferencing", the 26th IEEE Conference on Virtual Reality and 3D User Interfaces, posters, 2 pages, Mar. 2019.

However, to further strengthen the effect of being immersed in the world of the content, it is effective not merely to enjoy or use a virtual space together with other users, but to create the feeling of sharing the virtual space with other users who are present together in the real space. To that end, if at least part of the viewing user's own body, of the surroundings of the place where that user is located, or of the appearance of other users viewing the same content at the same time is presented as part of the content, an even stronger sense of immersion in the world of the content can be expected.

The present invention was made in recognition of the above problem. It aims to provide an image presentation device and a program capable of displaying in a virtual space at least part of the viewing user's own body, the place where the viewing user is present, the appearance of other users viewing at the same time, and the like.

[1] To solve the above problem, an image presentation device according to one aspect of the present invention includes: a playback video acquisition unit that acquires a playback video, which is a video for playback; a peripheral video acquisition unit that acquires a peripheral video, which is a video of the surroundings of the device itself; a recognition unit that recognizes a predetermined subject included in the peripheral video; a distance information acquisition unit that acquires distance information corresponding to the peripheral video; a mask generation unit that generates mask information indicating, based on the distance information, whether or not a region is one in which the playback video should be presented, and also indicating whether or not a region is one in which the subject recognized by the recognition unit exists; and a presentation unit that outputs at least the playback video based on the mask information generated by the mask generation unit.

[2] In one aspect of the present invention, the above image presentation device further includes a synchronization unit that synchronizes, with another image presentation device, the playback position in the time direction of the playback video output by the presentation unit.

[3] In one aspect of the present invention, in the above image presentation device, the presentation unit outputs, based on the mask information, such that at least one of the playback video and the peripheral video is displayed for each region of the screen.

[4] In one aspect of the present invention, the above image presentation device further includes a presentation area extraction unit that cuts out only a part of the peripheral video acquired by the peripheral video acquisition unit. The recognition unit recognizes the predetermined subject based on the peripheral video before it is cut out by the presentation area extraction unit, and the presentation unit outputs the peripheral video cut out by the presentation area extraction unit.

[5] In one aspect of the present invention, in the above image presentation device, the mask generation unit generates the mask information including information on a mixing ratio for regions in which the playback video and the peripheral video are mixed and presented. In such regions, the presentation unit outputs the playback video and the peripheral video mixed according to the mixing ratio information.

[6] In one aspect of the present invention, in the above image presentation device, the presentation unit has a function of varying the transparency of the playback video and outputs the playback video with a transparency according to the mask information.

[7] In one aspect of the present invention, in the above image presentation device, the transparency is a real number of 0 or more and 1 or less, and the presentation unit outputs the playback video with a transparency according to the mask information.

[8] In one aspect of the present invention, the above image presentation device is a type of device worn on the head of a mammal (for example, a human) and further includes a position and posture detection unit that detects the position and posture of the image presentation device. Of the playback video, which is an all-around video, the presentation unit outputs the partial video corresponding to the position and posture detected by the position and posture detection unit.

[9] One aspect of the present invention is a program that causes a computer to execute: a playback video acquisition process of acquiring a playback video, which is a video for playback; a peripheral video acquisition process of acquiring a peripheral video, which is a video of the surroundings of the device itself; a recognition process of recognizing a predetermined subject included in the peripheral video; a distance acquisition process of acquiring distance information corresponding to the peripheral video; a mask generation process of generating mask information indicating, based on the distance information, whether or not a region is one in which the playback video should be presented, and also indicating whether or not a region is one in which the subject recognized by the recognition unit exists; and a presentation process of outputting at least the playback video based on the mask information generated in the mask generation process.

According to the present invention, the sense of immersion in the world of virtual reality can be further increased.

FIG. 1 is a block diagram showing the schematic functional configuration of the image presentation device according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing a configuration example of a system in which a plurality of image presentation devices according to the first embodiment cooperate with each other.
FIG. 3 is a schematic diagram (a plan view of the three-dimensional space) showing, for the video presented by the image presentation device according to the first embodiment, the region-by-region distinction between presenting the video acquired by the peripheral video acquisition unit and presenting the video acquired by the playback video acquisition unit.
FIG. 4 is a schematic diagram showing an example of the image processing (cutting out) performed by the presentation area extraction unit according to the first embodiment.
FIG. 5 is a schematic diagram showing an example of how the image presentation device according to the first embodiment presents the video acquired by the peripheral video acquisition unit and the video acquired by the playback video acquisition unit 5. (A) is a side view of the three-dimensional space, and (B) is a graph showing the relationship between the distance from the image presentation device and the value of the mask data.
FIG. 6 is a schematic diagram showing a configuration example of the video presented by the image presentation device according to the first embodiment.
FIG. 7 is a flowchart showing the processing procedure for one frame of the video presented by the image presentation device according to the first embodiment.
FIG. 8 is a block diagram showing the schematic functional configuration of the image presentation device according to the second embodiment of the present invention.
FIG. 9 is a schematic diagram showing a configuration example of the video presented by the image presentation device 51 according to the second embodiment.

[First Embodiment]
Next, the first embodiment of the present invention will be described with reference to the drawings. The image presentation device according to the first embodiment uses a video see-through method, which is described later.

FIG. 1 is a block diagram showing the schematic functional configuration of the image presentation device according to the present embodiment. As illustrated, the image presentation device 1 includes a processing unit 2, a peripheral video acquisition unit 3, a distance information acquisition unit 4, a playback video acquisition unit 5, a position/posture acquisition unit 6, a synchronization unit 7, and a display device 9. The processing unit 2 includes a recognition unit 21, a presentation area extraction unit 22, a mask generation unit 23, and a presentation unit 24. At least some of the functions of these functional units may be implemented using electronic circuits. Some or all of the functional units may also be implemented using a computer and a program. Each functional unit has storage means as needed. The storage means is, for example, a flip-flop that holds a given state in an electronic circuit, a variable in a program when a program is used, or memory allocated by program execution. Non-volatile storage means such as a magnetic hard disk device or a solid state drive (SSD) may also be used as needed. The functions of each unit are as follows.

The image presentation device 1 is implemented, for example, as a type of device worn on the head of a mammal (for example, a human). That is, the image presentation device 1 can be implemented as a head-mounted display that integrates a video display device, an external measurement device, and a function for processing video (implemented, for example, using a computer). The image presentation device 1 is used to superimpose video (playback video) on the real space using a video see-through or optical see-through method. In the present embodiment, the image presentation device 1 uses the video see-through method.

The processing unit 2 processes information from the peripheral video acquisition unit 3, the distance information acquisition unit 4, the playback video acquisition unit 5, the position/posture acquisition unit 6, and the synchronization unit 7, then computes and outputs the video to be displayed on the display device 9. The processing unit 2 may be built into the head-mounted image presentation device 1, or may be implemented separately from the head-mounted main body (for example, as a PC).

The peripheral video acquisition unit 3 acquires peripheral video, which is video of the surroundings of the image presentation device 1 (the device itself). The peripheral video acquisition unit 3 is implemented using, for example, a stereo camera. The video acquisition means used by the peripheral video acquisition unit 3 may also be a so-called fisheye camera with a wide viewing angle.

The distance information acquisition unit 4 acquires distance information (a depth map) corresponding to the video (peripheral image) acquired by the peripheral video acquisition unit 3. The distance information acquisition unit 4 may be a device separate from the peripheral video acquisition unit 3 that acquires a distance image corresponding to the peripheral image. Alternatively, the distance information acquisition unit 4 may obtain fisheye images from a peripheral video acquisition unit 3 implemented with a fisheye stereo camera and calculate the distance information from them. Acquiring a distance image is itself achievable with existing technology.

In other words, the peripheral video acquisition unit 3 and the distance information acquisition unit 4 together acquire RGBD information (R: red, G: green, B: blue, D: distance) from the viewpoint of the device itself, which is approximately the viewpoint of the user wearing the image presentation device 1 on the head.

The playback video acquisition unit 5 acquires the video for playback. The playback video is, for example, 360-degree live-action content that captures the whole sky (or the full circumference). Part or all of the playback video may also be computer graphics. The playback video acquisition unit 5 acquires the playback video from a recording medium such as a DVD, a Blu-ray disc, or a hard disk device. Alternatively, the playback video acquisition unit 5 may acquire playback video delivered by signals such as communication or broadcasting. The playback video consists of a time series of frame images and audio as needed.

The position/posture acquisition unit 6 detects the position and posture of the device itself (the image presentation device 1). Here, the position is information expressed as position coordinates in three-dimensional space, and the posture is information expressing the orientation of the image presentation device 1, for example as three-dimensional angle information. The position/posture acquisition unit 6 may acquire the position and posture by, for example, incorporating a gyro sensor. Because self-position can also be estimated from stereo camera images, the position/posture acquisition unit 6 may calculate the position and posture from the video acquired by the peripheral video acquisition unit 3. The position/posture acquisition unit 6 may also acquire the position and posture by receiving an external beacon signal, or by receiving a signal such as infrared light emitted externally in synchronization with a clock in order to identify a location in the real space. The position/posture acquisition unit 6 may also receive position and posture information obtained by a plurality of cameras installed externally (for example, in the room that serves as the viewing space for the content) that capture the device itself (the image presentation device 1).

The synchronization unit 7 exchanges information as needed by communicating with other image presentation devices 1. When the device itself (the image presentation device 1) plays the same video content in synchronization with another image presentation device 1, the synchronization unit 7 synchronizes with that device the playback position in the time direction of the playback video that the presentation unit 24 displays on the display device 9. Specifically, for example, the synchronization units 7 of the image presentation devices 1 exchange information on the relative time position of the content being played at predetermined time intervals. The image presentation device 1 adjusts the output timing of the video content being played based on the time information that its own synchronization unit 7 receives from the other devices.
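The timing adjustment described above could be sketched as follows. This is only an illustration under stated assumptions: the source does not specify how the received time positions are reconciled, so the use of the median as the agreed position, the `tolerance_s` parameter, and the function names are all hypothetical.

```python
from statistics import median


def target_position(own_pos_s: float, peer_positions_s: list[float]) -> float:
    """Pick an agreed playback position (seconds from content start).

    Assumption: devices converge on the median of all reported positions.
    The source only says positions are exchanged and timing is adjusted.
    """
    return median([own_pos_s, *peer_positions_s])


def corrected_position(own_pos_s: float,
                       peer_positions_s: list[float],
                       tolerance_s: float = 0.05) -> float:
    """Return the position this device should play from.

    Small differences (within the hypothetical tolerance) are ignored so
    that playback does not jump on every exchange.
    """
    target = target_position(own_pos_s, peer_positions_s)
    if abs(target - own_pos_s) <= tolerance_s:
        return own_pos_s
    return target
```

Calling `corrected_position` once per exchange interval with the positions reported by the other devices would keep each device within the tolerance of the group.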

When starting to play the same video content in synchronization with another image presentation device 1, the synchronization unit 7 also transmits the position and posture information of its own device to the other image presentation device 1, and receives from the other image presentation device 1 the position and posture information of that device. Here, there may be one "other device" or two or more. Because the synchronization unit 7 acquires the position and posture information of the other image presentation devices 1 in this way, the way the video of the virtual reality content is cut out (based on position and posture) can be made consistent with the other image presentation devices 1. The configuration for cooperative operation by a plurality of image presentation devices 1 is also described later.

The display device 9 displays the video output by the presentation unit 24. That video may contain both peripheral video and playback video. For each region of the screen, the display device 9 displays the video passed from the presentation unit 24. The display device 9 may, for example, perform stereo display for stereoscopic viewing.

The recognition unit 21 performs processing to recognize a predetermined subject (for example, a person) in the peripheral video acquired by the peripheral video acquisition unit 3. The recognition unit 21 has learned the features of the predetermined subject in video in advance by machine learning. By referring to the trained model, the recognition unit 21 identifies where the subject appears in the peripheral video (information such as the coordinates of the region within the image) and outputs that information. As the result of the recognition processing, the recognition unit 21 passes the position information of the region in the peripheral video to the mask generation unit 23.

The presentation area extraction unit 22 cuts out the video (image) of the presentation area from the peripheral video acquired by the peripheral video acquisition unit 3. The video of the presentation area may be only a part of the whole peripheral video (for example, the part near the center). This allows the viewing angle of the video acquired by the peripheral video acquisition unit 3 to be matched to the viewing angle of the video displayed on the display device 9. The presentation area extraction unit 22 passes the cut-out video (image) to the presentation unit 24. How the presentation area extraction unit 22 extracts the video of the presentation area is described later with reference to another figure. As described later, the processing of the presentation area extraction unit 22 may include central projection.

The presentation area extraction unit 22 also passes to the mask generation unit 23 information on the positional relationship between the peripheral image before extraction and the extracted, cut-out image. In other words, the presentation area extraction unit 22 tells the mask generation unit 23 the positional correspondence, when the video is cut out using a technique such as central projection, between the video acquired by the peripheral video acquisition unit 3 and the video matched to the viewing angle of the display device 9. This allows the mask generation unit 23 to generate mask information that fits the coordinate system used for presentation.
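One way to picture this positional correspondence is a function that maps a pixel of the cut-out central-projection image back to a pixel of the fisheye peripheral image. The sketch below is an assumption-laden illustration: the source does not state the lens model, so an ideal equidistant fisheye (image radius proportional to the angle from the optical axis), a shared optical center, and all parameter names are hypothetical.

```python
import math


def perspective_to_fisheye(u: float, v: float,
                           out_size: tuple[int, int], out_fov_deg: float,
                           fish_size: tuple[int, int], fish_fov_deg: float
                           ) -> tuple[float, float]:
    """Map pixel (u, v) of the cut-out image to fisheye image coordinates.

    Assumptions (not from the source): ideal equidistant fisheye,
    optical axis at the center of both images, horizontal fields of view.
    """
    out_w, out_h = out_size
    fish_w, fish_h = fish_size

    # Focal length in pixels of the central-projection (pinhole) image.
    f_out = (out_w / 2) / math.tan(math.radians(out_fov_deg) / 2)

    # Offset from the image center and the corresponding viewing ray.
    x = u - out_w / 2
    y = v - out_h / 2
    theta = math.atan2(math.hypot(x, y), f_out)  # angle from the optical axis
    phi = math.atan2(y, x)                       # direction around the axis

    # Equidistant fisheye: radius grows linearly with theta.
    f_fish = (fish_w / 2) / (math.radians(fish_fov_deg) / 2)
    r = f_fish * theta
    return fish_w / 2 + r * math.cos(phi), fish_h / 2 + r * math.sin(phi)
```

Evaluating this mapping for every output pixel gives both the lookup table for the cut-out itself and the correspondence that lets region coordinates found in the full peripheral image be converted to the presentation coordinate system.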

When a wide-angle lens such as a fisheye lens is selected for the camera used by the peripheral video acquisition unit 3, a wide range of video is captured even if the head of the user wearing the image presentation device 1 moves, which raises the likelihood that the recognition unit 21 correctly recognizes the predetermined subject. That is, when the recognition unit 21 recognizes a subject such as a person, the subject becomes easier to track and the recognition accuracy stabilizes. On the other hand, a viewing angle that is too wide may not match the viewing angle of the display device 9, but the processing of the presentation area extraction unit 22 makes it possible to match the two.

The mask generation unit 23 generates mask information. The mask information indicates, based on the distance information passed from the distance information acquisition unit 4, whether or not a region is one in which the playback video should be presented. The mask information also indicates whether or not a region is one in which the subject recognized by the recognition unit 21 exists. In the present embodiment, the mask information indicates whether a region should display the peripheral image or the playback image. The mask generation unit 23 may generate mask information that includes the mixing ratio for regions in which the playback video and the peripheral video are mixed and presented.
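As a rough illustration of how a presentation stage might apply such mask information, the sketch below blends two frames per pixel. The convention that a mask value of 1 selects the playback video and 0 selects the peripheral video is an assumption made for this example, as are the function and argument names; the source does not fix the encoding.

```python
import numpy as np


def composite(playback: np.ndarray,
              peripheral: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Blend two H x W x 3 frames using an H x W mask in [0, 1].

    Assumed convention: mask == 1 shows only the playback video,
    mask == 0 shows only the peripheral video, and values in between
    act as the mixing ratio.
    """
    m = np.clip(mask, 0.0, 1.0)[..., np.newaxis]  # broadcast over channels
    return m * playback + (1.0 - m) * peripheral
```

With a purely binary mask this reduces to selecting one of the two videos per region, and fractional values give the mixed presentation.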

上記のように、マスク生成部２３は、距離に基づくマスクと、認識結果に基づくマスクとを生成する。これにより、提示部２４は、次のような提示を行えるようになる。例えば、自分自身の身体や、同一の空間内で同一のバーチャルリアリティコンテンツを一緒に体験している他者の身体を、バーチャルリアリティ映像の中に提示することができる。認識部２１によって認識される所定の被写体（人等）に関しては、自装置からの距離に関わらず、周辺映像の表示が行われるようにすることができる。特定の被写体（人等）以外に関しては、距離に基づく提示が行われる。つまり、自装置から比較的近い範囲の物は、周辺映像に含まれる形で、バーチャルリアリティ空間の中に提示される。また、自装置から比較的遠い範囲に存在する物は、周辺映像に含まれる形では提示されない。そのように自装置から比較的遠い範囲に存在する物が存在する領域では、再生用映像が提示される。 As described above, the mask generation unit 23 generates a mask based on the distance and a mask based on the recognition result. As a result, the presentation unit 24 can make the following presentations. For example, the body of oneself or the body of another person who is experiencing the same virtual reality content together in the same space can be presented in the virtual reality image. With respect to a predetermined subject (person or the like) recognized by the recognition unit 21, the peripheral image can be displayed regardless of the distance from the own device. For other than a specific subject (person, etc.), the presentation is based on the distance. That is, an object in a range relatively close to the own device is presented in the virtual reality space in a form included in the peripheral image. In addition, objects that exist in a range relatively far from the own device are not presented in the form included in the peripheral image. In such an area where an object existing in a range relatively far from the own device exists, a playback image is presented.

ここで、「比較的近い範囲」とは、例えば、人がその場から動くことなく（例えば、着座のまま）手を伸ばして触れられる範囲である。例えば、１メートル以内程度の範囲である。逆に「比較的遠い範囲」とは、例えば、２メートル以上程度の範囲である。その中間の距離の範囲（１メートル以上且つ２メートル以下）では、近距離用の周辺映像と、遠距離用の再生用映像とを混合した映像を提示することが考えられる。 Here, the "relatively close range" is, for example, a range in which a person can reach out and touch without moving from the place (for example, while sitting). For example, the range is within 1 meter. On the contrary, the "relatively distant range" is, for example, a range of about 2 meters or more. In the middle distance range (1 meter or more and 2 meters or less), it is conceivable to present an image in which a short-distance peripheral image and a long-distance playback image are mixed.

提示部２４は、マスク生成部２３が生成したマスク情報に基づいて、少なくとも再生用映像を出力する。また、同じくマスク情報に基づいて、近距離の領域では、周辺映像を出力する。なお、提示部２４が再生用映像を提示する場合、再生用映像全体の中から、映像提示装置１の位置および姿勢に基づいて適切な部分の映像を切り出して、表示させるようにする。つまり、提示部２４は、全周映像である再生用映像のうちの、位置・姿勢取得部６が検知した位置および姿勢に応じた部分映像を出力するようにしてよい。 The presentation unit 24 outputs at least a playback image based on the mask information generated by the mask generation unit 23. Also, based on the mask information, the peripheral image is output in the short distance area. When the presentation unit 24 presents the playback video, an appropriate portion of the video is cut out from the entire playback video based on the position and orientation of the video presentation device 1 and displayed. That is, the presentation unit 24 may output a partial image corresponding to the position and the posture detected by the position / posture acquisition unit 6 in the reproduction image which is the all-around image.

提示部２４は、再生用映像と周辺映像とを混合して提示する領域においては、マスク情報の混合比率の値に基づいて、再生用映像と周辺映像とが混合するように、それらを出力する。つまり、提示部２４は、マスク情報に基づいて、画面内の領域ごとに、再生用映像または周辺映像の少なくともいずれかを表示するように出力する。また、提示部２４は、上記の混合比率に基づいて、両映像を混合して表示するように出力してもよい。 In a region where the playback video and the peripheral video are mixed and presented, the presentation unit 24 outputs them so that they are mixed according to the mixing-ratio value in the mask information. That is, based on the mask information, the presentation unit 24 outputs, for each region of the screen, at least one of the playback video and the peripheral video for display. The presentation unit 24 may also output the two videos mixed according to the above mixing ratio.

図２は、複数の映像提示装置１が相互に連携するシステムの構成例を示す概略図である。図示するように、複数の映像提示装置１は、相互に通信しながら連携動作することが可能である。なお、同図では、一例として３台の映像提示装置１が同時に稼働している状況を示しているが、連携動作する映像提示装置１の数は、任意である。図示する状況では、３台の映像提示装置１は、同一の空間（例えば、同一の部屋）内で同時に稼働する。各々の映像提示装置１は、１人のユーザーによって使用される。図示する例では、３台の映像提示装置１は、無線ルーター３１を通して、且つサーバー装置３２を介して相互に情報を交換する。具体的には、各々の映像提示装置１の同期部７同士が、通信により、相互に情報を交換する。 FIG. 2 is a schematic diagram showing a configuration example of a system in which a plurality of video presentation devices 1 cooperate with each other. As shown in the figure, the plurality of video presentation devices 1 can operate in cooperation with each other while communicating with each other. Although the figure shows a situation in which three video presentation devices 1 are operating at the same time as an example, the number of video presentation devices 1 that cooperate with each other is arbitrary. In the illustrated situation, the three video presentation devices 1 operate simultaneously in the same space (for example, the same room). Each video presentation device 1 is used by one user. In the illustrated example, the three video presentation devices 1 exchange information with each other through the wireless router 31 and via the server device 32. Specifically, the synchronization units 7 of each video presentation device 1 exchange information with each other by communication.

複数の映像提示装置１同士が交換する主な情報は、次の２種類である。
第１は、映像提示装置１の位置および姿勢である。コンテンツ再生開始時において、映像提示装置１は、自装置の位置および姿勢の情報を他の映像提示装置１に通知する。同時に、映像提示装置１は、他装置の位置および姿勢の情報を受け取る。このように複数の映像提示装置１のそれぞれが他の映像提示装置１の位置および姿勢の情報を取得することにより、それらの映像提示装置１が同時に同一コンテンツを再生する場合に、再生用映像から切り出す部分映像を、映像提示装置１間で整合させることが可能となる。
第２は、再生用映像を再生する際の、映像提示装置１間での再生タイミングを合わせるための情報である。具体的には、例えば、映像提示装置１は、コンテンツの再生の時間位置の情報を相互に交換する。このような情報交換を、所定の時間間隔ごとに行うようにしてもよい。これにより、複数の映像提示装置１間で、同じタイミングで同一のコンテンツを再生することが可能となる。 The main types of information exchanged between the plurality of video presentation devices 1 are the following two types.
The first is the position and orientation of the video presentation device 1. At the start of content reproduction, the video presentation device 1 notifies the other video presentation devices 1 of its own position and orientation. At the same time, the video presentation device 1 receives the position and orientation information of the other devices. Because each of the video presentation devices 1 thus acquires the position and orientation of the others, the partial videos cut out from the playback video can be kept consistent among the video presentation devices 1 when they reproduce the same content simultaneously.
The second is information for matching the reproduction timing between the image presentation devices 1 when reproducing the reproduction image. Specifically, for example, the video presentation device 1 exchanges information on the time position of content reproduction with each other. Such information exchange may be performed at predetermined time intervals. As a result, the same content can be reproduced at the same timing between the plurality of video presentation devices 1.
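As a minimal sketch of the second kind of exchange, the following Python function shows one way a device might use a playback time position received from another device. The function name resync_position, the use of a single reference device (for example, the server device 32), and the tolerance of 0.05 seconds are assumptions made for illustration; the embodiment only states that time positions are exchanged, possibly at predetermined time intervals.

```python
def resync_position(local_pos, reference_pos, tolerance=0.05):
    """Return the playback position (seconds) the device should use.

    local_pos:     this device's current playback position
    reference_pos: position reported by the reference device (assumed)
    tolerance:     allowed drift in seconds before seeking (assumed value)
    """
    drift = local_pos - reference_pos
    if abs(drift) > tolerance:
        # Drift is too large: jump to the reference position.
        return reference_pos
    # Drift is small enough: keep playing without a visible seek.
    return local_pos


# Called each time a time position arrives from the reference device.
print(resync_position(10.00, 10.02))  # small drift, local position kept
print(resync_position(10.00, 10.50))  # large drift, reference adopted
```

Keeping the local position when the drift is small avoids a seek on every exchange; this is a design choice of the sketch, not something the embodiment prescribes.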

なお、図２では複数台の映像提示装置１が相互に連携して動作する構成を示したが、映像提示装置１は、他の映像提示装置１と連携する形態ではなく、単独の形態でも動作し得るものである。また、複数台の映像提示装置１のうちの例えば１台がサーバーの機能を兼ね備えるようにして、各々の映像提示装置１の情報を収集したり、収集した情報を各々の映像提示装置１に配信したりする形態としてもよい。 Although FIG. 2 shows a configuration in which a plurality of video presentation devices 1 operate in cooperation with each other, the video presentation device 1 can also operate on its own, without cooperating with other video presentation devices 1. Further, for example, one of the plurality of video presentation devices 1 may also serve as the server, collecting information from each video presentation device 1 and distributing the collected information to each video presentation device 1.

図３は、映像提示装置１が提示する映像に関して、周辺映像取得部３が取得した映像を提示するか、再生用映像取得部５が取得した映像を提示するかの、領域ごとの区別を表す概略図である。同図は、映像提示装置１を使用する空間（例えば、部屋内）を平面視した平面図である。 FIG. 3 is a schematic diagram showing, for the video presented by the video presentation device 1, the region-by-region distinction between presenting the video acquired by the peripheral video acquisition unit 3 and presenting the video acquired by the playback video acquisition unit 5. The figure is a plan view of the space (for example, the inside of a room) in which the video presentation device 1 is used.

同図において、符号１００は、部屋等の領域である。領域１００は、例えば、壁等によって囲われていてもよい。符号１０１は、領域１００内で、映像提示装置１を使用することによって映像を視聴しようとするユーザーである。また、符号１０２および１０３のそれぞれは、ユーザー１０１とは別の人である。人１０２は、ユーザー１０１の比較的近くに存在している。人１０３は、ユーザー１０１から比較的遠い位置に存在している。また、符号１０８は、領域１００内の床上に置かれているテーブルである。また、符号１２１は、領域１００内で、且つ、ユーザー１０１から所定の距離内にある範囲の副領域である。副領域１２１は、破線で示されている。 In the figure, reference numeral 100 is an area such as a room. The area 100 may be surrounded by, for example, walls. Reference numeral 101 is a user who intends to view a video by using the video presentation device 1 in the area 100. Reference numerals 102 and 103 are each a person other than the user 101. The person 102 is relatively close to the user 101. The person 103 is at a position relatively far from the user 101. Reference numeral 108 is a table placed on the floor in the area 100. Reference numeral 121 is a sub-region that lies within the area 100 and within a predetermined distance from the user 101. The sub-region 121 is indicated by a broken line.

本実施形態では、前述の通り、提示部２４は、所定距離内にある物体が存在する領域等と、距離に関わらず人であると認識された領域に関しては、周辺映像取得部３によって取得された映像を提示する。また、提示部２４は、上記領域（所定距離内にある物体が存在する領域等と、距離に関わらず人であると認識された領域）以外の領域に関しては、再生用映像取得部５によって取得された映像を提示する。つまり、図３においてハッチングで示した領域に関しては、ユーザー１０１が使用する映像提示装置１の提示部２４は、周辺映像取得部３によって取得された映像を提示する。ここで、「ハッチングで示した領域」とは、境界線１３１の内側であって且つ領域１００内である領域と、境界線１３２の内側である領域（なお、その領域はすべて領域１００内である）とである。 In the present embodiment, as described above, the presentation unit 24 presents the video acquired by the peripheral video acquisition unit 3 for regions such as those where an object within a predetermined distance exists, and for regions recognized as a person regardless of distance. For all other regions, the presentation unit 24 presents the video acquired by the playback video acquisition unit 5. That is, for the regions shown by hatching in FIG. 3, the presentation unit 24 of the video presentation device 1 used by the user 101 presents the video acquired by the peripheral video acquisition unit 3. Here, the "regions shown by hatching" are the region that is inside the boundary line 131 and within the area 100, and the region inside the boundary line 132 (all of which lies within the area 100).

なお、映像提示装置１は、３次元空間内における距離に基づいて、周辺映像取得部３によって取得された映像と、再生用映像取得部５によって取得された映像の、いずれの映像を表示するかを制御する。また、映像提示装置１は、３次元空間内において撮像された像の認識結果（人を含む領域であるか否か）に応じて、周辺映像取得部３によって取得された映像と、再生用映像取得部５によって取得された映像の、いずれの映像を表示するかを制御する。ここで説明した図３は、３次元空間のうちの高さ方向の次元を省略して、平面図に投射した状態を表している。 Note that the video presentation device 1 controls, based on distance in three-dimensional space, which of the two videos to display: the video acquired by the peripheral video acquisition unit 3 or the video acquired by the playback video acquisition unit 5. The video presentation device 1 also controls which of these two videos to display according to the recognition result (whether or not a region contains a person) for the image captured in three-dimensional space. FIG. 3, described here, shows the three-dimensional space projected onto a plan view with the height dimension omitted.

図４は、提示領域抽出部２２による画像処理の例を示す概略図である。同図（Ａ）は、周辺映像取得部３が撮影する視野角（ＦＯＶ，Field of View）での画像の例を示す。同図（Ｂ）は、同図（Ａ）の画像を基に、提示領域抽出部２２が抽出した結果の視野角での画像の例である。図示するように、周辺映像取得部３が、例えば焦点距離の短いレンズ（例えば、魚眼レンズ、またはそれに近いレンズ）を使用して、非常に広い視野角の画像（映像）を撮影するようにしてよい。提示領域抽出部２２は、そのような広い視野角の画像から、例えば中心射影により、ディスプレイ装置９（例えば、ヘッドマウントディスプレイ）の視野角に合わせた切り出しを行う。なお、ここでの射影の方式は中心射影には限定されず、例えば平行射影等を用いてもよい。 FIG. 4 is a schematic diagram showing an example of image processing by the presentation area extraction unit 22. FIG. 4(A) shows an example of an image at the viewing angle (FOV, field of view) captured by the peripheral video acquisition unit 3. FIG. 4(B) is an example of an image at the viewing angle that results when the presentation area extraction unit 22 extracts from the image of FIG. 4(A). As shown in the figure, the peripheral video acquisition unit 3 may capture an image (video) with a very wide viewing angle by using, for example, a lens with a short focal length (for example, a fisheye lens or a lens close to one). From such a wide-angle image, the presentation area extraction unit 22 cuts out a portion matched to the viewing angle of the display device 9 (for example, a head-mounted display) by, for example, central projection. The projection method here is not limited to central projection; for example, parallel projection may be used.
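The cut-out by central projection can be illustrated as a per-pixel lookup from the output (display) image into the wide-angle image. The sketch below assumes an equidistant fisheye model, in which the image radius is proportional to the angle from the optical axis; the embodiment does not specify the lens model, and the function name and all parameter names are hypothetical.

```python
import math


def fisheye_source_pixel(u, v, out_w, out_h, out_fov_deg,
                         fish_cx, fish_cy, fish_radius, fish_fov_deg):
    """Map output pixel (u, v) of a central-projection (pinhole) view to
    the corresponding position in an equidistant fisheye image.

    out_fov_deg:  horizontal viewing angle of the output view (display)
    fish_radius:  radius in pixels of the fisheye image circle
    fish_fov_deg: full viewing angle covered by that image circle
    """
    # Focal length (pixels) of the virtual pinhole camera.
    f_pinhole = (out_w / 2.0) / math.tan(math.radians(out_fov_deg) / 2.0)
    # Offset of the output pixel from the centre of the output image.
    x = u - (out_w - 1) / 2.0
    y = v - (out_h - 1) / 2.0
    # Angle from the optical axis and direction around the axis.
    theta = math.atan2(math.hypot(x, y), f_pinhole)
    phi = math.atan2(y, x)
    # Equidistant fisheye (assumed): radius grows linearly with theta.
    f_fish = fish_radius / (math.radians(fish_fov_deg) / 2.0)
    r = f_fish * theta
    return fish_cx + r * math.cos(phi), fish_cy + r * math.sin(phi)
```

Evaluating this function for every output pixel and sampling the wide-angle image at the returned position gives the cut-out image. The same mapping also gives the positional correspondence between the two images that the mask generation unit 23 needs.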

図５は、映像提示装置１による、周辺映像取得部３によって取得された映像と、再生用映像取得部５によって取得された映像との提示方法の例を示す概略図である。同図は、映像提示装置１が、周辺映像取得部３によって取得された映像（便宜的に、「実空間映像」と呼ぶ）を提示する領域と、再生用映像取得部５によって取得された映像（便宜的に、「バーチャル映像」と呼ぶ）を提示する領域と、これらの実空間映像とバーチャル映像とを混合した映像（便宜的に、「混合映像」と呼ぶ）を提示する領域とを示す。 FIG. 5 is a schematic diagram showing an example of how the video presentation device 1 presents the video acquired by the peripheral video acquisition unit 3 and the video acquired by the playback video acquisition unit 5. The figure shows the region in which the video presentation device 1 presents the video acquired by the peripheral video acquisition unit 3 (called the "real space video" for convenience), the region in which it presents the video acquired by the playback video acquisition unit 5 (called the "virtual video" for convenience), and the region in which it presents a mixture of the real space video and the virtual video (called the "mixed video" for convenience).

同図（Ａ）は、映像提示装置１を使用してコンテンツを視聴するユーザー１０１からの距離と、提示する映像との関係を示す概略図である。この図における横軸は、ユーザー１０１からの距離を表す。ユーザー１０１からの距離ｄ（単位は、メートル（ｍ））に応じて、Ｒ１、Ｒ２、Ｒ３という３つの領域に分かれている。０≦ｄ＜１の範囲は、領域Ｒ１である。領域Ｒ１に関しては、映像提示装置１は、実空間映像（real）を提示する。１≦ｄ＜２の範囲は、領域Ｒ２である。領域Ｒ２に関しては、映像提示装置１は、混合映像（mixed）を提示する。２≦ｄの範囲は、領域Ｒ３である。領域Ｒ３に関しては、映像提示装置１は、バーチャル映像（virtual）を提示する。 FIG. 5(A) is a schematic diagram showing the relationship between the distance from the user 101, who views content using the video presentation device 1, and the video to be presented. The horizontal axis in this figure represents the distance from the user 101. The space is divided into three regions, R1, R2, and R3, according to the distance d (in meters (m)) from the user 101. The range 0 ≦ d < 1 is region R1, for which the video presentation device 1 presents the real space video (real). The range 1 ≦ d < 2 is region R2, for which the video presentation device 1 presents the mixed video (mixed). The range 2 ≦ d is region R3, for which the video presentation device 1 presents the virtual video (virtual).

同図（Ｂ）は、同図（Ａ）に示す表示方法を実現するためのマスクデータの例を示すグラフである。このグラフの横軸は、映像提示装置１のユーザーからの距離である。またこのグラフの縦軸は、マスクデータの値ｍである。ｍ＝０．０は、実空間映像のみを表示する（つまり、バーチャル映像の比率が０．０である）ことに対応する。ｍ＝１．０は、バーチャル映像のみを表示する（つまり、バーチャル映像の比率が１．０である）ことに対応する。０．０＜ｍ＜１．０の範囲にあるｍは、混合映像の表示におけるバーチャル映像の比率を表す。同図（Ｂ）に示す例では、０≦ｄ＜１の場合（領域Ｒ１）に、ｍ＝０．０である。また、１≦ｄ＜２の場合（領域Ｒ２）に、０．０＜ｍ＜１．０で、ｍは可変である。一例として、ｍ＝ｄ−１．０である。また、２≦ｄの場合（領域Ｒ３）に、ｍ＝１．０である。 FIG. 5(B) is a graph showing an example of mask data for realizing the display method shown in FIG. 5(A). The horizontal axis of this graph is the distance from the user of the video presentation device 1, and the vertical axis is the mask data value m. m = 0.0 corresponds to displaying only the real space video (that is, the ratio of the virtual video is 0.0). m = 1.0 corresponds to displaying only the virtual video (that is, the ratio of the virtual video is 1.0). A value of m in the range 0.0 < m < 1.0 represents the ratio of the virtual video in the mixed video. In the example shown in FIG. 5(B), m = 0.0 when 0 ≦ d < 1 (region R1). When 1 ≦ d < 2 (region R2), 0.0 < m < 1.0 and m varies; as an example, m = d - 1.0. When 2 ≦ d (region R3), m = 1.0.

つまり、マスク生成部２３は、距離情報取得部４から渡される距離画像に基づき、距離ｄに応じたマスク値（バーチャル映像の割合）ｍの値を画素値とするマスク画像を生成する。また、提示部２４は、マスク生成部２３から渡されるマスク画像の各画素の値（ｍの値）に応じて、実空間映像、バーチャル映像、または混合映像（混合比率はｍに依る）を適宜提示する。 That is, based on the distance image passed from the distance information acquisition unit 4, the mask generation unit 23 generates a mask image whose pixel values are the mask value m (the ratio of the virtual video) corresponding to the distance d. The presentation unit 24 then presents the real space video, the virtual video, or the mixed video (with a mixing ratio determined by m) as appropriate, according to the value (m) of each pixel of the mask image passed from the mask generation unit 23.

ここで、図５では、認識部２１による認識処理の結果を省略している。実際には、認識部２１の認識処理の結果に基づき、人が存在している領域では、映像提示装置１からの距離には依らず、実空間映像を提示する。つまり、人が存在している領域に関して、マスク生成部２３が生成するマスク画像では、ｍ＝０．０である。 Here, in FIG. 5, the result of the recognition process by the recognition unit 21 is omitted. Actually, based on the result of the recognition process of the recognition unit 21, the real space image is presented in the area where a person exists, regardless of the distance from the image presenting device 1. That is, with respect to the region where a person exists, m = 0.0 in the mask image generated by the mask generation unit 23.
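The mask generation described for FIG. 5, including the override for regions where a person exists, can be sketched as follows. The thresholds of 1 meter and 2 meters and the linear ramp m = d - 1.0 follow the example in the text; the function names, the nested-list image representation, and the assumption that the distance image and the person regions are already aligned pixel for pixel are illustrative only.

```python
def mask_value(d, near=1.0, far=2.0):
    """Ratio m of the virtual video for distance d in meters."""
    if d < near:
        return 0.0                       # region R1: real space video only
    if d >= far:
        return 1.0                       # region R3: virtual video only
    return (d - near) / (far - near)     # region R2: m = d - 1.0 by default


def generate_mask(depth_image, person_image):
    """Build a mask image from a distance image and a person map.

    depth_image:  rows of distances in meters, one value per pixel
    person_image: rows of booleans, True where a person was recognized
    """
    return [
        # A recognized person forces m = 0.0 regardless of distance.
        [0.0 if is_person else mask_value(d)
         for d, is_person in zip(depth_row, person_row)]
        for depth_row, person_row in zip(depth_image, person_image)
    ]


depth = [[0.5, 1.5], [3.0, 3.0]]
person = [[False, False], [False, True]]
print(generate_mask(depth, person))  # [[0.0, 0.5], [1.0, 0.0]]
```

The lower-right pixel is 3 meters away but belongs to a person, so its mask value is 0.0 and the real space video is shown there.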

なお、図５では、混合映像を表示する領域において、例として、ｍ＝ｄ−１．０とした。しかしながら、ｍとｄとの関係はこの数式で表す関係に限定されない。ｍとｄとを、その他の対応関係としてもよい。なお、ｍの値を、ｄの値に対して広義単調増加としてよい。 In FIG. 5, m = d - 1.0 is used as an example in the region that displays the mixed video. However, the relationship between m and d is not limited to the one expressed by this formula; m and d may have some other correspondence. The value of m may be monotonically non-decreasing with respect to d.

また、図５では、混合映像を表示する領域として領域Ｒ２が設けられている。しかしながら、混合映像を表示する領域が必ずしも設けられなくてもよい。一例として、ｄ≦１．５の範囲においては領域Ｒ１（実空間映像を表示する領域であり、ｍ＝０．０である）として、１．５＜ｄの範囲においては領域Ｒ３（バーチャル映像を表示する領域であり、ｍ＝１．０である）としてもよい。 Further, in FIG. 5, a region R2 is provided as a region for displaying the mixed video. However, the area for displaying the mixed image does not necessarily have to be provided. As an example, in the range of d ≦ 1.5, the area R1 (the area for displaying the real space image, m = 0.0) is set, and in the range of 1.5 <d, the area R3 (virtual image is displayed). It is an area to be displayed, and m = 1.0).

図６は、本実施形態の映像提示装置１が提示する映像の構成例を示す概略図である。図示するように、映像提示装置１の提示部２４は、（１）の再生用映像（バーチャル映像）と、（２）の周辺映像（実空間映像）とを合成した結果である提示映像を提示する。同図において、符号３０１は、再生用映像である。この再生用映像３０１は、元の全天周映像の一部を切り出して得られた映像である。また、符号３０２は、周辺映像である。また、符号３０３は、映像提示装置１からの距離と、周辺映像の認識結果とに基づいて生成されたマスク映像である。周辺映像３０２とマスク映像３０３とを合成して、マスクされた周辺映像３０４が得られる。マスク映像３０３の白の部分は、周辺映像３０２のために割り当てた領域である。マスク映像３０３のハッチング部分は、再生用映像３０１のために割り当てた領域である。そして、再生用映像３０１と、マスクされた周辺映像３０４とを合成することにより、提示映像３０５が得られる。なお、再生用映像３０１のサイズとマスクされた周辺映像３０４のサイズとは異なり、再生用映像３０１のサイズのほうが大きい。マスクされた周辺映像３０４は、再生用映像３０１の所定の一部領域に合成される。提示部２４は、このように、ビデオシースルー方式で合成した提示映像を、ディスプレイ装置９に表示する。提示部２４は、このビデオシースルー方式においては、全天周映像（再生用映像）の中に、映像提示装置１のユーザー自身の身体の映像と、当該ユーザー自身の近傍（所定距離内）の周辺映像と、同じ空間内（部屋内）に存在している人（例えば、同一のコンテンツを同時に視聴、体験している人）の身体の映像とを、重畳する。こういったビデオシースルー方式の映像を表示するのに向いているのは、例えば、ユーザーの頭部に装着するタイプのヘッドマウントディスプレイである。 FIG. 6 is a schematic diagram showing a configuration example of the video presented by the video presentation device 1 of the present embodiment. As shown in the figure, the presentation unit 24 of the video presentation device 1 presents a presentation video that is the result of combining (1) the playback video (virtual video) and (2) the peripheral video (real space video). In the figure, reference numeral 301 is the playback video. The playback video 301 is obtained by cutting out a part of the original all-sky video. Reference numeral 302 is the peripheral video. Reference numeral 303 is a mask video generated based on the distance from the video presentation device 1 and the recognition result for the peripheral video. Combining the peripheral video 302 with the mask video 303 yields the masked peripheral video 304. The white portion of the mask video 303 is the region allocated to the peripheral video 302. The hatched portion of the mask video 303 is the region allocated to the playback video 301. Combining the playback video 301 with the masked peripheral video 304 then yields the presentation video 305.
Note that the playback video 301 and the masked peripheral video 304 differ in size, the playback video 301 being the larger. The masked peripheral video 304 is combined into a predetermined partial region of the playback video 301. The presentation unit 24 displays the presentation video combined in this way, by the video see-through method, on the display device 9. In this video see-through method, the presentation unit 24 superimposes on the all-sky video (playback video) the video of the body of the user of the video presentation device 1, the peripheral video of the user's vicinity (within a predetermined distance), and the video of the bodies of people present in the same space (the same room), for example people who are viewing and experiencing the same content at the same time. A display well suited to showing such video see-through video is, for example, a head-mounted display worn on the user's head.
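The composition of FIG. 6, in which the smaller masked peripheral video 304 is combined into a predetermined partial region of the larger playback video 301, can be sketched as below. For simplicity the sketch uses a two-valued mask, matching the white and hatched portions of the mask video 303. The function name composite, the parameters top and left that locate the partial region, and the nested-list image representation are assumptions for illustration.

```python
def composite(playback, peripheral, show_peripheral, top, left):
    """Combine a masked peripheral image into a larger playback image.

    playback:        rows of pixels of the playback video 301
    peripheral:      rows of pixels of the peripheral video 302
    show_peripheral: rows of booleans, True for the white portion of the
                     mask video 303 (region allocated to the peripheral video)
    top, left:       position of the partial region inside the playback image
    """
    # Copy the rows so that the input playback image is left untouched.
    out = [row[:] for row in playback]
    for y, (pixel_row, mask_row) in enumerate(zip(peripheral, show_peripheral)):
        for x, (pixel, show) in enumerate(zip(pixel_row, mask_row)):
            if show:
                out[top + y][left + x] = pixel
    return out


playback = [["V"] * 4 for _ in range(3)]   # "V" marks a virtual pixel
peripheral = [["R", "R"], ["R", "R"]]      # "R" marks a real-space pixel
mask = [[True, False], [False, True]]
for row in composite(playback, peripheral, mask, top=1, left=1):
    print(row)
```

Pixels outside the partial region, and pixels inside it where the mask is False, keep the playback video.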

図７は、映像提示装置１が提示する映像の１フレーム分についての処理の手順を示すフローチャートである。なお、具体的な処理の手順としては、同図に提示する処理と等価な内容の他の手順を用いてもよい。以下、このフローチャートに沿って手順を説明する。 FIG. 7 is a flowchart showing a processing procedure for one frame of the video presented by the video presentation device 1. As a specific processing procedure, another procedure having the same content as the processing presented in the figure may be used. Hereinafter, the procedure will be described according to this flowchart.

ステップＳ１１において、周辺映像取得部３は、１フレーム分の周辺映像を取得し、その画像を処理部２に転送する。
ステップＳ１２において、距離情報取得部４は、映像の１フレーム分に相当する距離画像（デプスマップ）を取得し、その距離画像を処理部２に転送する。 In step S11, the peripheral image acquisition unit 3 acquires the peripheral image for one frame and transfers the image to the processing unit 2.
In step S12, the distance information acquisition unit 4 acquires a distance image (depth map) corresponding to one frame of the video, and transfers the distance image to the processing unit 2.

ステップＳ１３において、認識部２１は、周辺映像取得部３から渡された周辺映像の１フレーム分に基づいて、認識処理を行い、人が映っている領域を特定する。認識部２１は、周辺映像における人の領域の位置情報を出力する。認識部２１は、認識処理の結果をマスク生成部２３に渡す。 In step S13, the recognition unit 21 performs recognition processing on one frame of the peripheral video passed from the peripheral video acquisition unit 3, and identifies the region in which a person appears. The recognition unit 21 outputs the position information of the person's region in the peripheral video. The recognition unit 21 passes the result of the recognition processing to the mask generation unit 23.

ステップＳ１４において、提示領域抽出部２２は、ステップＳ１１において周辺映像取得部３から渡された周辺映像の１フレーム分の画像を、中心射影画像に変換する。
ステップＳ１５において、提示領域抽出部２２は、ステップＳ１４で変換した結果である中心射影画像から、提示領域を抽出する。提示領域とは、中心射影画像のうち、ディスプレイ装置９に表示する部分の領域である。提示領域抽出部２２は、抽出した提示領域の画像を、提示部２４に渡す。また、提示領域抽出部２２は、抽出した提示領域の位置に関する情報を、マスク生成部２３に渡す。 In step S14, the presentation area extraction unit 22 converts an image for one frame of the peripheral image passed from the peripheral image acquisition unit 3 in step S11 into a central projection image.
In step S15, the presentation area extraction unit 22 extracts the presentation area from the central projection image which is the result of the conversion in step S14. The presentation area is an area of a portion of the central projected image to be displayed on the display device 9. The presentation area extraction unit 22 passes the extracted image of the presentation area to the presentation unit 24. Further, the presentation area extraction unit 22 passes information regarding the position of the extracted presentation area to the mask generation unit 23.

ステップＳ１６において、マスク生成部２３は、距離情報取得部４から受け取った距離画像と、認識部２１から受け取った人の映っている領域の位置情報とに基づいて、映像の提示のためのマスクを生成する。マスクは、例えば、画素ごとの、バーチャル映像の比率の値（ｍ；０．０≦ｍ≦１．０）のマトリックスであってよい。また、マスクのデータを別の形態で表すようにしてもよい。マスクは、言い換えれば、バーチャル映像と実空間映像とを合成する際の透過度の値のマトリックスである。マスク生成部２３は、生成したマスクを、提示部２４に渡す。 In step S16, the mask generation unit 23 generates a mask for presenting the video, based on the distance image received from the distance information acquisition unit 4 and the position information, received from the recognition unit 21, of the region in which a person appears. The mask may be, for example, a matrix of per-pixel values of the virtual video ratio (m; 0.0 ≦ m ≦ 1.0). The mask data may also be represented in another form. In other words, the mask is a matrix of transparency values used when combining the virtual video and the real space video. The mask generation unit 23 passes the generated mask to the presentation unit 24.

ステップＳ１７において、提示部２４は、マスク生成部２３から渡されたマスクのデータに基づいて、周辺映像取得部３が取得した画像（実空間映像）と、再生用映像取得部５が取得した画像（バーチャル映像）とを合成する。このとき、提示部２４は、再生用映像取得部５から渡される全周映像のフレームのうち、位置・姿勢取得部６から渡される当該映像提示装置１自身の位置および姿勢に基づく所定部分のみを、提示のために切り出す。そして、提示部２４は、画素ごとに、上記の混合比率（ｍ）による混合を行う。 In step S17, the presentation unit 24 combines the image acquired by the peripheral video acquisition unit 3 (real space video) with the image acquired by the playback video acquisition unit 5 (virtual video), based on the mask data passed from the mask generation unit 23. At this time, from the all-around video frame passed from the playback video acquisition unit 5, the presentation unit 24 cuts out for presentation only the predetermined portion determined by the position and orientation of the video presentation device 1 itself, as passed from the position/posture acquisition unit 6. The presentation unit 24 then mixes the two images pixel by pixel according to the mixing ratio (m) described above.
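The per-pixel mixing of step S17 can be sketched as a linear blend. The text defines m as the ratio of the virtual video; the specific formula (1 - m) x real + m x virtual is an assumption of this sketch, as are the function names and the representation of pixels as tuples of channel values. Both input frames are assumed to have already been cut out to the same size.

```python
def blend_pixel(real, virtual, m):
    """Mix one real-space pixel and one virtual pixel.

    m is the ratio of the virtual video: 0.0 gives the real pixel,
    1.0 gives the virtual pixel, values in between give a mixture.
    """
    return tuple((1.0 - m) * r + m * v for r, v in zip(real, virtual))


def blend_frame(real_frame, virtual_frame, mask):
    """Apply blend_pixel to every pixel of two equally sized frames."""
    return [
        [blend_pixel(r, v, m) for r, v, m in zip(real_row, virtual_row, mask_row)]
        for real_row, virtual_row, mask_row in zip(real_frame, virtual_frame, mask)
    ]


print(blend_pixel((100, 100, 100), (200, 0, 50), 0.5))  # (150.0, 50.0, 75.0)
```

With a two-valued mask (m equal to 0.0 or 1.0 everywhere) the same function reduces to selecting one of the two videos per pixel.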

ステップＳ１８において、提示部２４は、ステップＳ１７の処理で合成された画像をディスプレイ装置９に表示させる。 In step S18, the presentation unit 24 causes the display device 9 to display the image synthesized in the process of step S17.

以上、このフローチャートに示した一連の処理が、映像の１フレーム分の処理である。映像提示装置は、毎フレーム、この一連の処理を行う。このようにして、提示部２４は、再生用映像取得部５が取得した再生用映像と、周辺映像取得部が取得した周辺映像とを合成して、動画として提示する。 The series of processes shown in this flowchart is the processing for one frame of the video. The video presentation device performs this series of processes for every frame. In this way, the presentation unit 24 combines the playback video acquired by the playback video acquisition unit 5 with the peripheral video acquired by the peripheral video acquisition unit 3, and presents the result as a moving image.

このフローチャートに示した処理のバリエーションの例は、次の通りである。
例えば、ステップＳ１３における認識処理を、ステップＳ１４で変換した結果である中心射影画像に基づいて行うようにしてもよい。
例えば、ステップＳ１４における中心射影画像への変換処理を行わないようにしてもよい。
例えば、ステップＳ１６のマスク生成の処理において、距離に基づくマスクと、認識結果に基づくマスクとを、別々に作成してから後で合成してもよいし、それらの両マスクを最初から１枚のマスクのデータとして作成してもよい。
例えば、ステップＳ１６のマスク生成の処理において、混合映像を表示する領域（０．０＜ｍ＜１．０である領域）がないようにしてもよい。その場合、距離や画像認識結果に応じて、ｍ＝０．０の領域とｍ＝１．０の領域との境界においてｍの値が不連続的に変化する。
また、例えば、論理的な矛盾が生じない範囲内で、フローチャートに示した各処理の順序を変えてもよい。 An example of a variation of the processing shown in this flowchart is as follows.
For example, the recognition process in step S13 may be performed based on the central projection image that is the result of conversion in step S14.
For example, the conversion process to the central projected image in step S14 may not be performed.
For example, in the mask generation process of step S16, a mask based on the distance and a mask based on the recognition result may be created separately and combined afterwards, or the two may be created from the start as a single set of mask data.
For example, in the mask generation process of step S16, there may be no region (region where 0.0 <m <1.0) for displaying the mixed video. In that case, the value of m changes discontinuously at the boundary between the region of m = 0.0 and the region of m = 1.0 according to the distance and the image recognition result.
Further, for example, the order of each process shown in the flowchart may be changed within a range in which no logical contradiction occurs.
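For the variation in which the distance-based mask and the recognition-based mask are created separately and combined afterwards, one possible combination rule is a per-pixel minimum. This rule is an assumption of the sketch and is not stated in the text. It relies on the recognition-based mask holding m = 0.0 where a person was recognized and m = 1.0 elsewhere, so that the minimum forces the real space video in person regions and leaves the distance-based value unchanged everywhere else.

```python
def combine_masks(distance_mask, person_mask):
    """Combine two separately created masks into one.

    distance_mask: rows of m values derived from the distance image
    person_mask:   rows of m values, 0.0 where a person was recognized
                   and 1.0 elsewhere (assumed encoding)
    """
    return [
        [min(m_distance, m_person)
         for m_distance, m_person in zip(distance_row, person_row)]
        for distance_row, person_row in zip(distance_mask, person_mask)
    ]


distance_mask = [[0.0, 0.5, 1.0]]
person_mask = [[1.0, 1.0, 0.0]]
print(combine_masks(distance_mask, person_mask))  # [[0.0, 0.5, 0.0]]
```

Under this encoding the result is the same as building a single mask in which recognized person regions override the distance-based value.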

本実施形態による映像提示装置１の構成および処理をまとめると、次の通りである。 The configuration and processing of the video presentation device 1 according to the present embodiment are summarized as follows.

映像提示装置１は、少なくとも、再生用映像取得部５と、周辺映像取得部３と、認識部２１と、距離情報取得部４と、マスク生成部２３と、提示部２４とを備える。再生用映像取得部５は、再生用の映像である再生用映像を取得する。周辺映像取得部３は、自装置の周辺の映像である周辺映像を取得する。認識部２１は、前記周辺映像に含まれる所定の被写体（例えば、人）を認識する。距離情報取得部４は、前記周辺映像に対応する距離情報を取得する。マスク生成部２３は、マスク情報を生成する。マスク情報は、前記距離情報に基づいて前記再生用映像を提示すべき領域であるか否かを表す。また、マスク情報は、認識部２１が認識した前記被写体が存在する領域であるか否かを表す。提示部２４は、マスク生成部２３が生成した前記マスク情報に基づいて、少なくとも前記再生用映像を出力する。なお、マスク情報は、例えば、画面上の位置に対応して、再生用映像を提示すべき領域であるか、周辺映像（あるいは、周辺の状況）を提示すべき領域であるかを表す。また、マスク情報が、再生用映像と周辺映像との混合比率の情報（数値）を持っていてもよい。典型的な場合において、比較的短距離の領域と、所定の被写体が認識されている領域とにおいて、周辺映像を提示することとする。また、比較的長距離の領域において、再生用映像を提示することとする。なお、それらの中間の距離の領域において、両映像を混合して提示するようにしてもよい。混合比率は、例えば、距離に応じたリニアな値としてもよい。 The image presentation device 1 includes at least a playback image acquisition unit 5, a peripheral image acquisition unit 3, a recognition unit 21, a distance information acquisition unit 4, a mask generation unit 23, and a presentation unit 24. The playback video acquisition unit 5 acquires a playback video, which is a playback video. The peripheral image acquisition unit 3 acquires a peripheral image which is an image around the own device. The recognition unit 21 recognizes a predetermined subject (for example, a person) included in the peripheral image. The distance information acquisition unit 4 acquires the distance information corresponding to the peripheral image. The mask generation unit 23 generates mask information. The mask information indicates whether or not the reproduction video should be presented based on the distance information. Further, the mask information indicates whether or not the subject recognized by the recognition unit 21 is in the region where the subject exists. The presentation unit 24 outputs at least the reproduction video based on the mask information generated by the mask generation unit 23. 
Note that the mask information indicates, for example, for each position on the screen, whether it is a region in which the playback video should be presented or a region in which the peripheral video (or the surrounding situation) should be presented. The mask information may also hold information (a numerical value) on the mixing ratio of the playback video and the peripheral video. In a typical case, the peripheral video is presented in relatively short-distance regions and in regions where the predetermined subject is recognized, and the playback video is presented in relatively long-distance regions. In regions at intermediate distances, the two videos may be mixed and presented. The mixing ratio may be, for example, a value that varies linearly with distance.

映像提示装置１が、同期部７を備えてもよい。同期部７は、提示部２４が出力する再生用映像の時間方向の再生位置を、他の映像提示装置との間で同期させる。自映像提示装置と、他映像提示装置とは、通信により、随時情報を交換できる。 The image presenting device 1 may include a synchronization unit 7. The synchronization unit 7 synchronizes the playback position of the playback video output by the presentation unit 24 in the time direction with another video presentation device. Information can be exchanged at any time between the self-image presenting device and another image presenting device by communication.

提示部２４は、前記マスク情報に基づいて、画面内の領域ごとに、前記再生用映像または前記周辺映像の少なくともいずれかを表示するように出力する。提示部２４は、上記の混合比率に基づいて、両映像を混合して表示するように出力してもよい。 Based on the mask information, the presentation unit 24 outputs so as to display at least one of the reproduction video and the peripheral video for each area in the screen. The presentation unit 24 may output both images in a mixed manner based on the above mixing ratio.

映像提示装置１が、提示領域抽出部２２を備えてもよい。提示領域抽出部２２は、周辺映像取得部３が取得した前記周辺映像のうち、一部（例えば、中心の部分）のみを切り出す機能を有する。この場合も、認識部２１は、提示領域抽出部２２が切り出す前の周辺映像を基に前記の被写体の認識処理を行ってもよい。一方で、提示部２４は、提示領域抽出部２２によって切り出された部分の周辺映像を表示するように出力する。 The image presentation device 1 may include a presentation area extraction unit 22. The presentation area extraction unit 22 has a function of cutting out only a part (for example, a central portion) of the peripheral image acquired by the peripheral image acquisition unit 3. In this case as well, the recognition unit 21 may perform the subject recognition process based on the peripheral image before the presentation area extraction unit 22 cuts out. On the other hand, the presentation unit 24 outputs so as to display the peripheral image of the portion cut out by the presentation area extraction unit 22.

The mask generation unit 23 may generate mask information that includes the mixing ratio for regions in which the playback video and the peripheral video are mixed and presented. In such regions, the presentation unit 24 produces output such that the playback video and the peripheral video are mixed according to the mixing ratio value in the mask information.
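The mixing can be pictured as a per-pixel weighted average. The sketch below assumes that frames are lists of rows of 8-bit RGB tuples and that the mask value m is the weight of the peripheral video (m = 1.0 shows only the peripheral video, m = 0.0 only the playback video). This convention and the function names are assumptions for illustration, not definitions from the specification.

```python
def blend_pixel(playback_rgb, peripheral_rgb, m):
    """Mix one pixel: m * peripheral + (1 - m) * playback, per channel."""
    return tuple(
        round(m * peri + (1.0 - m) * play)
        for play, peri in zip(playback_rgb, peripheral_rgb)
    )


def blend_frame(playback, peripheral, mask):
    """Mix two frames of equal size using a per-pixel mask of weights."""
    return [
        [blend_pixel(play, peri, m)
         for play, peri, m in zip(play_row, peri_row, mask_row)]
        for play_row, peri_row, mask_row in zip(playback, peripheral, mask)
    ]
```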

As an example, the video presentation device 1 is a device of the type worn on the head of a mammal (for example, a human). The video presentation device 1 may further include a position and posture detection unit (position/posture acquisition unit 6) that detects the position and posture of the video presentation device 1. The presentation unit 24 may output, from the playback video, which is an all-around video, the partial video corresponding to the position and posture detected by the position/posture acquisition unit 6.
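As a rough illustration of selecting a partial video from an all-around video, the sketch below assumes an equirectangular panorama whose columns span 360 degrees of yaw, and picks the columns around the detected yaw angle, wrapping around the image edge. It ignores pitch, roll, position, and perspective reprojection, all of which a real implementation would need; the function names are hypothetical.

```python
def visible_columns(yaw_deg, fov_deg, width_px):
    """Column indices of a 360-degree panorama that fall inside the
    horizontal field of view centred on yaw_deg (wraps at the edges)."""
    center = round((yaw_deg % 360.0) / 360.0 * width_px)
    count = round(fov_deg / 360.0 * width_px)
    start = center - count // 2
    return [(start + i) % width_px for i in range(count)]


def extract_view(panorama, yaw_deg, fov_deg):
    """Cut the columns selected by visible_columns out of every row."""
    cols = visible_columns(yaw_deg, fov_deg, len(panorama[0]))
    return [[row[c] for c in cols] for row in panorama]
```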

As described above, according to the present embodiment, when virtual reality video content (for example, virtual reality based on live-action 360-degree video) is presented, the viewer's own body, objects near the viewer, and the bodies of other people (for example, people viewing the same content) are presented together in a natural manner. This makes the sense of immersion in the content, and the sharing of the situation with other people, even more enjoyable.

That is, because the user's own body is displayed on the screen as it is, the user of the video presentation device 1 can have a more realistic experience within the virtual reality space. The user can also share the virtual reality experience with other people in the same space. When the video is displayed in stereo, the content can be presented while preserving the real sense of distance. With a head-mounted device configuration, the user can view the content while wearing the video presentation device 1 without their freedom of movement being hindered. For example, the user's range of movement is not restricted by external sensors or the like for measuring the user's body. The user can also handle real objects at relatively short distance while actually feeling them by touch. This makes it possible to view content while doing something else, for example eating a meal while watching virtual reality video.

[Second Embodiment]
Next, the second embodiment of the present invention will be described. Matters already described for the previous embodiment may be omitted below, and the description focuses on matters specific to the present embodiment. The video presentation device according to the second embodiment uses an optical see-through method, which is described later.

FIG. 8 is a block diagram showing a schematic functional configuration of the video presentation device according to the present embodiment. This video presentation device 51 has a configuration partially similar to that of the video presentation device 1 described in the previous embodiment. The video presentation device 51 includes a processing unit 52 in place of the processing unit 2 of the video presentation device 1, and a display device 59 in place of the display device 9 of the video presentation device 1. The processing unit 52 of the present embodiment is characterized by having a presentation unit 74 in place of the presentation unit 24 of the first embodiment. The processing unit 52 also has a presentation area extraction unit 72 in place of the presentation area extraction unit 22 of the first embodiment.

The display device 59 is a device of a type suited to presentation by the optical see-through method. That is, the display device 59 displays the playback video acquired by the playback video acquisition unit 5 in part of the full screen. In the regions where the playback video is not displayed, the display device 59 allows the user of the video presentation device 51 to see their own surroundings optically. In one form, the display device 59 has a function of controlling, for each pixel of the display screen, whether the pixel displays the corresponding part of the playback video based on an RGB signal or is placed in a transparent state so that the user can see the surroundings with their own eyes.

The presentation area extraction unit 72 extracts the region to be presented from the video acquired by the peripheral video acquisition unit 3. From the wide-viewing-angle video acquired by the peripheral video acquisition unit 3, the presentation area extraction unit 72 cuts out a region matched to the viewing angle of the display device 59, for example by central projection. In the present embodiment, the presentation area extraction unit 72 does not pass the cut-out video itself to the presentation unit 74. As stated above, the present embodiment uses the optical see-through method, so the video acquired by the peripheral video acquisition unit 3 does not itself need to be displayed on the display device 59. However, the presentation area extraction unit 72 notifies the mask generation unit 23 of the positional correspondence, obtained when the video is cut out by central projection, between the video acquired by the peripheral video acquisition unit 3 and the video matched to the viewing angle of the display device 59.
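One way to picture the positional correspondence is as a mapping from a display pixel to the peripheral-video pixel that looks in the same direction. The sketch below assumes, for simplicity, that both the display and the peripheral video follow a central (pinhole) projection with a shared optical axis and known horizontal viewing angles. The source does not state the projection of the wide-angle camera, so this model and all names are assumptions.

```python
import math


def focal_length_px(width_px, fov_deg):
    """Focal length in pixels of a central projection with the given
    image width and horizontal viewing angle."""
    return (width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def display_to_peripheral(u, v, disp_size, disp_fov_deg,
                          peri_size, peri_fov_deg):
    """Map display pixel (u, v) to the peripheral-video pixel that sees
    the same direction. Sizes are (width, height) in pixels."""
    disp_w, disp_h = disp_size
    peri_w, peri_h = peri_size
    f_disp = focal_length_px(disp_w, disp_fov_deg)
    f_peri = focal_length_px(peri_w, peri_fov_deg)
    # Direction of the ray through (u, v), as tangents of the view angles.
    x = (u - disp_w / 2.0) / f_disp
    y = (v - disp_h / 2.0) / f_disp
    return (peri_w / 2.0 + x * f_peri, peri_h / 2.0 + y * f_peri)
```

A mask computed in peripheral-video coordinates could then be resampled into display coordinates by looking up each display pixel through this mapping.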

The presentation unit 74 presents the video acquired by the playback video acquisition unit 5 on the basis of the mask generated by the mask generation unit 23. In the present embodiment, instead of displaying the peripheral video on the display device 59, the presentation unit 74 controls the transparency (m) of the playback video so that the viewer can see the real space at the transparency expressed in the mask.

That is, the presentation unit 74 has a function of varying the transparency of the playback video. The presentation unit 74 outputs the playback video at a transparency according to the mask information. At locations where the situation corresponding to the peripheral video should be presented, the presentation unit 74 makes the playback video transparent, which allows the viewer to see the real space. The transparency may be a real number from 0 to 1 inclusive. In intermediate regions (regions where the transparency is greater than 0 and less than 1), the presentation unit 74 outputs the playback video at a transparency according to the numerical value in the mask information. In other words, the presentation unit 74 controls the video output so that the viewer sees the playback video and the real space mixed at a predetermined mixing ratio. With this function, the presentation unit 74 realizes optical see-through. Put differently, the presentation unit 74 presents the playback video with the following portions of the presented all-sky video masked out: the user's own body, the bodies of people sharing the experience in the same space, and the nearby area.
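As a sketch of how a transparency value could be attached to the playback video, the code below converts each playback pixel to RGBA, where the alpha channel is the opacity 1 - m. It assumes 8-bit RGB tuples and a display that interprets alpha 0 as fully see-through; the function names and the RGBA representation are assumptions, and the actual signal format of the display device 59 is not specified in the source.

```python
def to_rgba(playback_rgb, m):
    """Attach transparency m (0.0 to 1.0, 1.0 = fully see-through) to a
    playback pixel as an 8-bit alpha channel holding the opacity 1 - m."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("transparency must be between 0 and 1")
    r, g, b = playback_rgb
    return (r, g, b, round(255 * (1.0 - m)))


def to_rgba_frame(playback, mask):
    """Apply to_rgba to every pixel of a frame using a per-pixel mask."""
    return [
        [to_rgba(rgb, m) for rgb, m in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(playback, mask)
    ]
```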

The processing procedure of the video presentation device 51 according to the present embodiment is basically the same as that of the previous embodiment. In the present embodiment, however, instead of displaying the peripheral video acquired by the peripheral video acquisition unit 3 on the display device, the video presentation device 51 controls the presentation so that the viewer can see the real space in the corresponding region of the video.

FIG. 9 is a schematic diagram showing a configuration example of the video presented by the video presentation device 51 of the present embodiment. As illustrated, the presentation unit 74 of the video presentation device 51 presents to the user the presented video (3) (optical see-through method), which is a combination of the real space (1) and the playback video (2), part of which is masked. In the figure, reference numeral 311 denotes the real space that the user sees through the display device 59. Reference numeral 312 denotes the playback video, which is obtained by cutting out a part of the original all-sky video. Reference numeral 313 denotes a mask video generated on the basis of the distance from the video presentation device 51 and the recognition result for the peripheral video. The white portion of the mask video 313 is the region allocated to showing the user the real space (311). The hatched portion of the mask video 313 is the region allocated to the playback video 312. The presentation unit 74 performs control so as to mask the mask portion of the playback video 312 (the region shown in white in the mask video 313). That is, on the display device 59, nothing is displayed in the mask portion of the playback video 312 (the state denoted by reference numeral 314), and that portion becomes transparent. As a result, in the mask portion the user sees the real space (311) through the transparent screen of the display device 59. In other words, the user sees, as the presented video (315), the masked playback video 314 combined with the real space in the mask portion. Note that the playback video 312 and the mask video 313 differ in size, the playback video 312 being the larger. The mask video 313 is assigned to a predetermined partial region of the playback video 312. In this way, the presentation unit 74 displays on the display device 59 the presented video (315) composited by the optical see-through method.

The effect obtained with this optical see-through method is that, within the video cut out from the all-sky video (playback video), the following remain visible as real space seen through the display device 59: the body of the user of the video presentation device 51, the real space in the user's vicinity (within a predetermined distance), and the bodies of people present in the same space (the same room), for example people viewing and experiencing the same content at the same time. A display suited to this kind of optical see-through video is, for example, a head-mounted display worn on the user's head, in particular one that can control, for each pixel of the display screen, whether the pixel is transparent or displays a pixel of the playback video.
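The relationship between a smaller mask video and a larger playback video can be sketched as follows, assuming a binary mask placed at a given offset inside the playback frame. Pixels where the mask is set are replaced by a marker meaning "leave this display pixel transparent". The marker, the offset parameters, and the function name are hypothetical and serve only to illustrate the composition in FIG. 9.

```python
TRANSPARENT = None  # hypothetical marker: display pixel left see-through


def apply_mask_region(playback, mask, top, left):
    """Return a copy of the playback frame in which every pixel covered
    by a True mask entry is marked transparent.

    The mask may be smaller than the playback frame; (top, left) is the
    position of the mask's upper-left corner within the frame.
    """
    out = [row[:] for row in playback]
    for r, mask_row in enumerate(mask):
        for c, show_real_space in enumerate(mask_row):
            if show_real_space:
                out[top + r][left + c] = TRANSPARENT
    return out
```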

The configuration specific to the present embodiment is summarized as follows.

The presentation unit 74 has a function of varying the transparency of the playback video. The presentation unit 74 outputs the playback video at a transparency according to the mask information. At locations where the situation corresponding to the peripheral video should be presented, the presentation unit 74 makes the playback video transparent, which allows the viewer to see the real space. The transparency may be a real number from 0 to 1 inclusive. In intermediate regions (regions where the transparency is greater than 0 and less than 1), the presentation unit 74 outputs the playback video at a transparency according to the numerical value in the mask information. In other words, the presentation unit 74 controls the video output so that the viewer sees the playback video and the real space mixed at a predetermined mixing ratio.

As described above, according to the present embodiment, the optical see-through method makes it possible to realize a content presentation method that is the same as, or similar to, that of the first embodiment.

At least some of the functions of the video presentation device in each of the embodiments described above can be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, a DVD-ROM, or a USB memory, and to storage devices such as a hard disk built into a computer system. The term may further include a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds a program for a certain period, such as volatile memory inside a computer system acting as the server or client in that case. The program may realize only some of the functions described above, or may realize them in combination with a program already recorded in the computer system.

Modifications of the above embodiments may also be implemented.
As one example of a modification, a video presentation device without the synchronization unit 7 may be implemented. In this case, information cannot be exchanged among a plurality of video presentation devices to synchronize them for presenting the same content at the same time.
As another example of a modification, the presentation area extraction unit 72 may be configured not to extract the presentation area. For example, when the viewing angle of the peripheral video acquired by the peripheral video acquisition unit 3 is the same as, or close to, the viewing angle at which the peripheral video is displayed on the display device, the processing of the presentation area extraction unit 72 for matching the two viewing angles can be omitted.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and includes designs and the like within a range that does not depart from the gist of the present invention.

The present invention can be used, for example, in devices for presenting video content and in services that present video content. However, the scope of use of the present invention is not limited to the examples given here.

1 Video presentation device
2 Processing unit
3 Peripheral video acquisition unit
4 Distance information acquisition unit
5 Playback video acquisition unit
6 Position/posture acquisition unit
7 Synchronization unit
9 Display device
21 Recognition unit
22 Presentation area extraction unit
23 Mask generation unit
24 Presentation unit
31 Wireless router
51 Video presentation device
52 Processing unit
59 Display device
72 Presentation area extraction unit
74 Presentation unit

Claims

1. A video presentation device comprising:
a playback video acquisition unit that acquires a playback video, which is a video for playback;
a peripheral video acquisition unit that acquires a peripheral video, which is a video of the surroundings of the device itself;
a recognition unit that recognizes a predetermined subject included in the peripheral video;
a distance information acquisition unit that acquires distance information corresponding to the peripheral video;
a mask generation unit that generates mask information indicating, on the basis of the distance information, whether or not a region is one in which the playback video should be presented, and indicating whether or not a region is one in which the subject recognized by the recognition unit exists; and
a presentation unit that outputs at least the playback video on the basis of the mask information generated by the mask generation unit.

2. The video presentation device according to claim 1, further comprising:
a synchronization unit that synchronizes the playback position, in the time direction, of the playback video output by the presentation unit with another video presentation device.

3. The video presentation device according to claim 1 or 2, wherein
the presentation unit outputs, on the basis of the mask information, at least one of the playback video and the peripheral video for each region of the screen.

4. The video presentation device according to claim 3, further comprising:
a presentation area extraction unit that cuts out only a part of the peripheral video acquired by the peripheral video acquisition unit, wherein
the recognition unit recognizes the predetermined subject on the basis of the peripheral video before it is cut out by the presentation area extraction unit, and
the presentation unit outputs the peripheral video cut out by the presentation area extraction unit.

5. The video presentation device according to claim 4, wherein
the mask generation unit generates the mask information including information on a mixing ratio for a region in which the playback video and the peripheral video are mixed and presented, and
in the region in which the playback video and the peripheral video are mixed and presented, the presentation unit produces output such that the playback video and the peripheral video are mixed on the basis of the information on the mixing ratio.

6. The video presentation device according to claim 1 or 2, wherein
the presentation unit has a function of varying the transparency of the playback video, and outputs the playback video at a transparency according to the mask information.

7. The video presentation device according to claim 6, wherein
the transparency is a real number from 0 to 1 inclusive, and
the presentation unit outputs the playback video at a transparency according to the mask information.

8. The video presentation device according to any one of claims 1 to 7, wherein
the video presentation device is a device of the type worn on the head of a mammal,
the video presentation device further comprises a position and posture detection unit that detects the position and posture of the video presentation device, and
the presentation unit outputs, from the playback video, which is an all-around video, a partial video corresponding to the position and posture detected by the position and posture detection unit.

9. A program that causes a computer to execute:
a playback video acquisition process of acquiring a playback video, which is a video for playback;
a peripheral video acquisition process of acquiring a peripheral video, which is a video of the surroundings of the device itself;
a recognition process of recognizing a predetermined subject included in the peripheral video;
a distance acquisition process of acquiring distance information corresponding to the peripheral video;
a mask generation process of generating mask information indicating, on the basis of the distance information, whether or not a region is one in which the playback video should be presented, and indicating whether or not a region is one in which the subject recognized in the recognition process exists; and
a presentation process of outputting at least the playback video on the basis of the mask information generated in the mask generation process.