JP2020204856A - Image generation system and program

Info

Publication number: JP2020204856A
Application number: JP2019111714A
Authority: JP
Inventors: 健太田辺; Kenta Tanabe; 義信飯村; Yoshinobu Iimura; 祐輝大森; Yuki Omori; 敏明田村; Toshiaki Tamura; 陽介黒田; Yosuke Kuroda; 祐司飯塚; Yuji Iizuka; 亮平箕浦; Ryohei Minoura
Original assignee: Bandai Namco Amusement Inc
Current assignee: Bandai Namco Amusement Inc
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2020-12-24
Also published as: CN112104857A

Abstract

To provide an image generation system and a program that can composite an image of a subject captured by a camera with a virtual space image at high quality using a simple system. SOLUTION: The image generation system includes an acquisition unit that acquires a first image in which a background and a subject are captured by a camera placed in real space and a second image in which the background is captured by the camera, an image generation unit that generates a virtual space image seen from a virtual camera for shooting placed at a position in a virtual space corresponding to the position of the camera, and an image composition unit that extracts an image of the subject by obtaining a difference image between the first image and the second image and generates a composite image in which the image of the subject is composited onto the virtual space image. SELECTED DRAWING: Figure 9

Description

The present invention relates to an image generation system, a program, and the like.

Image generation systems that generate a virtual space image seen from a virtual camera in a virtual space are conventionally known. As prior art for a system that realizes virtual reality (VR) by displaying the image seen from the virtual camera on an HMD (head-mounted display device), there is, for example, the technique disclosed in Patent Document 1. As prior art for a system that composites live-action video with a virtual space image using a blue screen, there is, for example, the technique disclosed in Patent Document 2.

Patent Document 1: Japanese Unexamined Patent Publication No. H11-309269
Patent Document 2: Japanese Unexamined Patent Publication No. 2011-35638

Chroma key compositing using a blue screen or green screen has the problem that large-scale shooting equipment for the blue or green screen must be installed at the site of the image generation system. In addition, if the quality of the compositing of the virtual space image and the real space image is low, the two images become inconsistent with each other and the desired result cannot be obtained.

According to some aspects of the present invention, it is possible to provide an image generation system, a program, and the like that can composite an image of a subject captured by a camera with a virtual space image at high quality using a simple system.

One aspect of the present disclosure relates to an image generation system including: an acquisition unit that acquires a first image in which a background and a subject are captured by a camera placed in real space, and a second image in which the background is captured by the camera; an image generation unit that generates a virtual space image seen from a virtual camera for shooting placed at a position in a virtual space corresponding to the position of the camera; and an image composition unit that extracts an image of the subject by obtaining a difference image between the first image and the second image, and generates a composite image in which the image of the subject is composited onto the virtual space image. Another aspect of the present disclosure relates to a program that causes a computer to function as each of the above units, or to a computer-readable information storage medium storing the program.

According to one aspect of the present disclosure, a first image capturing the background and the subject and a second image capturing the background are acquired, a virtual camera for shooting is placed at the position in the virtual space corresponding to the position of the camera that captured the first and second images, and a virtual space image seen from the virtual camera for shooting is generated. The image of the subject is then extracted by obtaining the difference image between the first image and the second image, and a composite image in which the image of the subject is composited onto the virtual space image is generated. In this way, the composite image can be generated without large-scale shooting equipment, making it possible to provide an image generation system and the like that can composite an image of a subject captured by a camera with a virtual space image at high quality using a simple system.
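The difference-based extraction and compositing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: images are modeled as nested lists of RGB tuples to keep the sketch dependency-free, and the function name `composite` and the `threshold` parameter are hypothetical.

```python
def composite(first, second, virtual, threshold=30):
    """Composite the subject from `first` onto `virtual`.

    first:   image of background + subject (rows of (r, g, b) tuples)
    second:  image of the background only, same camera and size
    virtual: virtual space image rendered from the matching virtual camera
    threshold: assumed per-pixel difference above which a pixel counts as subject
    """
    result = []
    for row_first, row_second, row_virtual in zip(first, second, virtual):
        out_row = []
        for p_first, p_second, p_virtual in zip(row_first, row_second, row_virtual):
            # Sum of absolute channel differences between the two real images.
            diff = sum(abs(a - b) for a, b in zip(p_first, p_second))
            # Changed pixels are taken from the camera image, the rest from the render.
            out_row.append(p_first if diff > threshold else p_virtual)
        result.append(out_row)
    return result


background = (10, 10, 10)
subject = (200, 150, 100)
sky = (0, 0, 255)

first_image = [[background, subject], [background, background]]
second_image = [[background, background], [background, background]]
virtual_image = [[sky, sky], [sky, sky]]

composited = composite(first_image, second_image, virtual_image)
```

Only the pixel that differs between the two camera images survives into the composite; every other pixel shows the virtual space image.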

In one aspect of the present disclosure, the image composition unit may extract an image of a housing that the subject rides in the real space, and generate the composite image in which the image of the subject and the image of the housing are composited onto the virtual space image.

In this way, when the subject is riding the housing in the real space, it is possible to generate an image in which the real space image of the subject riding the housing is composited onto the virtual space image.

In one aspect of the present disclosure, the image composition unit may extract the image of the housing using a housing mask image that specifies the extraction range of the image of the housing.

By using such a housing mask image, the image of the housing, which would otherwise be treated as background and not extracted, can be extracted in the same way as the image of the subject.
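Because the housing appears in both the first and second images, a difference alone would leave it out. One plausible way to bring it back, sketched here as an assumption rather than the method of the disclosure, is to OR a predefined housing mask into the difference-derived mask. The name `apply_housing_mask` and the 0/1 list-of-lists mask layout are hypothetical.

```python
def apply_housing_mask(diff_mask, housing_mask):
    """Mark a pixel for extraction (1) if it differs from the background
    or lies inside the predefined housing region."""
    return [[d | h for d, h in zip(d_row, h_row)]
            for d_row, h_row in zip(diff_mask, housing_mask)]


# 1 = pixel changed between the first and second images (the subject).
diff_mask = [[0, 1, 0],
             [0, 0, 0]]
# 1 = pixel belongs to the housing region fixed in advance.
housing_mask = [[0, 0, 0],
                [1, 1, 1]]

extraction_mask = apply_housing_mask(diff_mask, housing_mask)
```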

In one aspect of the present disclosure, the image composition unit may set an extraction range for the image of the subject based on tracking information from at least one tracking device worn by the subject, and extract the image of the subject.

In this way, even when the posture of the subject changes or the subject performs various motions, the image of the subject can be extracted within an appropriate extraction range, which prevents, for example, an object that is not intended as a subject from being extracted as the subject.

In one aspect of the present disclosure, the image composition unit may set the extraction range for the image of the subject based on the position of the tracking device and the position of an auxiliary point set at a position shifted by a given distance from the position of the tracking device.

In this way, even in situations where an appropriate extraction range cannot be set from the position of the tracking device alone, the auxiliary point can be used to set an extraction range that encloses the subject, so that the image of the subject is extracted properly.
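As a rough sketch of how tracking devices and auxiliary points could define an extraction range, the following computes an axis-aligned box enclosing both. It assumes the tracker positions have already been projected into image coordinates; the function name, the offset representation, and the `margin` parameter are all hypothetical.

```python
def extraction_range(tracker_points, aux_offsets, margin=0):
    """Return (x_min, y_min, x_max, y_max) enclosing all tracker points
    and the auxiliary points derived from them.

    tracker_points: list of (x, y) tracker positions in image coordinates.
    aux_offsets: list of (tracker_index, (dx, dy)) pairs; each defines an
        auxiliary point shifted from the given tracker by a fixed offset.
    margin: extra padding added on every side of the box.
    """
    points = list(tracker_points)
    for index, (dx, dy) in aux_offsets:
        x, y = tracker_points[index]
        points.append((x + dx, y + dy))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


# Trackers on the head and both hands; no tracker on the feet.
trackers = [(50, 20), (40, 60), (60, 60)]
# One auxiliary point 80 pixels below the head stands in for the feet.
auxiliary = [(0, (0, 80))]

box = extraction_range(trackers, auxiliary, margin=5)
```

Without the auxiliary point the box would end at the hands, so the lower body would fall outside the extraction range.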

In one aspect of the present disclosure, the image generation unit may generate, as a virtual space image for the player that is displayed to the player who is the subject, a virtual space image in which at least one of an image of the virtual camera for shooting and an image of a photographer character is displayed at a position in the virtual space corresponding to the position of the virtual camera for shooting.

In this way, the player can be made aware of being photographed by the virtual camera for shooting, which enables presentation effects such as prompting the player to move or pose.

In one aspect of the present disclosure, the system may include a head-mounted display device that is worn by the player who is the subject and displays a virtual space image for the player seen from a virtual camera for the player in the virtual space, and a gallery display device on which the composite image is displayed as a gallery image.

In this way, the gallery (spectators) can watch how the player wearing the head-mounted display device behaves while playing in the virtual space.

In one aspect of the present disclosure, the acquisition unit may acquire a depth image of the background and the subject captured by the camera, and the image composition unit may extract the image of the subject based on the difference image and the depth image.

By extracting the image of the subject using the depth image in addition to the difference image, even higher-quality compositing of the virtual space image and the image of the subject can be achieved.

In one aspect of the present disclosure, the image composition unit may generate a difference mask image based on the difference image, generate, based on the depth image, a depth mask image that identifies pixels whose depth value falls within a given depth range, generate a subject mask image that identifies the subject based on the difference mask image and the depth mask image, and extract the image of the subject based on the subject mask image and the first image.

By using such a difference mask image and depth mask image, the image of the subject can be extracted at high quality from the first image, which shows both the background and the subject.
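The mask pipeline described above can be sketched as four small steps. The text only says the subject mask is generated "based on" the two masks; combining them with a logical AND is an assumption made here, as are the function names, the grayscale pixel model, and the threshold values.

```python
def difference_mask(first, second, threshold):
    """1 where the first image differs from the background-only second image."""
    return [[int(abs(a - b) > threshold) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def depth_mask(depth, near, far):
    """1 where the depth value lies inside the given depth range."""
    return [[int(near <= z <= far) for z in row] for row in depth]


def subject_mask(diff_m, depth_m):
    """Assumed combination: a pixel is subject only if both masks agree."""
    return [[d & z for d, z in zip(d_row, z_row)]
            for d_row, z_row in zip(diff_m, depth_m)]


def extract_subject(first, mask):
    """Keep first-image pixels under the mask; None marks transparent pixels."""
    return [[p if m else None for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(first, mask)]


first_image = [[10, 200, 90]]     # grayscale: background, subject, passer-by
second_image = [[10, 10, 10]]     # background only
depth_image = [[3.0, 1.5, 5.0]]   # metres from the camera

d_mask = difference_mask(first_image, second_image, threshold=30)
z_mask = depth_mask(depth_image, near=1.0, far=2.0)
s_mask = subject_mask(d_mask, z_mask)
subject_pixels = extract_subject(first_image, s_mask)
```

The third pixel changed relative to the background but lies outside the depth range, so the depth mask rejects it even though the difference mask alone would have kept it.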

In one aspect of the present disclosure, the image composition unit may perform correction processing on the depth mask image and generate the subject mask image based on the corrected depth mask image and the difference mask image.

Performing correction processing on the depth mask image in this way prevents flicker at edges, removes fine noise, and so on, enabling a high-quality composite image to be generated.

In one aspect of the present disclosure, the image composition unit may generate a difference depth mask image from a first depth image of the background and the subject captured by the camera and a second depth image of the background captured by the camera, and generate the corrected depth mask image based on the difference depth mask image.

Using such a difference depth mask image prevents a background region from being misrecognized as a subject region.
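A hedged sketch of this correction: a background object that happens to sit inside the depth range passes a plain depth mask, but its depth does not change between the first and second depth images, so a difference depth mask can reject it. The AND combination, the `tolerance` parameter, and the function names are assumptions for illustration.

```python
def difference_depth_mask(depth_first, depth_second, tolerance):
    """1 where depth changed between the subject shot and the background shot."""
    return [[int(abs(a - b) > tolerance) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(depth_first, depth_second)]


def corrected_depth_mask(depth_first, depth_second, near, far, tolerance):
    """Depth-range mask restricted to pixels whose depth actually changed."""
    changed = difference_depth_mask(depth_first, depth_second, tolerance)
    return [[int(near <= z <= far) & c for z, c in zip(z_row, c_row)]
            for z_row, c_row in zip(depth_first, changed)]


# Pixel 0: a pillar at 1.5 m that is part of the background.
# Pixel 1: the subject at 1.4 m in front of a wall at 4.0 m.
# Pixel 2: the wall itself.
first_depth = [[1.5, 1.4, 4.0]]
second_depth = [[1.5, 4.0, 4.0]]

mask = corrected_depth_mask(first_depth, second_depth,
                            near=1.0, far=2.0, tolerance=0.2)
```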

In one aspect of the present disclosure, the image composition unit may generate the corrected depth mask image by performing at least one of morphological filtering and time-series filtering.

Performing such morphological filtering or time-series filtering makes it possible to remove small-sized noise and suppress flickering of fine noise.
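The text does not specify which morphological or time-series filters are used. As one common choice, the sketch below applies a 3x3 morphological opening (erosion followed by dilation) to remove isolated pixels, and a per-pixel majority vote over recent frames to suppress flicker. All names and the choice of filters are assumptions.

```python
def _neighbourhood(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    return [[int(all(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    return [[int(any(_neighbourhood(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Erosion then dilation: removes specks smaller than the 3x3 window."""
    return dilate(erode(mask))


def temporal_filter(recent_masks):
    """Majority vote per pixel over the most recent mask frames."""
    n = len(recent_masks)
    h, w = len(recent_masks[0]), len(recent_masks[0][0])
    return [[int(2 * sum(m[y][x] for m in recent_masks) > n) for x in range(w)]
            for y in range(h)]


noisy = [[1, 1, 1, 0, 0],
         [1, 1, 1, 0, 0],
         [1, 1, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 1]]   # the lone 1 in the corner is noise
clean = [row[:] for row in noisy]
clean[4][4] = 0

opened = opening(noisy)
stable = temporal_filter([[[1, 0]], [[1, 1]], [[0, 0]]])
```

The opening keeps the 3x3 block and drops the isolated pixel; the majority vote keeps a pixel only if it was set in more than half of the recent frames.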

In one aspect of the present disclosure, the image composition unit may generate the corrected depth mask image by setting, based on the difference image, the pixel values of pixels for which the depth value could not be acquired in the depth image.

This prevents parts of the image from going missing because a depth value could not be acquired, so that an appropriate extracted image of the subject can be generated.
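A minimal sketch of this fallback, assuming the depth camera reports 0.0 for pixels where no depth could be measured (the sentinel value, the function name, and the use of a binary difference mask are assumptions): where depth is missing, the mask value is taken from the difference mask instead.

```python
INVALID_DEPTH = 0.0  # assumed sentinel for "depth could not be acquired"


def depth_mask_with_fallback(depth, diff_mask, near, far):
    """Depth-range mask; pixels without a depth value copy the difference mask."""
    result = []
    for z_row, d_row in zip(depth, diff_mask):
        result.append([d if z == INVALID_DEPTH else int(near <= z <= far)
                       for z, d in zip(z_row, d_row)])
    return result


depth_image = [[1.5, 0.0, 0.0, 4.0]]   # two pixels have no depth reading
diff_mask = [[1, 1, 0, 0]]             # from the color difference image

filled = depth_mask_with_fallback(depth_image, diff_mask, near=1.0, far=2.0)
```

The second pixel would otherwise leave a hole in the subject; the difference mask indicates it belongs to the subject, so it is kept.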

In one aspect of the present disclosure, the image composition unit may obtain the region size of a group of pixels whose depth values fall within the depth range, and generate the corrected depth mask image by filtering according to the region size.

In this way, pixel groups with a small region size caused by noise or the like are excluded and the pixel group corresponding to the subject is extracted, so that a high-quality composite image can be generated.
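Region-size filtering can be sketched with a connected-component pass: each 4-connected group of mask pixels is measured, and groups below a minimum size are discarded. The 4-connectivity, the `min_size` parameter, and the function name are assumptions introduced for illustration.

```python
from collections import deque


def filter_small_regions(mask, min_size):
    """Keep only 4-connected regions of 1s containing at least min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    result = [[0] * w for _ in range(h)]
    for start_y in range(h):
        for start_x in range(w):
            if not mask[start_y][start_x] or seen[start_y][start_x]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = []
            queue = deque([(start_y, start_x)])
            seen[start_y][start_x] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_size:
                for y, x in region:
                    result[y][x] = 1
    return result


depth_range_mask = [[1, 1, 0, 0],
                    [1, 1, 0, 1],   # the lone 1 on the right is noise
                    [0, 0, 0, 0]]

filtered = filter_small_regions(depth_range_mask, min_size=3)
```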

In one aspect of the present disclosure, the image composition unit may set a second depth range based on the depth values in a subject region judged to be the region of the subject, and generate, as the depth mask image, an image that identifies pixels whose depth value falls within the second depth range.

In this way, a depth mask image that reflects the movement of the subject can be generated, enabling a high-quality composite image.
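How the second depth range is derived from the subject region is not detailed in this passage. One simple possibility, sketched here purely as an assumption, is to centre the range on the median depth of the pixels currently judged to be the subject and extend it by a fixed margin. The function name and the `margin` parameter are hypothetical.

```python
from statistics import median


def second_depth_range(depth, subject_region, margin):
    """Return (near, far) centred on the median depth inside the subject region."""
    values = [z
              for z_row, m_row in zip(depth, subject_region)
              for z, m in zip(z_row, m_row) if m]
    centre = median(values)
    return centre - margin, centre + margin


depth_image = [[1.4, 1.6, 4.0],
               [1.5, 3.9, 4.1]]
subject_region = [[1, 1, 0],
                  [1, 0, 0]]

near, far = second_depth_range(depth_image, subject_region, margin=0.5)
# The depth mask is then rebuilt with the range that follows the subject.
adaptive_mask = [[int(near <= z <= far) for z in row] for row in depth_image]
```

Recomputing the range each frame lets the mask follow the subject as it moves toward or away from the camera.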

FIG. 1 is a block diagram showing a configuration example of the image generation system of the present embodiment.
FIGS. 2(A) and 2(B) are explanatory diagrams of an example of tracking processing.
FIG. 3 is a perspective view showing a configuration example of the housing.
FIGS. 4(A) and 4(B) are examples of virtual space images displayed to the player.
FIGS. 5(A) and 5(B) are examples of composite images of a virtual space image and a real space image.
FIGS. 6(A) and 6(B) are explanatory diagrams of compositing the image of the housing using a housing mask image.
FIG. 7 is an explanatory diagram of processing that sets an extraction range for the image of the subject and extracts the image of the subject.
FIGS. 8(A) and 8(B) are explanatory diagrams of a method of setting the extraction range using auxiliary points and extracting the image of the subject.
FIG. 9 is a flowchart illustrating a detailed processing example of the present embodiment.
FIGS. 10(A) to 10(C) are explanatory diagrams of the image composition processing of the present embodiment.
FIGS. 11(A) to 11(C) are explanatory diagrams of the image composition processing of the present embodiment.
FIGS. 12(A) and 12(B) are explanatory diagrams of a method of displaying the composite image of the virtual space image and the image of the subject on the gallery display device.
FIGS. 13(A) and 13(B) are explanatory diagrams of image composition processing using a depth image.
FIGS. 14(A) to 14(D) are explanatory diagrams of correction processing using a difference depth mask image.
FIGS. 15(A) and 15(B) are explanatory diagrams of correction processing that sets, based on the difference image, the pixel values of pixels for which the depth value could not be acquired.
FIGS. 16(A) to 16(C) are explanatory diagrams of correction processing that sets, based on the difference image, the pixel values of pixels for which the depth value could not be acquired.
FIGS. 17(A) and 17(B) are explanatory diagrams of correction processing by filtering according to region size.
FIGS. 18(A) and 18(B) are explanatory diagrams of processing that sets a second depth range based on the depth values in the subject region and generates a depth mask image.
FIGS. 19(A) to 19(C) are explanatory diagrams of a method using a housing mask image.
FIGS. 20(A) to 20(D) are explanatory diagrams of a method using a housing mask image.
FIGS. 21(A) to 21(D) are explanatory diagrams of a method that does not use a housing mask image.
FIG. 22 is an explanatory diagram of a method of setting the extraction range.

The present embodiment is described below. The embodiment described below does not unduly limit the content of the claims, and not all of the configurations described in the present embodiment are necessarily essential constituent features.

1. Image generation system
FIG. 1 is a block diagram showing a configuration example of the image generation system of the present embodiment. The image generation system of the present embodiment realizes, for example, a simulation system that simulates virtual reality (VR). It can be applied to various systems, such as a game system that provides game content, a real-time simulation system such as a driving simulator or a sports simulator, or a content providing system that provides content such as video. The image generation system of the present embodiment is not limited to the configuration of FIG. 1, and various modifications are possible, such as omitting some of its constituent elements (units) or adding other constituent elements.

The housing 30 is, for example, a ride housing that a player can board. Specifically, the housing 30 is, for example, a movable housing that changes the play position of the player. The housing 30 is what is called an arcade housing or the like; it forms the outer shell of the simulation system apparatus realized by the image generation system, and it need not be box-shaped. The housing 30 may be a cockpit housing (experience housing) for a car game, robot game, airplane game, or the like, or a housing of some other form. The housing 30 is the main body of the simulation system and is provided with various devices and structures for realizing the simulation system. At least a play position is set in the housing 30. A configuration example of the housing 30 is described in detail later with reference to FIG. 3.

The camera 150 is a device that captures images such as color images and depth images. For example, the camera 150 includes a color camera 152 and a depth camera 154. The color camera 152 captures color images such as RGB images and can be realized by an image sensor, such as a CMOS sensor or a CCD, and an optical system such as a lens. The depth camera 154 can detect the positional relationship, in the depth direction, of objects in its field of view, and a depth image can be acquired by using it. A depth image is, for example, an image in which a depth value (Z value) is set as the pixel value of each pixel. The depth camera 154 can be realized, for example, by first and second depth sensors that form a stereo camera and an IR projector that projects infrared light. The first and second depth sensors can be realized by, for example, infrared cameras. The depth value is measured by the first and second depth sensors, and the accuracy of the measured depth value can be improved by having the IR projector project an IR pattern. Specifically, points in the left-eye image acquired by the first depth sensor are matched with points in the right-eye image acquired by the second depth sensor, and the depth value is calculated from the amount of shift between the matched points. The method of measuring the depth value is not limited to the above, and various modifications are possible. The color camera 152 and the depth camera 154 may be realized in a single camera housing, or each may be realized in a separate camera housing.
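The relation between the shift of matched points and the depth value can be illustrated with the standard rectified-stereo formula Z = f * B / d, where f is the focal length in pixels, B is the distance between the two sensors, and d is the horizontal shift (disparity). The text does not state which formula the depth camera 154 uses, so this is a sketch under the usual pinhole-camera assumptions with hypothetical names and example values.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Depth in metres of a point matched at column x_left in the left image
    and column x_right in the right image of a rectified stereo pair.

    Returns None when the disparity is not positive (no valid match).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        return None
    return focal_length_px * baseline_m / disparity


# Assumed example values: 640-pixel focal length, 6.25 cm between the sensors.
z = depth_from_disparity(x_left=340, x_right=320,
                         focal_length_px=640, baseline_m=0.0625)
unmatched = depth_from_disparity(100, 100, 640, 0.0625)
```

A larger shift between the two images means the point is closer to the camera; a point with no usable match yields no depth value, which is the situation the depth-mask correction for missing depth values has to handle.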

The operation unit 160 allows the player (user) to input various kinds of operation information (input information). The operation unit 160 can be realized by various operation devices such as a steering wheel, an accelerator pedal, a brake pedal, a lever, operation buttons, direction keys, a game controller, a gun-type controller, a touch panel, or a voice input device.

The storage unit 170 stores various kinds of information. The storage unit 170 functions as a work area for the processing unit 100, the communication unit 196, and the like. Programs for image generation processing and game processing, and the data needed to execute those programs, are held in the storage unit 170. The function of the storage unit 170 can be realized by a semiconductor memory (DRAM, VRAM), an HDD (hard disk drive), an SSD, an optical disc device, or the like. The storage unit 170 includes a virtual space information storage unit 172 and a drawing buffer 178.

The information storage medium 180, which is a computer-readable medium, stores programs, data, and the like, and its function can be realized by an optical disc (DVD, BD, CD), an HDD, a semiconductor memory (ROM), or the like. The processing unit 100 performs the various processes of the present embodiment based on the program (data) stored in the information storage medium 180. That is, the information storage medium 180 stores a program for causing a computer (a device including an input device, a processing unit, a storage unit, and an output unit) to function as each unit of the present embodiment (a program for causing the computer to execute the processing of each unit).

The HMD 200 (head-mounted display device) is a device that is worn on the player's head and displays an image in front of the player's eyes. The HMD 200 is preferably a non-transmissive type, but may be a transmissive type. The HMD 200 may also be a so-called eyeglass-type HMD. The HMD 200 can include a tracking device 206 for realizing tracking processing such as head tracking. The position and direction of the HMD 200 are identified by tracking processing using the tracking device 206, and identifying the position and direction of the HMD 200 in turn identifies the player's viewpoint position and line-of-sight direction.

Various tracking methods can be adopted. In a first tracking method, a plurality of light receiving elements (photodiodes or the like) are provided as the tracking device 206. Light (laser light or the like) from light emitting elements (LEDs or the like) provided externally is received by these light receiving elements, whereby the position and direction of the HMD 200 (the player's head) in the three-dimensional space of the real world are identified. In a second tracking method, a plurality of light emitting elements (LEDs) are provided on the HMD 200 as the tracking device 206. The light from these light emitting elements is captured by an imaging unit provided externally, whereby the position and direction of the HMD 200 are identified. In a third tracking method, a motion sensor is provided as the tracking device 206, and the position and direction of the HMD 200 are identified using the motion sensor. The motion sensor can be realized by, for example, an acceleration sensor or a gyro sensor. For example, using a 6-axis motion sensor made up of a 3-axis acceleration sensor and a 3-axis gyro sensor makes it possible to identify the position and direction of the HMD 200 in the three-dimensional space of the real world. The position and direction of the HMD 200 may also be identified by combining the first and second tracking methods, or the first and third tracking methods.

Instead of identifying the player's viewpoint position and line-of-sight direction by identifying the position and direction of the HMD 200, tracking processing that directly identifies the player's viewpoint position and line-of-sight direction may be adopted. For example, various viewpoint tracking methods such as eye tracking, face tracking, or head tracking may be used. Alternatively, an environment recognition camera may be used to perform recognition processing of the real space around the player, and the position, direction, and the like of the player may be identified based on the result of the recognition processing, for example from the relative positional relationship with recognized objects in the real space.

The display unit 208 of the HMD 200 can be realized by, for example, an organic EL display (OEL) or a liquid crystal display (LCD). For example, the display unit 208 of the HMD 200 is provided with a first display or first display area placed in front of the player's left eye and a second display or second display area placed in front of the right eye, enabling stereoscopic display. For stereoscopic display, a left-eye image and a right-eye image with different parallax are generated, and the left-eye image is displayed on the first display and the right-eye image on the second display. Alternatively, the left-eye image is displayed in the first display area of a single display and the right-eye image in the second display area. The HMD 200 is also provided with two eyepieces (fisheye lenses), one for the left eye and one for the right eye, which present a VR space that extends around the entire periphery of the player's field of view. Correction processing for correcting the distortion produced by optical systems such as the eyepieces is applied to the left-eye image and the right-eye image.

The gallery display device 210 is a device for displaying a gallery image, and can be realized by, for example, an LCD, an organic EL display, or a CRT. For example, the composite image generated by the present embodiment is displayed on the gallery display device 210, so that the gallery (the spectators) can watch the player play. The gallery display device 210 is installed, for example, in a facility that houses the simulation system realized by the image generation system.

The sound output unit 192 outputs the sound generated by the present embodiment, and can be realized by, for example, a speaker or headphones.

The I/F (interface) unit 194 performs interface processing with the portable information storage medium 195, and its function can be realized by an ASIC for I/F processing or the like. The portable information storage medium 195 is used by the player to save various kinds of information, and is a storage device that retains this information even when power is not supplied. The portable information storage medium 195 can be realized by an IC card (memory card), a USB memory, a magnetic card, or the like.

The communication unit 196 communicates with the outside (other devices) via a wired or wireless network, and its function can be realized by hardware such as a communication ASIC or a communication processor, or by communication firmware.

The program (data) for causing a computer to function as each unit of the present embodiment may be distributed from an information storage medium of a server (host device) to the information storage medium 180 (or the storage unit 170) via a network and the communication unit 196. Use of an information storage medium by such a server (host device) can also be included within the scope of the present disclosure.

The processing unit 100 (processor) performs image acquisition processing, virtual space setting processing, game processing (simulation processing), virtual camera control processing, image generation processing, image composition processing, sound generation processing, and the like, based on operation information from the operation unit 160, tracking information from the HMD 200 (information on at least one of the position and direction of the HMD; information on at least one of the viewpoint position and line-of-sight direction), a program, and the like.

Each process (each function) of the present embodiment performed by each unit of the processing unit 100 can be realized by a processor (a processor including hardware). For example, each process of the present embodiment can be realized by a processor that operates based on information such as a program and a memory that stores information such as a program. In the processor, the function of each unit may be realized by separate hardware, or the functions of the units may be realized by integrated hardware. For example, the processor includes hardware, and that hardware can include at least one of a circuit that processes digital signals and a circuit that processes analog signals. For example, the processor may be composed of one or more circuit devices (for example, ICs) mounted on a circuit board, or one or more circuit elements (for example, resistors or capacitors). The processor may be, for example, a CPU (Central Processing Unit). However, the processor is not limited to a CPU, and various processors such as a GPU (Graphics Processing Unit) or a DSP (Digital Signal Processor) can be used. The processor may also be a hardware circuit implemented as an ASIC. The processor may also include an amplifier circuit, a filter circuit, or the like for processing analog signals. The memory (storage unit 170) may be a semiconductor memory such as SRAM or DRAM, or may be a register. Alternatively, it may be a magnetic storage device such as a hard disk drive (HDD), or an optical storage device such as an optical disk device. For example, the memory stores computer-readable instructions, and the processing (function) of each unit of the processing unit 100 is realized when the processor executes those instructions. The instructions here may be an instruction set that constitutes a program, or may be instructions that direct the operation of the processor's hardware circuit.

The processing unit 100 includes an acquisition unit 102, a virtual space setting unit 104, a game processing unit 106, a virtual camera control unit 110, an image generation unit 120, an image composition unit 122, and a sound generation unit 130. The game processing unit 106 includes a moving body processing unit 107 and a housing control unit 108. As described above, each process of the present embodiment executed by these units can be realized by a processor (or a processor and a memory). Various modifications are possible, such as omitting some of these components (units) or adding other components.

The acquisition unit 102 performs acquisition processing for various kinds of information, and serves as an interface for information acquisition. For example, the acquisition unit 102 acquires an image captured by the camera 150, such as a color image or a depth image captured by the camera 150. The acquisition unit 102 also acquires player information including at least one of the player's position information (viewpoint position information), direction information (line-of-sight direction information), and posture information (motion information).

The virtual space setting unit 104 performs setting processing for the virtual space (object space) in which objects are placed. For example, it places in the virtual space various objects (objects composed of primitive surfaces such as polygons, free-form surfaces, or subdivision surfaces) that represent display items such as moving bodies (cars, people, robots, trains, airplanes, ships, monsters, animals, and the like), maps (terrain), buildings, spectator seats, courses (roads), items, trees, walls, and water surfaces. That is, it determines the position and rotation angle (synonymous with orientation or direction) of each object in the world coordinate system, and places the object at that position (X, Y, Z) with that rotation angle (the rotation angles about the X, Y, and Z axes). Specifically, the virtual space information storage unit 172 of the storage unit 170 stores object information, which is information such as the position, rotation angle, moving speed, or moving direction of an object (part object) in the virtual space, in association with an object number. That is, the object information is stored in the virtual space information storage unit 172 as virtual space information. The virtual space setting unit 104 updates the object information, which is the virtual space information, every frame, for example.
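
As a rough illustration of the object information described above, the following Python sketch keeps a table keyed by object number and holding a position and rotation angles in world coordinates. The names `ObjectInfo`, `virtual_space_info`, `place_object`, and `update_object` are hypothetical and do not appear in the disclosure; the dictionary merely stands in for the virtual space information storage unit 172.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical record for one object in the virtual space."""
    position: tuple  # (X, Y, Z) in the world coordinate system
    rotation: tuple  # rotation angles about the X, Y, Z axes, in degrees


# Hypothetical stand-in for the virtual space information storage unit 172:
# object information stored in association with an object number.
virtual_space_info = {}


def place_object(object_number, position, rotation):
    """Place an object at a world position with the given rotation angles."""
    virtual_space_info[object_number] = ObjectInfo(position, rotation)


def update_object(object_number, **changes):
    """Per-frame update: replace only the fields that changed."""
    current = virtual_space_info[object_number]
    virtual_space_info[object_number] = replace(current, **changes)


place_object(7, (1.0, 0.0, 2.0), (0.0, 90.0, 0.0))
update_object(7, position=(1.5, 0.0, 2.0))
```

A real system would presumably also store the moving speed and moving direction mentioned above; they are omitted here to keep the sketch short.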

The game processing unit 106 performs various kinds of game processing that allow the player to play the game. In other words, the game processing unit 106 (simulation processing unit) executes various kinds of simulation processing that allow the player to experience virtual reality. The game processing includes, for example, processing that starts the game when a game start condition is satisfied, processing that advances the started game, processing that ends the game when a game end condition is satisfied, and processing that calculates game results. The game processing unit 106 includes the moving body processing unit 107 and the housing control unit 108.

The moving body processing unit 107 performs various kinds of processing on moving bodies that move in the virtual space. For example, it performs processing that moves a moving body in the virtual space (object space, game space) and processing that makes a moving body act. For example, the moving body processing unit 107 performs control processing that moves a moving body (model object) in the virtual space or makes it act (motion, animation), based on operation information input by the player through the operation unit 160, acquired tracking information, a program (movement and motion algorithm), various data (motion data), and the like. Specifically, it performs simulation processing that sequentially obtains the movement information (position, rotation angle, velocity, or acceleration) and motion information (position or rotation angle of each part object) of the moving body every frame (for example, every 1/60 second). A frame is the unit of time in which the movement and motion processing (simulation processing) of the moving body and the image generation processing are performed. The moving body is, for example, a player character (virtual player) corresponding to the player in the real space, or a boarding moving body on which the player character rides. The boarding moving body is, for example, a moving body modeled on a vehicle that appears in the virtual space, such as a car, ship, boat, airplane, tank, or robot. The player character boards the boarding moving body in the virtual space that corresponds to the housing 30 in the real space. The moving body processing unit 107 performs processing that moves such a boarding moving body in the virtual space or moves the player character in the virtual space.
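
The per-frame simulation described above can be sketched as follows. This is only a minimal illustration in Python under stated assumptions: the disclosure does not specify an integration scheme, so semi-implicit Euler integration is assumed here, and the names `FRAME_DT`, `step`, and `simulate` are hypothetical.

```python
FRAME_DT = 1.0 / 60.0  # one frame, e.g. 1/60 second


def step(position, velocity, acceleration, dt=FRAME_DT):
    """Advance a moving body by one frame (semi-implicit Euler, an assumption)."""
    new_velocity = tuple(v + a * dt for v, a in zip(velocity, acceleration))
    new_position = tuple(p + v * dt for p, v in zip(position, new_velocity))
    return new_position, new_velocity


def simulate(position, velocity, acceleration, frames):
    """Obtain the movement information sequentially, frame by frame."""
    for _ in range(frames):
        position, velocity = step(position, velocity, acceleration)
    return position, velocity


# A body moving at 6 units/second along X with no acceleration, for 60 frames.
final_position, final_velocity = simulate(
    (0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 0.0, 0.0), 60
)
```

In an actual system the acceleration would be derived each frame from the operation information and tracking information rather than held constant.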

The housing control unit 108 performs control processing for the housing 30. For example, it controls the movable mechanism of the housing 30 to change the posture and position of the housing 30 in the real space. When the posture or position of the housing 30 changes, the play position of the player riding in the housing 30 changes accordingly.

The virtual camera control unit 110 controls the virtual cameras. For example, it controls a virtual camera based on the player's operation information input through the operation unit 160, tracking information, and the like. Specifically, it controls the virtual camera for the player. For example, it sets the virtual camera for the player at a position corresponding to the viewpoint (first-person viewpoint) of the player character moving in the virtual space, and sets the viewpoint position and line-of-sight direction of the virtual camera, thereby controlling the position (position coordinates) and posture (rotation angles about the rotation axes) of the virtual camera. Alternatively, it sets the virtual camera for the player at the position of a viewpoint (third-person viewpoint) that follows a moving body such as the player character or the boarding moving body, and sets the viewpoint position and line-of-sight direction of the virtual camera, thereby controlling the position and posture of the virtual camera.

For example, the virtual camera control unit 110 controls the virtual camera for the player so that it follows changes in the player's viewpoint, based on tracking information of the player's viewpoint information acquired by viewpoint tracking. In the present embodiment, for example, tracking information (viewpoint tracking information) of viewpoint information, which is at least one of the player's viewpoint position and line-of-sight direction, is acquired. This tracking information can be acquired, for example, by performing tracking processing on the HMD 200. The virtual camera control unit 110 then changes the viewpoint position and line-of-sight direction of the virtual camera for the player based on the acquired tracking information (information on at least one of the player's viewpoint position and line-of-sight direction). For example, the virtual camera control unit 110 sets the virtual camera so that its viewpoint position and line-of-sight direction (position and posture) in the virtual space change in accordance with changes in the player's viewpoint position and line-of-sight direction in the real space. In this way, the virtual camera can be controlled to follow changes in the player's viewpoint based on the tracking information of the player's viewpoint information.

The virtual camera control unit 110 also controls the shooting virtual camera, which captures the virtual space image to be combined with the real space image. For example, the shooting virtual camera is placed at a position in the virtual space corresponding to the position of the camera 150 in the real space. The virtual space image seen from the shooting virtual camera is then combined with the real space image captured by the camera 150.

The image generation unit 120 performs generation processing for virtual space images. A virtual space image is a game image or a simulation image. For example, the image generation unit 120 performs drawing processing based on the results of the various processes (game processing, simulation processing) performed by the processing unit 100, and thereby generates a virtual space image. For example, the virtual space image for the player, seen from the virtual camera for the player, is displayed on the HMD 200. The virtual space image seen from the shooting virtual camera is combined with the real space image. Specifically, geometry processing such as coordinate transformation (world coordinate transformation, camera coordinate transformation), clipping, perspective transformation, or light source processing is performed, and drawing data (position coordinates of the vertices of primitive surfaces, texture coordinates, color data, normal vectors, α values, and the like) is created based on the processing results. Then, based on this drawing data (primitive surface data), the object (one or more primitive surfaces) after perspective transformation (after geometry processing) is drawn into the drawing buffer 178 (a buffer that can store image information in pixel units, such as a frame buffer or work buffer). As a result, an image seen from a virtual camera (a given viewpoint; the first and second viewpoints for the left eye and the right eye) in the virtual space is generated. The drawing processing performed by the image generation unit 120 can be realized by vertex shader processing, pixel shader processing, or the like.
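
To make the camera coordinate transformation, clipping, and perspective transformation steps above concrete, here is a small Python sketch that projects a single world-space point to pixel coordinates. It is an illustration only, under simplifying assumptions that are not taken from the disclosure: the camera axes are aligned with the world axes and the camera looks along +Z, and the function names and the `fov_y_deg` and `near` parameters are hypothetical.

```python
import math


def world_to_camera(point, camera_position):
    """Camera coordinate transformation.

    Assumption: the camera is not rotated (axes aligned with the world axes,
    looking along +Z), so the transformation reduces to a translation.
    """
    return tuple(p - c for p, c in zip(point, camera_position))


def perspective_project(point_cam, fov_y_deg, width, height, near=0.1):
    """Perspective transformation to pixel coordinates, with simple clipping.

    Returns None when the point is clipped (behind the near plane or outside
    the view frustum), otherwise (px, py) with the origin at the top left.
    """
    x, y, z = point_cam
    if z < near:
        return None
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    aspect = width / height
    ndc_x = (f / aspect) * x / z
    ndc_y = f * y / z
    if abs(ndc_x) > 1.0 or abs(ndc_y) > 1.0:
        return None
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py


# A point straight ahead of the camera lands at the image center.
center = perspective_project(
    world_to_camera((0.0, 0.0, 5.0), (0.0, 0.0, 0.0)), 60.0, 640, 480
)
```

In practice these steps run on the GPU in vertex and pixel shaders, as the text notes; the sketch only shows the arithmetic for one vertex.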

The image composition unit 122 performs composition processing on the virtual space image and the real space image to generate a composite image. For example, the image composition unit 122 generates a composite image in which an image of a subject in the real space is combined with the virtual space image. The composite image generated by the image composition unit 122 is displayed on the gallery display device 210 as a gallery image.

The sound generation unit 130 performs sound processing based on the results of the various processes performed by the processing unit 100. Specifically, it generates game sounds such as music (BGM), sound effects, or voices, and causes the sound output unit 192 to output them.

As shown in FIG. 1, the image generation system of the present embodiment includes the acquisition unit 102, the image generation unit 120, and the image composition unit 122.

The acquisition unit 102 acquires a first image in which the background and the subject are captured by the camera 150. It also acquires a second image in which the background is captured by the camera 150. For example, the acquisition unit 102 acquires the first image and the second image as color images captured by the color camera 152. The first image shows both the background and the subject, whereas the second image shows the background but not the subject. The subject is a real-space object to be photographed by the camera, for example, the player. However, the subject of the present embodiment is not limited to this, and may be a subject other than the player.

The image generation unit 120 generates a virtual space image seen from a virtual camera in the virtual space. For example, in a virtual space that is an object space in which a plurality of objects are placed, it generates a virtual space image, which is the image seen from the virtual camera. For example, the image generation unit 120 generates the virtual space image for the player, seen from the virtual camera for the player in the virtual space. Displaying the virtual space image generated in this way on the HMD 200 lets the player experience the VR world. The virtual space image for the player generated by the image generation unit 120 may also be displayed on a type of display device other than the HMD 200, for example, a display device having a dome-shaped display screen that covers the player's field of view.

The image generation unit 120 also generates a virtual space image seen from a shooting virtual camera placed at a position in the virtual space corresponding to the position of the camera 150. The shooting virtual camera is a virtual camera for capturing the virtual space image that is to be combined with the real space image captured by the camera 150. The shooting virtual camera is placed in the virtual space at a position corresponding to the position of the camera 150 in the real space. As one example, the camera 150 is placed in front of the subject in the real space so as to face the subject directly. That is, the camera 150 is placed so that the subject falls within its shooting range. In this case, the shooting virtual camera is placed in front of the object in the virtual space that corresponds to the subject, so as to face that object directly. The camera distance between the shooting virtual camera and the object is set to a distance corresponding to the camera distance between the camera 150 and the subject. Taking the case where the subject is the player as an example, the camera 150 is placed in front of the player in the real space so as to face the player directly. The shooting virtual camera is then placed in front of the player character (player moving body) corresponding to the player, so as to face the player character directly. In this case, the distance between the shooting virtual camera and the player character is set to a distance corresponding to the distance between the camera 150 and the player.
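
The placement rule above can be sketched in Python as follows. This is a hedged illustration, not the disclosed implementation: the function name `place_shooting_camera` and the `scale` parameter (a hypothetical factor that converts real-space distance to virtual-space distance) are assumptions introduced for the example.

```python
import math


def place_shooting_camera(character_position, character_facing,
                          real_distance, scale=1.0):
    """Place the shooting virtual camera in front of the player character.

    character_facing is the direction the character faces (any length);
    real_distance is the measured distance between the camera 150 and the
    player; scale is a hypothetical real-to-virtual conversion factor.
    Returns (camera_position, view_direction).
    """
    norm = math.sqrt(sum(c * c for c in character_facing))
    if norm == 0.0:
        raise ValueError("character_facing must be a non-zero vector")
    facing = tuple(c / norm for c in character_facing)
    distance = real_distance * scale
    position = tuple(p + distance * f
                     for p, f in zip(character_position, facing))
    # The camera looks back at the character, so it directly faces it.
    view_direction = tuple(-f for f in facing)
    return position, view_direction


camera_position, view_direction = place_shooting_camera(
    (10.0, 0.0, 5.0), (0.0, 0.0, 2.0), real_distance=2.0
)
```

With the character at (10, 0, 5) facing +Z and the real camera 2 units away, the virtual camera ends up at (10, 0, 7) looking along -Z.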

The image composition unit 122 extracts the image of the subject by obtaining a difference image between the first image and the second image. That is, it extracts the image of the subject from the difference image between the first image, which shows the background and the subject, and the second image, which shows only the background. The image composition unit 122 then generates a composite image in which the image of the subject is combined with the virtual space image. This composite image is displayed on, for example, the gallery display device 210.
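
The difference-based extraction and composition described above can be illustrated with the following Python sketch. Images are represented as lists of rows of (R, G, B) tuples to keep the example dependency-free. The per-pixel threshold test and the `threshold` value are assumptions made for illustration; this passage of the disclosure does not state how the difference image is evaluated, and the function names are hypothetical.

```python
def difference_mask(first_image, second_image, threshold=30):
    """Mark pixels where the first image differs from the background image.

    A pixel is treated as subject when the summed absolute RGB difference
    exceeds the threshold (an assumed criterion).
    """
    return [
        [sum(abs(a - b) for a, b in zip(p1, p2)) > threshold
         for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(first_image, second_image)
    ]


def composite(first_image, mask, virtual_image):
    """Overlay the extracted subject pixels on the virtual space image."""
    return [
        [p1 if m else pv for p1, m, pv in zip(row1, row_m, row_v)]
        for row1, row_m, row_v in zip(first_image, mask, virtual_image)
    ]


gray = (100, 100, 100)
second = [[gray, gray], [gray, gray]]            # background only
first = [[gray, (200, 50, 50)], [gray, gray]]    # background plus subject
virtual = [[(0, 0, 255)] * 2 for _ in range(2)]  # virtual space image

mask = difference_mask(first, second)
result = composite(first, mask, virtual)
```

Only the pixel that changed between the two captures is carried over into the composite; every other pixel shows the virtual space image.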

The display device on which the composite image is displayed is not limited to the gallery display device 210. For example, the composite image may be distributed over a network and displayed on a terminal device such as a PC or a game device. Distribution using, for example, the Internet or a server is conceivable. Alternatively, the composite image in which the image of the subject is combined with the virtual space image may be displayed in part of the display area of the HMD 200.

The image composition unit 122 also extracts an image of the housing 30 on which the subject rides in the real space. It then generates a composite image in which the image of the subject and the image of the housing 30 are combined with the virtual space image. This composite image is displayed on a display device such as the gallery display device 210. For example, the player, who is the subject, enjoys playing a VR game (simulation game) by riding in the housing 30, which is a ride housing. In the virtual space image, for example, a moving body such as a car, ship, or robot corresponding to the housing 30 is displayed. When the housing 30 is a movable housing, the housing 30 is controlled so that its posture changes. As a result, the play position of the player riding in the housing 30 changes in various ways. The image composition unit 122 generates a composite image in which not only the image of a subject such as the player but also the image of the housing 30 is combined with the virtual space image. Thus, when the player is riding in the housing 30 in the real space, a composite image can be generated in which a real space image showing the player riding in the housing 30 is combined with the virtual space image. This composite image is then displayed on a display device such as the gallery display device 210.

The image composition unit 122 also extracts the image of the housing 30 using a housing mask image that specifies the extraction range of the image of the housing 30. For example, the image of the housing 30 is extracted from a color image such as the first image. For example, an operator manually traces the outline of the housing 30 to specify the extraction range, and a housing mask image for range specification is generated from that input. A modification is also possible in which the extraction range (edges) of the image of the housing 30 is recognized automatically by image processing to generate the housing mask image. The housing mask image is a mask image for distinguishing the extraction range of the housing 30 from the rest of the background. Because the extraction range is specified by this housing mask image, the image of the housing 30 is extracted in the same way as the subject and combined with the virtual space image, even though it actually belongs to the background. Details of the extraction processing using the housing mask image will be described later.
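
One way to picture the role of the housing mask image is the Python sketch below. It assumes, purely for illustration, that the housing mask is a boolean image of the same size as the first image and that it is combined with the subject mask by a logical OR; the disclosure defers the actual extraction details to a later passage, so this combination rule and the function names are hypothetical.

```python
def merge_masks(subject_mask, housing_mask):
    """Treat a pixel as extracted if it belongs to the subject or the housing.

    Assumption: both masks are lists of rows of booleans with equal size.
    """
    return [
        [s or h for s, h in zip(row_s, row_h)]
        for row_s, row_h in zip(subject_mask, housing_mask)
    ]


def extract(first_image, mask, fill=None):
    """Keep pixels of the first image inside the mask; others become fill."""
    return [
        [p if m else fill for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(first_image, mask)
    ]


# 1 x 3 example: subject found by differencing, housing marked by the
# operator, and one pure background pixel.
subject_mask = [[True, False, False]]
housing_mask = [[False, True, False]]
first_image = [[(200, 50, 50), (80, 80, 80), (100, 100, 100)]]

extraction_mask = merge_masks(subject_mask, housing_mask)
extracted = extract(first_image, extraction_mask)
```

The housing pixel would never be found by differencing alone, because the housing appears in both the first and the second image; the mask is what brings it into the extraction.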

The image composition unit 122 also sets the extraction range for the image of the subject based on tracking information from at least one tracking device worn by the subject, and extracts the image of the subject. For example, it acquires position information and the like of the tracking device based on tracking information from a tracking device that the subject wears as a wearable device, and sets the extraction range for the image of the subject based on the acquired position information and the like. For example, when the subject wears a plurality of tracking devices, a range that encloses the positions of these tracking devices is set as the extraction range for the image of the subject, and the image of the subject is extracted with that extraction range as the target range of the extraction processing. The tracking device may be the tracking device 206 built into the HMD 200 worn by the subject.

The image composition unit 122 also sets the extraction range for the image of the subject based on the position of the tracking device and the position of an auxiliary point set at a position shifted by a given distance from the position of the tracking device. For example, information on the placement direction of the tracking device is acquired from the tracking information. A position shifted (offset) from the position of the tracking device by a given distance along the placement direction of the tracking device is then set as the position of the auxiliary point. A range that encloses the position of the tracking device and the position of the auxiliary point is set as the extraction range, and the image of the subject is extracted.
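
The auxiliary point and the enclosing extraction range can be sketched in Python as below. The sketch assumes an axis-aligned bounding box as the "range that encloses" the points, which the disclosure does not specify, and the function names are hypothetical.

```python
import math


def auxiliary_point(position, direction, distance):
    """Shift a tracking device position by a given distance along its
    placement direction (the direction vector may have any non-zero length)."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    return tuple(p + distance * d / norm for p, d in zip(position, direction))


def extraction_range(points):
    """Axis-aligned box (an assumed shape) enclosing all given points.

    Returns (min_corner, max_corner).
    """
    mins = tuple(min(coords) for coords in zip(*points))
    maxs = tuple(max(coords) for coords in zip(*points))
    return mins, maxs


# A tracker on the head at a height of 1.7, pointing downward, with an
# auxiliary point 0.5 further along that direction.
head = (0.0, 1.7, 0.0)
aux = auxiliary_point(head, (0.0, -2.0, 0.0), 0.5)
hand = (0.4, 1.0, 0.2)  # a second tracking device
box_min, box_max = extraction_range([head, aux, hand])
```

Auxiliary points let a few trackers cover body regions that carry no tracker of their own, so the box does not cut off part of the subject.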

The image generation unit 120 also generates, as the virtual space image for the player displayed to the player who is the subject, a virtual space image in which at least one of an image of the virtual camera and an image of a photographer character is displayed at a position in the virtual space corresponding to the position of the shooting virtual camera. For example, a virtual camera object, which is a three-dimensional model representing the shooting virtual camera, is placed at a position and in a direction corresponding to the position and direction of the shooting virtual camera. Alternatively, an image of a photographer character who shoots with the shooting virtual camera is placed at a position corresponding to the position of the virtual camera. Effect processing is then performed in which the photographer character performs a shooting motion or speaks lines related to the shooting.

The image generation system (simulation system) of the present embodiment also includes the HMD 200, which is worn by the player who is the subject and displays the player's virtual space image as seen from the player's virtual camera in the virtual space, and the gallery display device 210, on which the composite image is displayed as a gallery image. With this configuration, the player's virtual space image generated by the image generation unit 120 is displayed on the HMD 200, and the player can view, through the HMD 200, the image seen from the player's virtual camera in the virtual space as a VR image. Meanwhile, the composite image in which the image of the subject is combined with the virtual space image is displayed on the gallery display device 210, so that the gallery (the spectators) can grasp, by looking at the gallery image, what situation the player is in and what actions the player is taking in the virtual space.

また取得部１０２は、背景及び被写体をカメラ１５０により撮影したデプス画像を取得する。例えばカメラ１５０（デプスカメラ１５４）から見たときの背景及び被写体の奥行き情報を、デプス画像として取得する。そして画像合成部１２２は、差分画像とデプス画像に基づいて、被写体の画像を抽出する。即ち被写体及び背景が映る第１画像と背景が映る第２画像の差分画像と、デプス画像を用いて、被写体の画像を抽出する。例えば差分画像とデプス画像の両方により被写体に対応する画素であると判断される画素を、被写体の画素として、被写体の画像を抽出する。 Further, the acquisition unit 102 acquires a depth image of the background and the subject taken by the camera 150. For example, the depth information of the background and the subject when viewed from the camera 150 (depth camera 154) is acquired as a depth image. Then, the image synthesizing unit 122 extracts the image of the subject based on the difference image and the depth image. That is, the image of the subject is extracted by using the difference image between the first image showing the subject and the background and the second image showing the background, and the depth image. For example, the image of the subject is extracted by using the pixels determined to be the pixels corresponding to the subject by both the difference image and the depth image as the pixels of the subject.

The image synthesizing unit 122 also generates a difference mask image based on the difference image, for example by binarizing the difference image. In this case, the image synthesizing unit 122 may generate the difference mask image based on both the difference image and the depth image (depth mask image). For example, the difference mask image may be generated by taking the AND (logical product) of the mask image obtained by binarizing the difference image and the depth mask image obtained from the depth image. The depth mask image used for this AND may be either the depth mask image before the correction process described later or the depth mask image after the correction process. The image synthesizing unit 122 also generates, based on the depth image, a depth mask image that identifies pixels whose depth value falls within a given depth range (effective depth range). For example, the depth range is the range in which the depth value is greater than or equal to a first depth value on the near side and less than or equal to a second depth value on the far side. The image synthesizing unit 122 generates a depth mask image in which pixels within the depth range have a first pixel value (for example, white) and pixels outside the depth range have a second pixel value (for example, black). The image synthesizing unit 122 then generates a subject mask image that identifies the subject based on the difference mask image and the depth mask image. For example, it generates a subject mask image in which pixels within the range of the subject have the first pixel value (for example, white) and pixels outside the range of the subject have the second pixel value (for example, black). The image synthesizing unit 122 then extracts the image of the subject based on the subject mask image and the first image. For example, the range of the first image specified by the subject mask image is judged to be the image of the subject and is extracted.
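The mask pipeline described above (difference mask, depth mask, subject mask, extraction) can be sketched as follows. This is a minimal sketch under stated assumptions: images are grayscale nested lists, a missing depth value is represented by `None`, and the threshold, the function names, and the `fill` value are illustrative choices that are not taken from the publication.

```python
def difference_mask(first, second, threshold):
    """Binarize the per-pixel absolute difference of two grayscale images."""
    return [[abs(a - b) >= threshold for a, b in zip(r1, r2)]
            for r1, r2 in zip(first, second)]


def depth_mask(depth, near, far):
    """True where the depth value lies in the range [near, far]."""
    return [[d is not None and near <= d <= far for d in row] for row in depth]


def subject_mask(diff_mask, dmask):
    """AND of the difference mask and the depth mask."""
    return [[a and b for a, b in zip(r1, r2)] for r1, r2 in zip(diff_mask, dmask)]


def extract_subject(first, mask, fill=0):
    """Keep first-image pixels selected by the mask; replace the rest with `fill`."""
    return [[p if m else fill for p, m in zip(r1, r2)] for r1, r2 in zip(first, mask)]
```

Requiring agreement between both masks means a pixel that changed in color but lies outside the depth range (for example, a shadow on the background) is not treated as part of the subject.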

また画像合成部１２２は、デプスマスク画像の補正処理を行い、補正処理後のデプスマスク画像と差分マスク画像に基づいて被写体マスク画像を生成する。例えば画像合成部１２２は、デプス値が所与のデプス範囲となる画素を識別するデプスマスク画像に対して、ノイズ等を除去するための補正処理を行う。そして補正処理後のデプスマスク画像と、第１画像と第２画像の差分画像から得られた差分マスク画像とに基づいて、被写体マスク画像を生成する。 Further, the image synthesizing unit 122 corrects the depth mask image and generates a subject mask image based on the corrected depth mask image and the difference mask image. For example, the image synthesizing unit 122 performs correction processing for removing noise or the like on a depth mask image that identifies a pixel whose depth value is in a given depth range. Then, a subject mask image is generated based on the depth mask image after the correction process and the difference mask image obtained from the difference image between the first image and the second image.

For example, the image synthesizing unit 122 generates a difference depth mask image from a first depth image, in which the background and the subject are captured by the camera 150 (depth camera 154), and a second depth image, in which only the background is captured by the camera 150 (depth camera 154). That is, it generates a difference depth mask image, which is the difference image between the first depth image and the second depth image. The image synthesizing unit 122 then generates the corrected depth mask image based on the difference depth mask image. In other words, the generation of the difference depth mask image is performed as the correction process for the depth mask image.
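A difference depth mask of this kind might be computed as in the following sketch. The `min_change` threshold, the use of `None` for missing depth values, and the function name are assumptions made for illustration; the publication does not specify how the difference is thresholded.

```python
def difference_depth_mask(first_depth, second_depth, min_change):
    """True where the depth with the subject present differs from the background depth.

    Pixels with a missing depth value (None) in either image are left False.
    """
    return [[a is not None and b is not None and abs(a - b) >= min_change
             for a, b in zip(r1, r2)]
            for r1, r2 in zip(first_depth, second_depth)]
```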

The image synthesizing unit 122 also generates the corrected depth mask image by performing at least one of a morphology filter process and a time-series filter process. That is, at least one of the morphology filter process and the time-series filter process is performed as the correction process for the depth mask image in order to remove noise and the like. The morphology filter process is a dilation and erosion filter process. The time-series filter process is a smoothing process; for example, it enables a pixel as part of the mask only when a difference value greater than or equal to a predetermined value persists for a predetermined number of consecutive frames or more.
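Both filters can be illustrated with a short sketch. The 3x3 neighborhood, the erosion-then-dilation order (a morphological opening), the border handling, and all names below are assumptions chosen for the example, not details given in the publication.

```python
def _neighborhood(mask, y, x):
    """Values in the 3x3 window around (y, x), clipped at the image border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    return [[all(_neighborhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    return [[any(_neighborhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation: removes isolated noise pixels."""
    return dilate(erode(mask))


class TemporalFilter:
    """Enable a pixel only after `min_frames` consecutive frames with diff >= `min_diff`."""

    def __init__(self, height, width, min_frames, min_diff):
        self.min_frames = min_frames
        self.min_diff = min_diff
        self.counts = [[0] * width for _ in range(height)]

    def update(self, diff):
        out = []
        for y, row in enumerate(diff):
            out_row = []
            for x, value in enumerate(row):
                # Extend the run if the difference is large enough, else reset it.
                self.counts[y][x] = self.counts[y][x] + 1 if value >= self.min_diff else 0
                out_row.append(self.counts[y][x] >= self.min_frames)
            out.append(out_row)
        return out
```

The opening removes specks smaller than the window, while the temporal filter suppresses pixels that flicker for only a frame or two.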

また画像合成部１２２は、デプス画像においてデプス値が取得できなかった画素の画素値を、差分画像に基づき設定する処理を行うことで、補正処理後のデプスマスク画像を生成する。例えばデプスカメラ１５４のステレオカメラにより、デプス値が取得できなかった画素をブランク画素として設定する。そして、ブランク画素については、カラー画像である第１画像と第２画像の差分画像に基づいて設定する。例えば差分画像に基づく差分マスク画像を用いて、デプスマスク画像におけるブランク画素の画素値を埋める処理を行う。 Further, the image synthesizing unit 122 generates a depth mask image after the correction process by performing a process of setting the pixel values of the pixels for which the depth value could not be acquired in the depth image based on the difference image. For example, with the stereo camera of the depth camera 154, the pixels for which the depth value could not be acquired are set as blank pixels. Then, the blank pixel is set based on the difference image between the first image and the second image, which are color images. For example, using a difference mask image based on the difference image, a process of filling the pixel values of blank pixels in the depth mask image is performed.
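The blank-pixel filling step can be sketched as below. Representing a blank pixel as `None` in the depth image is an assumption of this example, as is the function name.

```python
def fill_blank_pixels(depth, depth_mask, diff_mask):
    """Where no depth value was obtained, fall back to the difference mask value."""
    return [[dm if d is None else zm
             for d, zm, dm in zip(d_row, z_row, m_row)]
            for d_row, z_row, m_row in zip(depth, depth_mask, diff_mask)]
```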

また画像合成部１２２は、デプス値がデプス範囲となる画素群（隣り合う画素群）の領域サイズを求める。例えばデプス値が、所与のデプス範囲となる画素群の画素のカウント処理を行うことで、領域サイズを求める。そして領域サイズが最も大きい領域や、或いは領域サイズが所定サイズ以上の領域の画素群を、被写体に対応する画素群と判断して、デプスマスク画像を生成する。 Further, the image synthesizing unit 122 obtains the area size of the pixel group (adjacent pixel group) whose depth value is in the depth range. For example, the area size is obtained by counting the pixels of a pixel group whose depth value is in a given depth range. Then, the pixel group of the region having the largest region size or the region having the region size of a predetermined size or more is determined as the pixel group corresponding to the subject, and the depth mask image is generated.
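Region size can be obtained by counting the pixels of each connected group, as in the following sketch. The use of 4-connectivity, a breadth-first flood fill, and the function names are assumptions of this example.

```python
from collections import deque


def find_regions(mask):
    """Return connected groups of True pixels (4-connectivity) as lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def keep_regions(mask, min_size=None):
    """Keep the largest region, or every region of at least `min_size` pixels."""
    regions = find_regions(mask)
    if min_size is None:
        kept = [max(regions, key=len)] if regions else []
    else:
        kept = [r for r in regions if len(r) >= min_size]
    out = [[False] * len(mask[0]) for _ in mask]
    for region in kept:
        for y, x in region:
            out[y][x] = True
    return out
```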

また画像合成部１２２は、被写体の領域と判断される被写体領域でのデプス値に基づいて第２デプス範囲を設定する。例えば上記のデプス範囲よりも狭い第２デプス範囲を設定する。そしてデプス値が第２デプス範囲となる画素を識別する画像を、デプスマスク画像として生成する。即ちデプス値が第２デプス範囲となる画素群を、被写体に対応する画素群と判断して、デプスマスク画像を生成する。 Further, the image composition unit 122 sets the second depth range based on the depth value in the subject area determined to be the area of the subject. For example, a second depth range narrower than the above depth range is set. Then, an image that identifies the pixel whose depth value is in the second depth range is generated as a depth mask image. That is, the pixel group whose depth value is in the second depth range is determined to be the pixel group corresponding to the subject, and the depth mask image is generated.
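One possible way to derive the narrower second depth range is sketched below. The publication only states that the range is set from the depth values in the subject region; centering it on the median depth with a fixed `margin` is an assumption made purely for illustration.

```python
from statistics import median


def second_depth_range(depth, subject_region, margin):
    """Range centered on the median depth of the subject region (assumed rule)."""
    values = [depth[y][x] for y, x in subject_region if depth[y][x] is not None]
    if not values:
        raise ValueError("subject region has no valid depth values")
    center = median(values)
    return center - margin, center + margin


def mask_from_range(depth, near, far):
    """Depth mask for the second depth range."""
    return [[d is not None and near <= d <= far for d in row] for row in depth]
```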

なお本実施形態では、プレーヤがプレイするゲームのゲーム処理として、仮想現実のシミュレーション処理を行う。仮想現実のシミュレーション処理は、実空間での事象を仮想空間で模擬するためのシミュレーション処理であり、当該事象をプレーヤに仮想体験させるための処理である。例えば実空間のプレーヤに対応するプレーヤキャラクタやその搭乗移動体などの移動体を、仮想空間で移動させたり、移動に伴う環境や周囲の変化をプレーヤに体感させるための処理を行う。 In the present embodiment, virtual reality simulation processing is performed as game processing of the game played by the player. The virtual reality simulation process is a simulation process for simulating an event in the real space in the virtual space, and is a process for allowing the player to experience the event virtually. For example, a moving body such as a player character corresponding to a player in a real space or a moving body on board the player is moved in a virtual space, or a process is performed to allow the player to experience changes in the environment and surroundings due to the movement.

The processing of the image generation system of the present embodiment shown in FIG. 1 can be realized by a processing device such as an arcade game machine or a home game console, a processing device such as a PC installed in a facility, a processing device worn by the player on the back or elsewhere (backpack PC), or distributed processing among these processing devices. In this case, for example, the processing device that realizes the processing of the image synthesizing unit 122 and the processing device that realizes other processing, such as that of the image generation unit 120 and the game processing unit 106, may be separate hardware devices. Alternatively, the processing of the image generation system of the present embodiment may be realized by a server system and a terminal device, for example through distributed processing between the server system and the terminal device.

2. Tracking process
Next, an example of the tracking process will be described. FIG. 2(A) shows an example of the HMD 200. The HMD 200 is provided with a plurality of light receiving elements 201, 202, and 203 (photodiodes). The light receiving elements 201 and 202 are provided on the front side of the HMD 200, and the light receiving element 203 is provided on the right side surface of the HMD 200. Light receiving elements (not shown) are also provided on the left side surface, the top surface, and other surfaces of the HMD. The player PL also wears tracking devices 250 and 260 on the hands, which are predetermined parts. Like the HMD 200, the tracking device 250 worn on the right hand is provided with a plurality of light receiving elements 251 to 254. The tracking device 260 worn on the left hand is likewise provided with a plurality of light receiving elements 261 to 264 (not shown). Using the tracking devices 250 and 260 provided with such light receiving elements makes it possible, as with the HMD 200, to identify part information such as the position and direction of a predetermined part such as a hand. The predetermined part of the player PL on which a tracking device is worn is not limited to the hands; various parts such as the feet, head, chest, abdomen, or waist can be assumed.

As shown in FIG. 2(B), base stations 280 and 284 are installed around the housing 30. The base station 280 is provided with light emitting elements 281 and 282, and the base station 284 is provided with light emitting elements 285 and 286. The light emitting elements 281, 282, 285, and 286 are realized by, for example, LEDs that emit a laser such as an infrared laser. The base stations 280 and 284 use these light emitting elements to emit, for example, lasers radially. The light receiving elements 201 to 203 and others provided on the HMD 200 of FIG. 2(A) receive the lasers from the base stations 280 and 284, which realizes the tracking process of the HMD 200 and makes it possible to detect the position of the head of the player PL and the direction it faces (viewpoint position and line-of-sight direction). For example, position information and posture information (direction information) of the player PL can be detected.

The light receiving elements 251 to 254 and 261 to 264 provided in the tracking devices 250 and 260 also receive the lasers from the base stations 280 and 284, which makes it possible to detect at least one of the position and the direction of the hands (predetermined parts) of the player PL. The tracking devices 250 and 260 may instead be realized by a Leap Motion tracker.

また筐体３０の前方には、カラーカメラ１５２とデプスカメラ１５４を有するカメラ１５０が設置されている。例えばカメラ１５０は、プレーヤＰＬや筐体３０の前方において、プレーヤＰＬや筐体３０に正対するように配置される。そして後述の図４（Ａ）、図４（Ｂ）で説明する撮影用の仮想カメラＶＣＭは、カメラ１５０の位置に対応する仮想空間の位置に配置される。例えば仮想カメラＶＣＭは、プレーヤＰＬに対応するプレーヤキャラクタや、筐体３０に対応する車等の移動体の前方において、プレーヤキャラクタや移動体に正対するように配置される。またプレーヤキャラクタや移動体と仮想カメラＶＣＭとのカメラ距離も、プレーヤＰＬや筐体３０とカメラ１５０とのカメラ距離に対応する距離に設定される。 A camera 150 having a color camera 152 and a depth camera 154 is installed in front of the housing 30. For example, the camera 150 is arranged in front of the player PL and the housing 30 so as to face the player PL and the housing 30. Then, the virtual camera VCM for shooting described with reference to FIGS. 4 (A) and 4 (B) described later is arranged at a position in the virtual space corresponding to the position of the camera 150. For example, the virtual camera VCM is arranged in front of a moving body such as a player character corresponding to the player PL or a car corresponding to the housing 30 so as to face the player character or the moving body. Further, the camera distance between the player character or the moving body and the virtual camera VCM is also set to a distance corresponding to the camera distance between the player PL or the housing 30 and the camera 150.
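The correspondence between the real camera 150 and the shooting virtual camera VCM can be sketched as a simple coordinate mapping. The sketch assumes a Y-up coordinate system, a 1:1 scale between real and virtual space, and a vehicle orientation given only by a yaw angle; these conventions and the function name are illustrative and are not stated in the publication.

```python
import math


def virtual_camera_position(camera_pos, housing_pos, vehicle_pos, vehicle_yaw=0.0):
    """Place the virtual camera at the same offset from the vehicle as the
    real camera has from the housing, rotated by the vehicle's yaw (radians)."""
    ox, oy, oz = (c - h for c, h in zip(camera_pos, housing_pos))
    cos_y, sin_y = math.cos(vehicle_yaw), math.sin(vehicle_yaw)
    rx = ox * cos_y + oz * sin_y    # rotate the offset about the Y axis
    rz = -ox * sin_y + oz * cos_y
    return (vehicle_pos[0] + rx, vehicle_pos[1] + oy, vehicle_pos[2] + rz)
```

Because the offset is only rotated and translated, the camera distance in the virtual space equals the camera distance in the real space.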

3. Housing
FIG. 3 shows a configuration example of the housing 30. In the housing 30 of FIG. 3, a cover portion 33 is provided on a bottom portion 32, and a base portion 34 (pedestal portion) is provided on top of the cover portion 33. The base portion 34 is provided so as to face the bottom portion 32. The base portion 34 is provided with a ride portion 60 having a seat 62. The player PL sits on the seat 62 of the ride portion 60. The base portion 34 is also provided with a moving portion 40, and the moving portion 40 is provided with the operation unit 160, such as a steering wheel 50, an accelerator pedal 52, and a brake pedal 54 (not shown), and with a blower 80 that blows air toward the player. The moving portion 40 is movable in the front-rear direction.

また筐体３０の前方には、カメラ１５０が配置されている。このようなカメラ１５０を設けることで、筐体３０や、筐体３０に搭乗するプレーヤを撮影することが可能になる。 A camera 150 is arranged in front of the housing 30. By providing such a camera 150, it becomes possible to take a picture of the housing 30 and the player boarding the housing 30.

The housing 30 changes the play position of the player according to the result of the game processing (game situation). For example, in the present embodiment, a virtual reality simulation process is performed as the game processing of the game played by the player. For example, processing is performed to move, in the virtual space, a moving body (car, robot, etc.) boarded by the player character corresponding to the player in the real space, and to let the player experience the changes in environment and surroundings that accompany the movement. The housing 30 then changes the play position based on the result of the simulation process, which is the game processing. For example, the play position is changed based on the result of the movement processing, in the virtual space, of the moving body boarded by the player character (or of the player character itself). In a racing game, for example, the play position is changed as a simulation process for letting the player feel the acceleration that accompanies speeding up, slowing down, or changing direction while the car (racing car) is moving. Alternatively, when an attack from an enemy hits the car, the play position is changed as a simulation process for letting the player feel the impact. Here, the play position is the position at which the player is located when playing the virtual reality (VR) simulation game. For example, the play position is the player's ride position on the ride portion 60.

Specifically, in FIG. 3, first to fourth air spring portions (in a broad sense, extendable portions), not shown, are provided at the four corners between the bottom portion 32 and the base portion 34. These first to fourth air spring portions extend and contract in the Y-axis direction of FIG. 3 as air is supplied and discharged using an air compressor and valves. For example, by extending or contracting all of the first to fourth air spring portions, the base portion 34 can be moved upward or downward in the Y-axis direction. This vertical movement of the base portion 34 makes it possible, for example, to reproduce the road surface condition of the course. For example, by moving the base portion 34 vertically with a short stroke and at high speed, unevenness of the road surface (a bumpy road) can be expressed.

In addition, when one of the front-side and rear-side pairs of the first to fourth air spring portions at the four corners extends and the other pair contracts, the base portion 34 can be pitched about the X axis. Likewise, when one of the left-side and right-side pairs of the first to fourth air spring portions extends and the other pair contracts, the base portion 34 can be rolled about the Z axis. Such pitching and rolling let the player feel the sense of acceleration and deceleration that accompanies the movement of a moving body such as a car, as well as the inertial force during cornering. This improves the player's sense of virtual reality and also makes it possible to suppress so-called 3D sickness.
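The relationship between the four air spring portions and the vertical, pitching, and rolling motions can be sketched as a simple mixing rule. The corner names, the sign conventions, and the linear combination are assumptions for illustration; the publication does not give control equations.

```python
def air_spring_extensions(heave, pitch, roll):
    """Target extension of each corner air spring.

    heave: common extension of all four springs (vertical movement).
    pitch: positive extends the front pair and contracts the rear pair.
    roll:  positive extends the left pair and contracts the right pair.
    """
    return {
        "front_left": heave + pitch + roll,
        "front_right": heave + pitch - roll,
        "rear_left": heave - pitch + roll,
        "rear_right": heave - pitch - roll,
    }
```

For example, a braking event could be rendered with a nonzero `pitch`, and a bumpy road with a rapidly alternating small `heave`.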

4. Method of the present embodiment
Next, the method of the present embodiment will be described. The following description mainly covers the case where the method of the present embodiment is applied to a racing game in which a player character boards a car and races against other cars. However, the games to which the method of the present embodiment is applied are not limited to such a racing game. For example, the method of the present embodiment can be applied to various games other than racing games (racing games with vehicles other than cars, robot games, versus games such as FPS (first-person shooter) and fighting games, simulation games for vehicles such as trains and airplanes, RPGs, action games, virtual experience games, sports games, horror experience games, music games, and so on), and can also be applied to uses other than games. The following description also mainly uses the case where the subject of the camera 150 is the player as an example.

4.1 Description of the game
FIG. 4(A) and FIG. 4(B) are examples of the player's virtual space image displayed to the player in the racing game realized by the present embodiment. In the present embodiment, the player's virtual space images of FIG. 4(A) and FIG. 4(B) are displayed on the HMD 200 worn by the player. In this racing game, the player character in the virtual space, corresponding to the player in the real space, becomes the driver, boards and drives a car (racing car) as a moving body, and races against enemy cars. Item areas are set on the course, and by acquiring and using the items in these item areas, the player can gain an advantage in the racing game.

In the player's virtual space images (game images) of FIG. 4(A) and FIG. 4(B), the player's car MV (moving body) and the enemy car MVE (enemy moving body) traveling on the course in the virtual space are displayed. The steering wheel ST of the car MV and parts such as the hands HR and HL of the player character operating the steering wheel ST are also displayed. When the player in the real space steers the steering wheel 50 of FIG. 3 to the left, a virtual space image is displayed in which, as shown in FIG. 4(A), the steering wheel ST of the car MV is likewise steered to the left by the hands HR and HL of the player character in the virtual space. When the player PL in the real space steers the steering wheel 50 to the right, a virtual space image is displayed in which, as shown in FIG. 4(B), the steering wheel ST is likewise steered to the right by the hands HR and HL of the player character. That is, a virtual space image is displayed in which the hands HR and HL of the player character in the virtual space move in conjunction with the movement of the hands of the player in the real space. This gives the player a sense of virtual reality as if driving a real car. FIG. 4(A) and FIG. 4(B) also display an image of the shooting virtual camera VCM, which captures the virtual space image with which the real space image is to be combined.

FIG. 5(A) and FIG. 5(B) are examples of composite images of the virtual space image and the real space image generated by the present embodiment. In FIG. 5(A) and FIG. 5(B), the images of the player PL and the housing 30 are real space images of the player PL and the housing 30 captured by the camera 150 of FIG. 2(B) and FIG. 3. Specifically, they are real space images obtained by extracting the player PL and the like from the image captured by the camera 150 using the background subtraction method. On the other hand, the images of the enemy character CHE, the enemy car MVE, the effects EF1 and EF2, the course CS, and the like are virtual space images captured by the shooting virtual camera VCM of FIG. 4(A) and FIG. 4(B). The composite images of FIG. 5(A) and FIG. 5(B) are generated by combining the real space images of the player PL and the housing 30 with the virtual space image seen from the shooting virtual camera VCM. The composite image is displayed on, for example, the gallery display device 210, as shown in FIG. 12(B) described later. This allows the gallery to see how the player PL is enjoying and playing the game in the virtual space.

In FIG. 5(A), the enemy character CHE and the enemy car MVE are located behind the player PL and the housing 30 as seen from the shooting virtual camera VCM, so the images of the enemy character CHE and the enemy car MVE are removed by hidden surface removal. In FIG. 5(B), on the other hand, the player PL and the housing 30 are located behind the enemy character CHE and the enemy car MVE as seen from the shooting virtual camera VCM, so the images of the player PL and the housing 30 are removed by hidden surface removal. In this way, in the present embodiment, hidden surface removal that reflects the positional relationship in the virtual space is also applied to the player PL and the housing 30, which are real space images. In the present embodiment, lighting processing (shading processing) is also applied to the player PL and the housing 30 based on, for example, the lighting information in the virtual space. For example, the lighting processing is performed so that the images of the player PL and the housing 30 become dark when the virtual space is dark and become bright when the virtual space is bright.
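The hidden surface removal and lighting described above can be illustrated with a per-pixel sketch. A per-pixel depth comparison, grayscale images, and a single scalar `brightness` factor are assumptions made for this example; the publication does not state here how the depth test or the lighting is implemented.

```python
def composite(virtual_img, virtual_depth, subject_img, subject_depth,
              subject_mask, brightness=1.0):
    """Overlay subject pixels that are nearer than the virtual scene, scaled by brightness."""
    out = []
    for y, row in enumerate(virtual_img):
        out_row = []
        for x, virtual_pixel in enumerate(row):
            # The subject is visible only where it is masked in and nearer to the camera.
            visible = subject_mask[y][x] and subject_depth[y][x] < virtual_depth[y][x]
            if visible:
                out_row.append(min(255, round(subject_img[y][x] * brightness)))
            else:
                out_row.append(virtual_pixel)
        out.append(out_row)
    return out
```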

本実施形態では、実空間に配置されたカメラ１５０により、背景及び被写体を撮影した第１画像と、カメラ１５０により背景を撮影した第２画像を取得する。被写体はプレーヤＰＬなどである。またカメラ１５０の位置に対応する仮想空間の位置に配置された撮影用の仮想カメラＶＣＭから見える仮想空間画像を生成する。そして第１画像と第２画像の差分画像を求めることで、プレーヤＰＬなどの被写体の画像を抽出して、図５（Ａ）、図５（Ｂ）に示すように、仮想空間画像に被写体の画像が合成された合成画像を生成する。このようにすることで、あたかも実空間の被写体が仮想空間に出現したかのように見える合成画像を生成できるようになる。 In the present embodiment, the first image in which the background and the subject are photographed by the camera 150 arranged in the real space and the second image in which the background is photographed by the camera 150 are acquired. The subject is a player PL or the like. Further, a virtual space image that can be seen from the virtual camera VCM for shooting arranged at the position of the virtual space corresponding to the position of the camera 150 is generated. Then, by obtaining the difference image between the first image and the second image, the image of the subject such as the player PL is extracted, and as shown in FIGS. 5 (A) and 5 (B), the subject is displayed in the virtual space image. Generates a composite image in which the images are combined. By doing so, it becomes possible to generate a composite image as if a subject in the real space appeared in the virtual space.

更に本実施形態では、実空間において被写体であるプレーヤＰＬが搭乗する筐体３０の画像を抽出する。そして図５（Ａ）、図５（Ｂ）に示すように、仮想空間画像に被写体であるプレーヤＰＬの画像及び筐体３０の画像が合成された合成画像を生成する。このようにすれば、被写体であるプレーヤＰＬのみならず、プレーヤＰＬが実空間において搭乗する筐体３０が、仮想空間に出現したかのように見える合成画像を生成できるようになる。 Further, in the present embodiment, an image of the housing 30 on which the player PL, which is the subject, is boarded in the real space is extracted. Then, as shown in FIGS. 5A and 5B, a composite image is generated in which the image of the player PL as the subject and the image of the housing 30 are combined with the virtual space image. In this way, not only the player PL, which is the subject, but also the housing 30 on which the player PL is boarded in the real space can generate a composite image as if it appeared in the virtual space.

In this case, in the present embodiment, the image of the housing 30 shown in FIG. 6(B) is extracted using the housing mask image MSC shown in FIG. 6(A). The housing mask image MSC is a mask image that specifies the extraction range of the image of the housing 30 and is, as one example, created manually by an operator. For example, the operator creates the housing mask image MSC by specifying the extraction range with an operation that traces the outline of the housing 30 while viewing a captured image of the housing 30. The housing mask image MSC is, for example, a mask image in which pixels inside the extraction range of the image of the housing 30 have a first pixel value (the white pixel value) and pixels outside the extraction range have a second pixel value (the black pixel value). Using such a housing mask image MSC makes it possible to extract the image of the housing 30, which would otherwise be treated as background and left unextracted, in the same way as the image of the subject. A specific example of the extraction processing using the housing mask image MSC is described in detail later with reference to FIG. 19(A) to FIG. 21(D).

Further, in the present embodiment, as the virtual space image for the player that is displayed to the player who is the subject, a virtual space image is generated in which at least one of an image of the shooting virtual camera and an image of a photographer character is displayed at the position in the virtual space corresponding to the position of the shooting virtual camera. For example, in FIG. 4(A), the image of the shooting virtual camera VCM is displayed in front of the vehicle MV on which the player character rides. In FIG. 4(B), the image of the photographer character CHM is displayed in front of the vehicle MV. The photographer character CHM points the virtual camera VCM toward the vehicle MV and says the line "Look this way".

As described with reference to FIG. 2(B) and FIG. 3, the camera 150 in the real space is arranged in front of the housing 30 on which the player PL rides. Specifically, the camera 150 is arranged in front of the housing 30, at a position separated from it by a given camera distance, with the camera direction facing the housing 30. In the present embodiment, the shooting virtual camera VCM is arranged at the position in the virtual space corresponding to the position of the camera 150 in the real space. That is, the shooting virtual camera VCM is arranged in front of the vehicle MV in the virtual space, at a position separated from it by a distance corresponding to the above camera distance, with the camera direction facing the vehicle MV.

In FIG. 4(A) and FIG. 4(B), the image of the shooting virtual camera VCM corresponding to the camera 150 in the real space is thus displayed in the virtual space image. Normally the image of the virtual camera VCM is not displayed in such a virtual space image, but in FIG. 4(A) it is displayed deliberately. In FIG. 4(B), the image of the photographer character CHM holding the virtual camera VCM is displayed in the virtual space image. For example, the image of a photographer character CHM that flies through the air while pointing the virtual camera VCM at the player character and shooting is displayed.

In this way, the player becomes aware of being photographed by the virtual camera VCM. When the player performs an action such as waving at the virtual camera VCM that is photographing the player, the real space image of the player PL performing that action is combined with the virtual space image in the composite images of FIG. 5(A) and FIG. 5(B) as well. Displaying such a composite image on the gallery display device 210 shown in FIG. 12(B) achieves a staging effect in which the player and the gallery get excited together.

The present embodiment is also applicable to a multiplayer game. In this case, first to n-th housings (n is an integer of 2 or more) on which first to n-th players ride are provided. First to n-th cameras that photograph the first to n-th housings and the first to n-th players are installed at positions such as in front of the first to n-th housings. When a gallery image is generated for the i-th player (i is an integer satisfying 1 ≤ i ≤ n) among the first to n-th players, the shooting virtual camera VCM is moved to a position such as in front of the i-th player character corresponding to the i-th player, and photographs the i-th player character and the i-th moving body on which the i-th player character rides. A composite image is then generated by combining the real space image captured by the i-th camera, which photographs the i-th housing and the i-th player, with the virtual space image seen from the shooting virtual camera VCM, and the generated composite image is displayed on the gallery display device 210. In this way, the i-th player can recognize that he or she is the shooting target because the shooting virtual camera VCM is displayed in front of the player, and will come to perform actions such as waving at the shooting virtual camera VCM or the photographer character CHM. This achieves a staging effect in which the player and the gallery get excited together.

FIG. 7 shows an example in which the present embodiment is applied to a high-place-experience VR game in which the player crosses a narrow bridge spanning two buildings. In the present embodiment, the extraction range of the image of the subject is set based on tracking information from at least one tracking device worn by the subject, and the image of the subject is extracted. For example, in FIG. 7, the player PL wears the tracking devices TR1, TR2, TR3, and TR4 described with reference to FIG. 2(A) on the left hand, right hand, left foot, and right foot. Using the tracking devices TR1 to TR4 makes it possible to track the movements of the hands and feet of the player PL and to detect the posture and motion of the player PL. In the present embodiment, the extraction range AR of the image of the player PL, the subject, is set based on the tracking information from the tracking devices TR1 to TR4, and the image of the player PL is extracted. More specifically, in FIG. 7 the HMD 200 also incorporates a tracking device TR5, so the extraction range AR is set based on the tracking information from the tracking devices TR1 to TR4 worn on the hands and feet and from the tracking device TR5 provided in the HMD 200. For example, information such as the positions of the tracking devices TR1 to TR5 is acquired as the tracking information, and a range that contains the positions of the tracking devices TR1 to TR5 is set as the extraction range AR. The extraction processing of the image of the player PL is then performed within the extraction range AR. In this way, appropriate extraction processing of the image of the player PL within a suitable extraction range AR can be achieved even when the player changes posture or performs various motions in the VR game. It also prevents a person who is not intended as the subject from being erroneously extracted as the subject.
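Setting a range that contains the tracking device positions can be sketched as an axis-aligned bounding box. The sketch assumes that the positions of TR1 to TR5 have already been projected into pixel coordinates of the image from the camera 150; the function name `extraction_range` and the `margin` parameter are hypothetical and do not appear in the specification.

```python
def extraction_range(points, margin, width, height):
    """Return (x0, y0, x1, y1), a box containing all tracked points.

    points : iterable of (x, y) pixel coordinates of the tracking devices
    margin : extra pixels added on every side (hypothetical parameter)
    width, height : image size, used to clamp the box to the image
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0 = max(0, min(xs) - margin)
    y0 = max(0, min(ys) - margin)
    x1 = min(width - 1, max(xs) + margin)
    y1 = min(height - 1, max(ys) + margin)
    return x0, y0, x1, y1


# Example: hands, feet, and HMD projected into a 640x480 image.
trackers = [(100, 200), (300, 220), (120, 400), (280, 410), (200, 50)]
ar = extraction_range(trackers, margin=20, width=640, height=480)
```

A rectangular range is only one possible choice; the text requires only that the range contain the tracked positions.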

Further, in the present embodiment, the extraction range of the image of the subject is set based on the position of a tracking device and the position of an auxiliary point set at a position shifted (offset) by a given distance from the position of the tracking device. For example, in FIG. 8(A) and FIG. 8(B), the player PL is riding an elevator in a building. In this case, if the extraction range AR is set based only on the positions of the tracking devices TR1 to TR5 as shown in FIG. 8(A), parts such as the back of the player PL fall outside the extraction range AR, and appropriate extraction processing cannot be achieved.

Therefore, in the present embodiment, as shown in FIG. 8(B), an auxiliary point PAX for setting the extraction range AR is set at a position shifted from, for example, the position of the tracking device TR5 of the HMD 200 worn by the player PL. For example, direction information of the HMD 200 is also acquired based on the tracking information from the tracking device TR5 of the HMD 200. Using this direction information, the auxiliary point PAX is set at a position shifted backward by a given distance from the position of the HMD 200. The extraction range AR of the player PL, the subject, is then set based on the positions of the tracking devices TR1 to TR5 and the position of the auxiliary point PAX. In this way, as shown in FIG. 8(B), an extraction range AR that contains the whole of the player PL can be set, and appropriate extraction processing of the image of the player PL can be achieved.
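The placement of the auxiliary point PAX can be sketched as a vector offset. The sketch assumes that the tracking information gives the HMD position and a forward-direction vector in a common 3D coordinate system; the function name `auxiliary_point` and the example distance of 0.3 are illustrative assumptions, not values from the specification.

```python
import math

def auxiliary_point(hmd_pos, hmd_forward, distance):
    """Return the point `distance` behind the HMD along its facing direction.

    hmd_pos     : (x, y, z) position of the HMD (tracking device TR5)
    hmd_forward : (x, y, z) direction the HMD faces; need not be unit length
    distance    : the "given distance" of the backward shift
    """
    norm = math.sqrt(sum(c * c for c in hmd_forward))
    if norm == 0.0:
        raise ValueError("hmd_forward must be a non-zero vector")
    # Backward = opposite to the normalized forward direction.
    return tuple(p - distance * f / norm for p, f in zip(hmd_pos, hmd_forward))


# Example: HMD at head height facing +z, auxiliary point 0.3 units behind it.
pax = auxiliary_point((0.0, 1.6, 0.0), (0.0, 0.0, 2.0), 0.3)
```

The resulting point would then be treated like one more tracked position when the extraction range AR is computed.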

4.2 Image composition processing
Next, a detailed example of the image composition processing of the present embodiment, which combines a virtual space image and a real space image, is described. FIG. 9 is a flowchart illustrating a detailed processing example of the present embodiment.

First, the second image, in which the background is captured by the camera 150, is acquired (step S1). For example, in FIG. 2(B) and FIG. 3, the second image of the background including the housing 30 and the like is captured by the camera 150 while the player is not on the housing 30. It is then determined whether the game has started (step S2), and when the game has started, it is determined whether a frame update has occurred (step S3). For example, in the present embodiment, the virtual space image and the composite image are generated at every frame update.

On a frame update, the virtual space image seen from the virtual camera is generated (step S4). For example, the virtual space image seen from the shooting virtual camera and the virtual space image for the player to be displayed on the player's HMD 200 are generated. The first image, in which the background and the subject are captured by the camera 150, is also acquired (step S5). That is, the first image showing the background and the player who is the subject is captured by the camera 150 and acquired.

Next, the image of the subject is extracted by obtaining the difference image between the first image and the second image (step S6). That is, as described later, the image of the player who is the subject is extracted using the background subtraction method. A composite image in which the image of the subject is combined with the virtual space image is then generated (step S7). That is, as shown in FIG. 5(A) and FIG. 5(B), a composite image is generated in which the real space image (live-action image) of the subject such as the player PL is combined with the virtual space image showing the enemy character CHE, the enemy vehicle MVE, the course CS, and the like. The composite image is then displayed on the gallery display device 210 (step S8). The virtual space image for the player, seen from the virtual camera for the player, is displayed on the HMD 200. It is then determined whether the game has ended (step S9); if the game has not ended, the processing returns to step S3, and if the game has ended, the processing ends.
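The control flow of steps S1 to S9 can be sketched as a simple loop. This is an illustration only: every callable passed to `run_composition_loop` is a hypothetical stand-in for the corresponding processing unit, and the waits for game start (S2) and frame update (S3) are assumed to be handled by the caller or inside `capture`.

```python
def run_composition_loop(capture, render_virtual, extract_subject,
                         composite, show, game_over):
    """Run the per-frame composition loop and return the number of frames shown.

    All arguments are caller-supplied callables (hypothetical names):
      capture()                 -> image from the real camera
      render_virtual()          -> virtual space image from the shooting virtual camera
      extract_subject(f, b)     -> subject image from first image f and background b
      composite(v, s)           -> composite of virtual image v and subject image s
      show(img)                 -> display on the gallery display device
      game_over()               -> True when the game has ended
    """
    background = capture()                             # S1: second image (background only)
    frames = 0
    while True:                                        # S2/S3 assumed handled elsewhere
        virtual = render_virtual()                     # S4: virtual space image
        first = capture()                              # S5: first image (background + subject)
        subject = extract_subject(first, background)   # S6: background subtraction
        show(composite(virtual, subject))              # S7 + S8: compose and display
        frames += 1
        if game_over():                                # S9: otherwise return to S3
            break
    return frames
```

Note that the background image is captured once before the loop, which matches step S1 being outside the per-frame path in the flowchart.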

As described above, in the present embodiment, the first image, in which the background and the subject are captured by the camera 150 arranged in the real space, and the second image, in which the background is captured by the camera 150, are acquired. In FIG. 9, the second image is acquired in step S1 and the first image is acquired in step S5. A virtual space image seen from the shooting virtual camera arranged at the position in the virtual space corresponding to the position of the camera 150 is also generated. That is, a virtual space image seen from the shooting virtual camera VCM as shown in FIG. 4(A) and FIG. 4(B) is generated. Then, as shown in steps S6 and S7, the image of the subject is extracted by obtaining the difference image between the first image and the second image, and a composite image in which the image of the subject is combined with the virtual space image is generated. For example, FIG. 10(A) is an example of the first image IM1. The first image IM1 shows the image of the player PL, the subject, and the image of the background BG such as the ceiling, walls, and pillars. FIG. 10(B) is an example of the second image IM2. The second image IM2 shows the image of the background BG but not the image of the player PL. By obtaining the difference image between the first image IM1 and the second image IM2, the image of the subject can be extracted by the background subtraction method. This makes it possible to generate a composite image in which the image of the subject, a real space image, is combined with the virtual space image, a game image, as shown in FIG. 5(A) and FIG. 5(B).

For example, a VR game using the HMD 200 or the like has the problem that its appeal does not come across to a player unless the player actually experiences it. In addition, the gallery cannot tell what the player is doing in the VR game. As a result, the player and the gallery cannot share the experience of the VR game.

In this case, if the virtual space image of the VR game and the real space image of the player who is the subject are combined, and video in which the player appears to have entered the virtual space and to be playing there is displayed to the gallery, the gallery can grasp how the player is playing in the VR game and can therefore share the experience of the VR game. However, if such composition of the virtual space image and the real space image is performed by chroma key composition using a blue screen or a green screen, large-scale shooting equipment must be provided to realize the blue screen or green screen. For example, providing such large-scale shooting equipment in a game facility where the game housing 30 is installed is not realistic, and it also restricts the interior decoration of the facility. Chroma key composition also has the problem that blue or green portions of the subject are regarded as background and drop out. Furthermore, for the player to appear to have entered the virtual space and to be playing there, the virtual space image and the real space image must be consistent with each other. If the quality of the composition is low, for example because extraneous objects such as people or things other than the player appear in the real space image, the virtual space image and the real space image become inconsistent and the desired result cannot be obtained.

Therefore, in the present embodiment, the first image, in which the background and the subject are captured by the camera 150 in the real space, and the second image, in which the background is captured, are acquired. A shooting virtual camera is arranged at the position in the virtual space corresponding to the position of the camera 150 in the real space, and a virtual space image seen from the shooting virtual camera is generated. Then, by obtaining the difference image between the first image and the second image, the image of the subject is extracted by the background subtraction method, and a composite image in which the image of the subject is combined with the virtual space image is generated. In this way, the virtual space image and the real space image of the subject can be combined without providing large-scale shooting equipment. It is therefore possible to provide an image generation system that can combine the image of the subject captured by the camera 150 and the virtual space image with high quality using a simple system.

Further, in the present embodiment, a depth image in which the background and the subject are captured by the camera 150 is acquired, and the image of the subject is extracted based on the depth image and the difference image between the first image and the second image. Extracting the image of the subject using the depth image in addition to the difference image obtained by the background subtraction method achieves even higher-quality composition of the virtual space image and the image of the subject. For example, if the image of the subject is extracted by the background subtraction method alone, pixels of the subject that have the same color as the background drop out because the colors coincide. In addition, in a situation where the camera 150 is installed inside a booth and passers-by come and go along a passage located behind the player as viewed from the camera 150, the background subtraction method using a color difference image alone cannot achieve appropriate extraction processing. Performing the composition using the depth image in addition to the difference image obtained by the background subtraction method resolves these problems.

More specifically, in the present embodiment, a difference mask image is generated based on the difference image, and a depth mask image that identifies pixels whose depth values fall within a given depth range is generated based on the depth image. A subject mask image that identifies the subject is then generated based on the difference mask image and the depth mask image, and the image of the subject is extracted based on the subject mask image and the first image. Using the difference mask image makes it easy to distinguish the background region from the subject region. Using the depth mask image, which identifies pixels whose depth values fall within the given depth range, makes it easy to identify a subject located in that depth range. Using the difference mask image and the depth mask image together therefore makes it possible to extract the image of the subject with high quality from the first image, which shows the background and the subject.

FIG. 10(C) is an example of the difference mask image MSDF. The difference mask image MSDF can be generated by binarizing the difference image between the first image IM1 of FIG. 10(A) and the second image IM2 of FIG. 10(B). The difference mask image MSDF may also be generated based on the difference image and the depth image. For example, the difference mask image MSDF is generated based on the mask image obtained by binarizing the difference image and the depth mask image MSDP generated from the depth image (the depth mask image after or before correction processing). In the difference mask image MSDF, pixels in the region of the player PL (subject) have the white pixel value and pixels in the region of the background BG have the black pixel value. However, as shown in FIG. 10(C), portions of the subject whose color coincides with the background drop out. When the pixel value range is 0 to 255, the white pixel value is the maximum pixel value "255" and the black pixel value is the minimum pixel value "0".
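The binarization of the difference image can be sketched with NumPy as follows. The specification does not state how the difference is measured or which threshold is used; the per-channel absolute difference, the maximum over color channels, and the default `threshold=30` are assumptions chosen for illustration.

```python
import numpy as np

def difference_mask(first, second, threshold=30):
    """Binarize the difference between the first and second images.

    first, second : (H, W, 3) uint8 images (IM1: background + subject, IM2: background)
    threshold     : hypothetical per-channel difference above which a pixel
                    is treated as subject
    Returns an (H, W) uint8 mask: 255 (white) for subject, 0 (black) for background.
    """
    # Widen to a signed type so that the subtraction cannot wrap around.
    diff = np.abs(first.astype(np.int16) - second.astype(np.int16))
    # A pixel counts as changed if any color channel differs enough.
    changed = diff.max(axis=-1) > threshold
    return np.where(changed, 255, 0).astype(np.uint8)
```

Subject pixels whose color is close to the background fall below the threshold and become black, which is the drop-out visible in FIG. 10(C).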

FIG. 11(A) is an example of the depth mask image MSDP. The depth mask image MSDP is a mask image that identifies pixels whose depth values fall within a given depth range RA, as shown for example in FIG. 13(A). The depth range RA is the range in which the depth value is greater than or equal to the near-side depth value ZN and less than or equal to the far-side depth value ZF. A subject SB located in the depth range RA appears as white pixels in the depth mask image MSDP of FIG. 11(A). Using the depth mask image MSDP, which identifies pixels whose depth values fall within the depth range RA, makes it possible to set objects within the depth range RA as extraction targets. When an object that should not become the subject is present in, or passes through, the area in front of or behind the depth range RA, that object can be excluded from the extraction targets by the depth mask image MSDP.
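Generating the depth mask image from the depth range RA amounts to a per-pixel range test, which can be sketched as follows. The function name `depth_mask` and the representation of the depth image as a floating-point array are assumptions; the inclusive bounds follow the definition of RA above.

```python
import numpy as np

def depth_mask(depth, z_near, z_far):
    """Return a mask that is white where ZN <= depth <= ZF.

    depth  : (H, W) array of per-pixel depth values from the camera
    z_near : near-side depth value ZN of the depth range RA
    z_far  : far-side depth value ZF of the depth range RA
    Returns an (H, W) uint8 mask: 255 inside RA, 0 outside.
    """
    in_range = (depth >= z_near) & (depth <= z_far)
    return np.where(in_range, 255, 0).astype(np.uint8)
```

Objects nearer than ZN (for example an operator standing in front) and objects farther than ZF (for example passers-by behind the player) both map to black.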

In the present embodiment, the subject mask image MSSB shown in FIG. 11(B) is generated based on the difference mask image MSDF of FIG. 10(C) and the depth mask image MSDP of FIG. 11(A). The subject mask image MSSB is a mask image for identifying the subject. For example, the subject mask image MSSB is a mask image in which pixels in the subject region are white and pixels in other regions are black. As one example, the subject mask image MSSB of FIG. 11(B) can be generated by taking the OR (logical sum) of the difference mask image MSDF of FIG. 10(C) and the depth mask image MSDP of FIG. 11(A). Specifically, the subject mask image MSSB is generated by taking the OR of the difference mask image MSDF and the depth mask image MSDP after correction processing such as the morphological filter processing described later. More specifically, the difference mask image MSDF is generated by taking the AND (logical product) of the mask image obtained by binarizing the difference image and the depth mask image MSDP after or before correction processing. The subject mask image MSSB is then generated by taking the OR of the generated difference mask image MSDF and the depth mask image MSDP after correction processing. For example, in the subject mask image MSSB, a pixel that is white in the difference mask image MSDF or white in the depth mask image MSDP is set to white, and a pixel that is black in both the difference mask image MSDF and the depth mask image MSDP is set to black. The image of the subject is then extracted, as shown in FIG. 11(C), based on the subject mask image MSSB of FIG. 11(B) and the first image IM1 of FIG. 10(A). For example, the image of the subject can be extracted by cutting out from the first image IM1 the region of pixels for which the subject mask image MSSB is white. As shown in FIG. 10(C), the difference mask image MSDF has the problem that pixels of the subject that have the same color as the background drop out. In the depth mask image MSDP of FIG. 11(A), by contrast, the pixels that dropped out and became black in the difference mask image MSDF are white. Generating the subject mask image MSSB based on both the difference mask image MSDF and the depth mask image MSDP therefore resolves the problem of pixels with the same color as the background dropping out and achieves appropriate extraction of the subject.
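The AND and OR combination of the masks and the final cut-out can be sketched as follows. The correction processing (such as morphological filtering) is described later in the specification and is not reproduced here, so the corrected depth mask is simply taken as an input; the function names are hypothetical, and all masks are assumed to be uint8 arrays holding only 0 or 255.

```python
import numpy as np

def subject_mask(binarized_diff, depth_mask_raw, depth_mask_corrected):
    """Combine masks as described in the text.

    MSDF = binarized difference AND depth mask (before or after correction)
    MSSB = MSDF OR corrected depth mask
    All inputs are (H, W) uint8 masks with values 0 or 255.
    """
    msdf = np.bitwise_and(binarized_diff, depth_mask_raw)
    mssb = np.bitwise_or(msdf, depth_mask_corrected)
    return msdf, mssb

def extract_subject(first_image, mssb):
    """Keep pixels of the first image where MSSB is white; others become black."""
    out = np.zeros_like(first_image)
    keep = mssb == 255
    out[keep] = first_image[keep]
    return out
```

In this formulation, a pixel that dropped out of the difference mask because its color matches the background is restored whenever the corrected depth mask marks it white.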

Note that when a non-extraction-target object, such as an operator or a passerby who is not intended to be extracted, is present in front of the subject and overlaps it, the pixels in the region of that object become black in the depth mask image MSDP, and the subject behind the object may not be extracted properly. If a depth threshold is also set for the difference mask image MSDF, the pixels in the region of the non-extraction-target object become black there as well. The system may detect that the subject cannot be extracted correctly because of such an object and, instead of extracting the player who is the subject, display a character image in place of that player, or switch from extracting that player to extracting another player who is playing together. For example, the position of the HMD is detected before game play starts. If, during game play, the HMD is hidden by a non-extraction-target object in front of it and the pixels in the HMD region become black, the system determines that an error has occurred and switches to displaying a character of the game space, switches to the image of another player, or, when multiple cameras are used, switches the camera shooting the player to another camera installed at a different position, such as a high place or behind the player. Further, for example, the extraction target region where the player should be is estimated at all times during play from position information of the HMD or the like. When the number of pixels in that region falls to or below a certain ratio, the system determines that a non-extraction-target object is present in front of the HMD and performs the switching described above. Conversely, when the extraction target region is far too large, the system determines that the floor, a wall, or the like has been falsely detected, suspends extraction and composition, and switches to the character display of the game space or to another camera. As with the method of setting auxiliary points described with reference to FIG. 8(B), several auxiliary points may be set in addition to the HMD position, for example at the lower part of the housing, and the switching process may be performed based on the pixels at the HMD position and at the auxiliary point positions. For example, using tracking information from a tracking device such as the HMD and auxiliary point information together with the housing mask image described with reference to FIGS. 6(A) and 6(B) enables more accurate extraction processing.
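The ratio-based judgment described above could be sketched as follows. The rectangular region format, the threshold values `low` and `high`, and the returned labels are hypothetical; the disclosure only states that a ratio at or below a certain value, or an excessively large extracted area, triggers the switching.

```python
from typing import List, Tuple

Mask = List[List[int]]              # assumed encoding: 1 = white (extracted), 0 = black
Region = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def white_ratio(mask: Mask, region: Region) -> float:
    """Fraction of white pixels inside the region where the player is expected."""
    top, left, bottom, right = region
    total = (bottom - top) * (right - left)
    white = sum(mask[y][x] for y in range(top, bottom) for x in range(left, right))
    return white / total


def judge_extraction(mask: Mask, region: Region,
                     low: float = 0.2, high: float = 0.95) -> str:
    """Classify one frame; `low` and `high` are hypothetical thresholds."""
    ratio = white_ratio(mask, region)
    if ratio <= low:
        return "occluded"         # an object is probably in front of the player
    if ratio >= high:
        return "false_detection"  # floor or wall was probably extracted as well
    return "ok"


def select_output(status: str) -> str:
    """Fall back to the character display (or another camera) when extraction fails."""
    return "composite" if status == "ok" else "character_or_other_camera"
```

A caller would estimate `region` from the HMD tracking position every frame and pass the current subject mask to `judge_extraction`.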

In the present embodiment, the image of the subject extracted as described above is composited with the virtual space image. For example, FIG. 12(A) shows an example of a composite image in which the image of the player PL, who is the subject, is composited with a virtual space image that is a game image. This makes it possible to generate a composite image that looks as if the real-space player PL had appeared in the virtual space. As shown in FIG. 12(B), displaying the composite image of FIG. 12(A) on the gallery display device 210 allows the gallery (spectators) to watch how the player PL behaves while playing in the virtual space. For example, in a VR game facility, a composite image such as that of FIG. 12(A) is displayed during the waiting time to the gallery waiting to play the VR game. This allows the player PL and the gallery to share the VR game experience and makes the VR game more enjoyable.

In the present embodiment, a correction process is also performed on the depth mask image, and the subject mask image is generated based on the corrected depth mask image and the difference mask image. That is, the depth mask image MSDP of FIG. 11(A) is corrected, and the subject mask image MSSB of FIG. 11(B) is generated based on the corrected depth mask image MSDP and the difference mask image MSDF of FIG. 10(C). For example, the difference mask image MSDF suffers from color dropout, and the depth mask image MSDP is used to resolve it. On the other hand, while the difference mask image MSDF has the advantage that the difference is obtained cleanly even at edges, the depth mask image MSDP has problems such as edges flickering because of noise and fine noise being superimposed. Performing the correction process on the depth mask image MSDP prevents the edge flicker and removes the fine noise, so that a high-quality composite image can be generated.

For example, in the present embodiment, a difference depth mask image is generated from a first depth image, obtained by shooting the background and the subject with the camera 150, and a second depth image, obtained by shooting the background with the camera 150, and the corrected depth mask image is generated based on the difference depth mask image. For example, the first depth image IMDP1 of FIG. 13(B) is a depth image of the background BG and the subject SB shot by the camera 150 (depth camera 154), and the second depth image IMDP2 is a depth image of the background BG shot by the camera 150. The difference depth mask image MSDPDF (background-difference depth mask image) is generated from the difference image between the first depth image IMDP1 and the second depth image IMDP2, for example by binarizing the difference image so that pixels whose depth difference is equal to or greater than a predetermined value are set to white. FIG. 14(A) shows an example of the difference depth mask image MSDPDF. In an ordinary depth mask image that does not use a difference, ceiling and floor portions also become white pixels, as in FIG. 14(C). Using the difference depth mask image MSDPDF resolves this problem and enables a high-quality composite image to be generated.
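The binarization of the depth difference described above can be illustrated with a short sketch. It assumes depth images stored as lists of rows of distances in meters with a valid value at every pixel; the function name and the threshold value are illustrative only.

```python
from typing import List


def difference_depth_mask(depth_first: List[List[float]],
                          depth_second: List[List[float]],
                          threshold: float) -> List[List[int]]:
    """White (1) where the depth changed by at least `threshold` between
    the image with the subject (IMDP1) and the background-only image (IMDP2)."""
    return [[1 if abs(d1 - d2) >= threshold else 0 for d1, d2 in zip(row1, row2)]
            for row1, row2 in zip(depth_first, depth_second)]


# Second row: a floor at 0.5 m flanks the subject. It lies close to the camera,
# but it is identical in both images, so it stays black in the difference mask.
imdp1 = [[3.0, 1.5, 3.0],
         [0.5, 1.5, 0.5]]   # background and subject
imdp2 = [[3.0, 3.0, 3.0],
         [0.5, 3.0, 0.5]]   # background only
print(difference_depth_mask(imdp1, imdp2, threshold=0.2))  # [[0, 1, 0], [0, 1, 0]]
```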

In the present embodiment, the corrected depth mask image is generated by performing at least one of morphological filtering and time-series filtering. For example, FIG. 14(B) shows the difference depth mask image MSDPDF of FIG. 14(A) after correction by morphological filtering and time-series filtering. Morphological filtering performs dilation and erosion, and it removes noise of small size. In time-series filtering, for example, a pixel that has been judged to be white (a pixel whose depth difference is equal to or greater than the predetermined value) for at least a predetermined number of consecutive frames is judged to be a white pixel that is valid as a mask. This suppresses the flicker of fine noise. In the present embodiment, the difference depth mask image MSDPDF filtered in this way is used to generate the corrected depth mask image MSDP shown in FIG. 14(D). For example, the depth mask image MSDP of FIG. 14(D) is generated by taking the AND (logical product) of the difference depth mask image MSDPDF of FIG. 14(B) and the ordinary depth mask image of FIG. 14(C). This depth mask image MSDP is then used to generate the subject mask image MSSB described with reference to FIG. 11(B), and the image of the subject is extracted as shown in FIG. 11(C).
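The two filters described above might look like the following sketch. The 3x3 neighborhood, the choice of an opening (erosion followed by dilation), and the class and function names are assumptions made for illustration; the disclosure does not specify the structuring element or the frame count.

```python
from typing import List, Optional

Mask = List[List[int]]  # assumed encoding: 1 = white, 0 = black

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(mask: Mask) -> Mask:
    """A pixel stays white only if its whole 3x3 neighborhood is white and in bounds."""
    h, w = len(mask), len(mask[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in OFFSETS) else 0
             for x in range(w)] for y in range(h)]


def dilate(mask: Mask) -> Mask:
    """A pixel becomes white if any in-bounds pixel of its 3x3 neighborhood is white."""
    h, w = len(mask), len(mask[0])
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in OFFSETS) else 0
             for x in range(w)] for y in range(h)]


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation; removes specks smaller than the neighborhood."""
    return dilate(erode(mask))


class TemporalFilter:
    """Passes a pixel only after it has been white for `min_frames` consecutive frames."""

    def __init__(self, min_frames: int) -> None:
        self.min_frames = min_frames
        self.counts: Optional[List[List[int]]] = None

    def update(self, mask: Mask) -> Mask:
        if self.counts is None:
            self.counts = [[0] * len(mask[0]) for _ in mask]
        # Increment the run length where the pixel is white, reset it otherwise.
        self.counts = [[c + 1 if v else 0 for c, v in zip(row_c, row_v)]
                       for row_c, row_v in zip(self.counts, mask)]
        return [[1 if c >= self.min_frames else 0 for c in row] for row in self.counts]
```

Calling `opening` on each difference depth mask and then feeding the result to one `TemporalFilter` instance per camera would give a mask in which isolated specks and single-frame flicker are suppressed.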

In the present embodiment, the corrected depth mask image is also generated by setting, based on the difference image, the pixel values of pixels for which no depth value could be acquired in the depth image. That is, a correction process fills the pixel values of blank pixels, which are pixels for which no depth value could be acquired, with the pixel values of the difference image. FIG. 15(A) shows an example of the extracted subject image when this blank-pixel filling is performed, and FIG. 15(B) shows an example when it is not. As FIG. 15(B) shows, without the filling, parts of the subject image are lost, for example near the edges of the hands.

To prevent this problem, the present embodiment generates a blank mask image that identifies the blank pixels for which no depth value could be acquired, as shown in FIG. 16(A). When depth values are acquired with a stereo camera, some pixels are visible to one of the two cameras but not to the other. No proper depth value can be acquired for such pixels, so the result returned indicates that the depth value could not be acquired. Setting the pixels for which this result was returned to, for example, white yields the blank mask image of FIG. 16(A). FIG. 16(B) is the difference mask image generated by obtaining the difference between the first image and the second image. FIG. 16(C) is the mask image obtained by taking the AND (logical product) of the blank mask image of FIG. 16(A) and the difference mask image of FIG. 16(B). In the mask image of FIG. 16(C), a pixel is set to white when it is judged to be a blank pixel in the blank mask image of FIG. 16(A) and is also judged to belong to the subject in the difference mask image of FIG. 16(B). In this way, the pixel values of pixels for which no depth value could be acquired are set (filled) based on the difference mask image, which is a difference image. The subject mask image is then generated so that the white pixels of FIG. 16(C) are set as subject pixels, and the image of the subject is extracted. This prevents the image loss caused by missing depth values shown in FIG. 15(B), and an appropriate extracted subject image such as that of FIG. 15(A) can be generated.
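The blank-pixel filling described above can be sketched as follows, assuming that a pixel for which the stereo camera returned no depth value is represented by `None`. That convention and the function names are illustrative assumptions.

```python
from typing import List, Optional

Mask = List[List[int]]  # assumed encoding: 1 = white, 0 = black


def blank_mask(depth: List[List[Optional[float]]]) -> Mask:
    """White where no depth value could be acquired (as in FIG. 16(A))."""
    return [[1 if d is None else 0 for d in row] for row in depth]


def fill_blank_pixels(depth: List[List[Optional[float]]],
                      diff_mask: Mask, depth_mask: Mask) -> Mask:
    """Add to the depth mask the blank pixels that the difference mask marks as subject."""
    blank = blank_mask(depth)
    return [[dm | (b & df) for dm, b, df in zip(row_dm, row_b, row_df)]
            for row_dm, row_b, row_df in zip(depth_mask, blank, diff_mask)]


depth = [[1.0, None, None]]   # two pixels without a depth value
diff_mask = [[1, 1, 0]]       # the color difference marks pixels 0 and 1 as subject
depth_mask = [[1, 0, 0]]      # the depth mask alone misses pixel 1
print(fill_blank_pixels(depth, diff_mask, depth_mask))  # [[1, 1, 0]]
```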

In the present embodiment, the corrected depth mask image is also generated by obtaining the region size of each group of pixels whose depth values fall within the depth range and filtering by region size. For example, the region size of each pixel group whose depth values fall within the depth range RA of FIG. 13(A) is obtained, for instance by counting the number of pixels in a group of mutually adjacent pixels whose depth values fall within the depth range RA. Filtering is then performed so that the pixel group with the largest region size, or each pixel group whose region size is equal to or greater than a predetermined size, is treated as a pixel group constituting the subject. For example, FIG. 17(A) shows a depth mask image before filtering by region size, and FIG. 17(B) shows a depth mask image after filtering by region size. In FIG. 17(A), pixel groups of small region size remain as white pixels, whereas in FIG. 17(B) they have been removed by the filtering. In FIG. 17(B), of the pixel groups whose depth values fall within the depth range RA, the one with the largest region size is judged to be the pixel group of the subject and is set to white. This removes the small pixel groups caused by noise and the like and extracts only the pixel group corresponding to the subject, so that a high-quality composite image can be generated.
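One way to realize the region-size filtering described above is connected-component labeling. The sketch below assumes 4-connectivity for "adjacent" pixels, which the disclosure does not specify, and uses hypothetical function names.

```python
from collections import deque
from typing import List, Optional, Tuple

Mask = List[List[int]]  # assumed encoding: 1 = white, 0 = black


def find_regions(mask: Mask) -> List[List[Tuple[int, int]]]:
    """Group white pixels into 4-connected regions using breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, region = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def filter_by_region_size(mask: Mask, min_size: Optional[int] = None) -> Mask:
    """Keep the largest region, or every region of at least `min_size` pixels."""
    regions = find_regions(mask)
    if min_size is None:
        kept = [max(regions, key=len)] if regions else []
    else:
        kept = [r for r in regions if len(r) >= min_size]
    out = [[0] * len(mask[0]) for _ in mask]
    for region in kept:
        for y, x in region:
            out[y][x] = 1
    return out
```

Calling `filter_by_region_size(mask)` corresponds to keeping only the largest pixel group, and `filter_by_region_size(mask, min_size=n)` corresponds to keeping all pixel groups of a predetermined size or more.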

In the present embodiment, a second depth range is also set based on the depth values in the subject region, that is, the region judged to be the subject, and an image identifying the pixels whose depth values fall within the second depth range is generated as the depth mask image. For example, in the depth mask image of FIG. 18(A), the region of white pixels is judged to be the subject region, so the average value ZAV of the depth values in this subject region is obtained. Based on this average value ZAV, the second depth range RA2 shown in FIG. 18(B) is set. The second depth range RA2 is the range in which the depth value is equal to or greater than the near-side depth value ZN2 and equal to or less than the far-side depth value ZF2. The second depth range RA2 is narrower than the depth range RA of FIG. 13 and specifies the range of depth values of the subject SB more strictly.

For example, in free-roam (open world) content in which the player walks around freely in the world of the virtual space, it is difficult to set the depth range RA of FIG. 13(A). If, for example, a distance range (RA) is specified that covers the entire range over which the player may move, extraneous objects may be erroneously extracted.

In this regard, in FIG. 18(A), the second depth range RA2 is set based on the depth value (ZAV) in the subject region judged to be the subject. Therefore, even when the player moves, the second depth range RA2 can be set so as to follow the movement of the player. For example, the depth range is narrowed in stages, from the depth range RA to the second depth range RA2, so that the extraneous objects mentioned above are not erroneously extracted. A strict depth mask image reflecting the movement of the player who is the subject can thus be generated, and a high-quality composite image can be generated.
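A minimal sketch of setting the second depth range from the subject region follows. It assumes that ZN2 and ZF2 are placed symmetrically around the average ZAV with a fixed `margin`; the disclosure only states that RA2 is set based on ZAV, so the symmetric margin and the function names are assumptions.

```python
from typing import List, Tuple


def second_depth_range(depth: List[List[float]], subject_mask: List[List[int]],
                       margin: float) -> Tuple[float, float]:
    """Return (ZN2, ZF2) centered on the average depth ZAV of the subject region."""
    values = [d for row_d, row_m in zip(depth, subject_mask)
              for d, m in zip(row_d, row_m) if m]
    if not values:
        raise ValueError("subject region is empty")
    zav = sum(values) / len(values)
    return zav - margin, zav + margin


def depth_mask_in_range(depth: List[List[float]], near: float,
                        far: float) -> List[List[int]]:
    """White (1) where near <= depth <= far."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]


depth = [[2.0, 2.5, 6.0, 3.0]]
subject_mask = [[1, 1, 0, 1]]                    # first-stage mask from depth range RA
zn2, zf2 = second_depth_range(depth, subject_mask, margin=0.5)
print(zn2, zf2)                                  # 2.0 3.0
print(depth_mask_in_range(depth, zn2, zf2))      # [[1, 1, 0, 1]]
```

Recomputing the range every frame from the previous subject mask lets the range follow the player as the player moves.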

Alternatively, so that the tracking information is used to limit the player region, the range of distances from the camera may be obtained based on tracking information from the HMD or the like, and the depth range may be set based on that distance.

4.3 Various processing examples
Next, various processing examples of the present embodiment are described. For example, in the present embodiment, as shown in FIGS. 5(A) and 5(B), the image of the housing 30 on which the player PL rides in the real space is extracted, and an image is generated in which the image of the player PL and the image of the housing 30 are composited with the virtual space image. This allows a real-space image of the player PL riding on the housing 30 to be composited with the virtual space image. In the present embodiment, as described with reference to FIGS. 10(A) and 10(B), the image of the subject is extracted by the background subtraction method, which obtains the difference image between the first image IM1 and the second image IM2. Because the housing 30 appears in both the first image IM1 and the second image IM2, an ordinary background subtraction method judges the housing 30 to be background, and it disappears without being extracted. Moreover, when the housing 30 is a movable housing, it moves during game play, so the housing 30 sometimes disappears and sometimes does not.

Therefore, in the present embodiment, as described with reference to FIGS. 6(A) and 6(B), the image of the housing 30 is extracted using a housing mask image that specifies the extraction range of the image of the housing 30. The method using this housing mask image is described in detail below.

FIG. 19(A) shows an example of a housing mask image set by an operator manually specifying the range of the housing 30. The range need only be specified roughly, to the extent of tracing the outline of the housing 30. FIG. 19(B) shows an example of a depth mask image of the housing 30. Taking the AND (logical product) of the range-specifying housing mask image of FIG. 19(A) and the depth mask image of FIG. 19(B) generates a housing mask image indicating the housing region, as shown in FIG. 19(C). In practice, correction processes such as the hole filling and edge smoothing described above are applied to the depth mask image of FIG. 19(B), and the AND is taken between the corrected depth mask image and the range-specifying housing mask image of FIG. 19(A). Although only the rough outline of the housing 30 is specified in FIG. 19(A), taking the AND with the depth mask image of FIG. 19(B) yields a housing mask image that accurately reflects the shape of the housing 30, as shown in FIG. 19(C). Taking the OR of the housing mask image of FIG. 19(C) and the difference mask image (color background difference) generates the image of FIG. 20(A), and taking the OR of the housing mask image of FIG. 19(C) and the difference depth mask image (depth background difference) generates the image of FIG. 20(B). Taking the OR of the image of FIG. 20(A) and the image of FIG. 20(B) then generates the subject mask image shown in FIG. 20(C). As a result, the housing 30 and the player PL can be extracted properly, as shown in FIG. 20(D). By contrast, FIGS. 21(A) to 21(D) illustrate a method that does not use the housing mask image. FIG. 21(A) is the difference mask image (color background difference), and FIG. 21(B) is the difference depth mask image (depth background difference). In FIG. 21(B), a difference of the magnitude caused by the player PL getting on the housing 30 is equal to or less than the difference threshold. FIG. 21(C) is the subject mask image generated by taking the OR of the image of FIG. 21(A) and the image of FIG. 21(B). With this subject mask image, only the player PL can be extracted, as shown in FIG. 21(D), and the housing 30 cannot be extracted.
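The sequence of AND and OR operations described above can be sketched as follows, again assuming masks held as lists of rows with 1 for white and 0 for black. The function and parameter names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # assumed encoding: 1 = white, 0 = black


def mask_and(a: Mask, b: Mask) -> Mask:
    return [[x & y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mask_or(a: Mask, b: Mask) -> Mask:
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def subject_mask_with_housing(range_mask: Mask, depth_mask: Mask,
                              color_diff_mask: Mask, depth_diff_mask: Mask) -> Mask:
    """Combine the operator-specified range with the background differences."""
    housing = mask_and(range_mask, depth_mask)         # FIG. 19(C): housing region
    with_color = mask_or(housing, color_diff_mask)     # FIG. 20(A)
    with_depth = mask_or(housing, depth_diff_mask)     # FIG. 20(B)
    return mask_or(with_color, with_depth)             # FIG. 20(C): subject mask


range_mask = [[1, 1, 1, 0]]       # rough outline traced by the operator
depth_mask = [[0, 1, 1, 1]]       # corrected depth mask
color_diff_mask = [[0, 0, 0, 1]]  # player found by the color background difference
depth_diff_mask = [[0, 0, 0, 0]]  # housing and player below the depth difference threshold
print(subject_mask_with_housing(range_mask, depth_mask,
                                color_diff_mask, depth_diff_mask))  # [[0, 1, 1, 1]]
```

In this toy row, pixels 1 and 2 (the housing) survive only because of the housing mask, which mirrors the contrast between FIG. 20 and FIG. 21.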

With the method using the housing mask image described above, the operator need only roughly specify the outline of the housing 30, as shown in FIG. 19(A), which makes the range-specifying work easier. In addition, when movement of the housing 30 turns part of the pixel region in which the housing 30 appeared while stationary into background, that pixel region can be properly judged to be background, so appropriate extraction of the housing 30, which is a movable housing, can be achieved.

FIG. 22 illustrates the method of setting the extraction range described with reference to FIGS. 7 to 8(B). In FIG. 22, the player PL wearing the HMD200 is moving along a narrow plank-like bridge installed in a game facility. In this game, within the play area inside the angle of view of the camera 150, an operator OP stays right next to the player PL and assists the player PL, who cannot see the surroundings because of the HMD200. Image processing alone cannot determine whether what the camera 150 captures is the player PL or the operator OP, so a wrong target that is not the player PL may be extracted.

To suppress this problem, as described with reference to FIGS. 7 to 8(B), the extraction range is set only in the region where the player PL is considered to be, and the extraction process is performed there. That is, the range in which the player PL is present can be obtained using tracking information from a tracking device such as the HMD200. The tracking information of the HMD200 alone detects only the position of the head of the player PL, but using tracking information from multiple tracking devices worn on the hands, feet, and so on allows the extraction range to be set appropriately for the various postures and motions of the player PL, who is the subject.

Although the present embodiment has been described for the case of using a single camera 150, multiple cameras may be provided and used for the subject extraction process. For example, the subject is extracted using multiple color images and multiple depth images from the multiple cameras.

In the case of a multiplayer game, the multiple composite images (composites of real-space images and virtual space images) generated for the multiple players playing the game may be displayed by switching among them in sequence, or may be displayed simultaneously on the display device.

The method of measuring depth values is not limited to the stereo camera method described in the present embodiment. Various methods can be adopted, such as the ToF (Time of Flight) method, which measures the round-trip time of light, and the structured light method, which projects a predetermined pattern of infrared light or the like and calculates depth from the deformation of the pattern. For example, with an HMD of a type that has no infrared light-receiving element for tracking, the ToF method can be used, which measures the depth value by calculating the distance from the time between the emission of infrared light and its return. When distance is measured from the intensity of reflected light, an HMD that has no infrared light-receiving section may be used.
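The ToF relation mentioned above is that the distance equals the speed of light multiplied by half the round-trip time. A minimal sketch, with an illustrative function name, is:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the reflecting surface from the round-trip time of the light pulse."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# A round trip of 20 ns corresponds to a distance of roughly 3 m.
print(tof_distance_m(20e-9))
```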

In the present embodiment, the shooting virtual camera is placed at the position in the virtual space corresponding to the position of the camera 150 in the real space. It is therefore desirable to measure the camera distance between the camera 150 and the subject (player, housing) accurately. To this end, a marker object, for example a box-shaped one similar to an AR marker, is prepared and placed near the subject. The marker object is shot with the camera 150, and the camera distance is measured by the same technique as for an AR marker. This allows the shooting virtual camera to be placed so as to accurately reflect the distance relationship between the camera 150 and the subject in the real space.

The present embodiment can be applied to various games and events. For example, in a rhythm game or a dance game, the player performing or dancing in a plaza of the virtual space may be rendered as a composite of the virtual space image and the real-space image and distributed, for example, as a replay image. For example, many characters serving as a gallery are placed in the virtual space to liven up the player's performance or dance. The player can then enjoy afterwards the sight of himself or herself performing or dancing in the virtual space, which is a different world.

In a ride attraction for kids, a composite image (composite video) of the child riding in the ride housing may be generated and printed as a photograph to take home, or distributed. According to the present embodiment, images and videos of a desired world or situation can be created without installing shooting equipment such as a blue screen or a green screen at the attraction.

At a handshake event or a commemorative photo session, a composite image shot together with a character of the virtual space may be provided. For example, the player is allowed to meet a manga character or a game character. This makes it possible to provide a composite image that shows the player actually interacting with a character that does not exist in the real space. A composite image in which the player appears to be having a meal with a character in a virtual cafe may also be generated.

In FIGS. 5(A) and 5(B), the image of the player PL wearing the HMD200 is displayed, but an image of headgear may be composited in place of the HMD200. For example, an image of the player PL himself or herself may be composited as the headgear image, or an image of headgear such as a cosplay item may be composited. Alternatively, a composite image may be generated in which the player PL appears to be equipped with weapons or items or to be wearing a costume. Image composition processing may also be performed so that the image of the housing 30 looks more realistic.

In a cooperative game, the players may be divided into a first player who watches the composite image (composite video) of the virtual space image and the real-space image and a second player who performs game operations or moves his or her body, and the first player may convey hints about the second player's operations and movements to the second player while watching the composite image.

Although the present embodiment has been described in detail above, those skilled in the art will readily understand that many modifications are possible that do not substantially depart from the novel matter and effects of the present disclosure. Accordingly, all such modifications are included in the scope of the present disclosure. For example, a term (such as "player") that appears at least once in the specification or drawings together with a different term of broader or equivalent meaning (such as "subject") may be replaced with that different term anywhere in the specification or drawings. In addition, the configuration of the image generation system, the configuration of the housing, the configuration of the camera, the image acquisition processing, the virtual space image generation processing, the extraction processing for the images of the subject and the housing, the image composition processing for the virtual space image and the real space image, and the like are not limited to those described in the present embodiment; techniques, processes, and configurations equivalent to these are also included in the scope of the present disclosure. The present embodiment can be applied to various games. It can also be applied to various image generation systems, such as arcade game devices, home game devices, and large attraction systems in which many players participate.

PL ... player, VCM ... shooting virtual camera, CHM ... photographer character,
MV ... car, MVE ... enemy car, CHE ... enemy character, ST ... steering wheel, HL, HR ... hands,
EF1, EF2 ... effects, CS ... course, MSC ... housing mask image,
TR1 to TR5 ... tracking devices, AR ... extraction range, PAX ... auxiliary point,
IM1 ... first image, IM2 ... second image, MSDF ... difference mask image,
MSDP ... depth mask image, MSSB ... subject mask image,
IMDP1 ... first depth image, IMDP2 ... second depth image,
MSDPDF ... difference depth mask image,
BG ... background, SB ... subject, RA ... depth range, RA2 ... second depth range,
ZN, ZN2, ZF, ZF2 ... depth values, ZAV ... average depth value,
30 ... housing, 32 ... bottom part, 33 ... cover part, 40 ... moving part,
50 ... steering wheel, 52 ... accelerator pedal, 54 ... brake pedal,
60 ... ride part, 62 ... seat, 80 ... blower,
100 ... processing unit, 102 ... acquisition unit, 104 ... virtual space setting unit, 106 ... game processing unit,
107 ... moving body processing unit, 108 ... housing control unit, 110 ... virtual camera control unit,
120 ... image generation unit, 122 ... image composition unit, 130 ... sound generation unit,
150 ... camera, 152 ... color camera, 154 ... depth camera, 160 ... operation unit,
170 ... storage unit, 172 ... virtual space information storage unit, 178 ... drawing buffer,
180 ... information storage medium, 192 ... sound output unit, 194 ... I/F unit,
195 ... portable information storage medium, 196 ... communication unit, 200 ... HMD (head-mounted display device), 201 to 203 ... light receiving elements, 206 ... tracking device, 208 ... display unit,
210 ... gallery display device, 250, 260 ... tracking devices,
251 to 254 ... light receiving elements, 261 to 264 ... light receiving elements,
280, 284 ... base stations, 281, 282, 285, 286 ... light emitting elements

Claims

An image generation system comprising:
an acquisition unit that acquires a first image, in which a background and a subject are captured by a camera arranged in a real space, and a second image, in which the background is captured by the camera;
an image generation unit that generates a virtual space image as seen from a shooting virtual camera arranged at a position in a virtual space corresponding to the position of the camera; and
an image composition unit that extracts an image of the subject by obtaining a difference image between the first image and the second image, and generates a composite image in which the image of the subject is combined with the virtual space image.
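
The extraction and composition recited above can be illustrated with a minimal sketch. It is not taken from the specification: images are modeled as nested lists of grayscale values, and the function names and the threshold value are assumptions made purely for illustration. The sketch thresholds the per-pixel difference between the first image IM1 and the second image IM2, then copies the subject pixels over the virtual space image.

```python
def difference_mask(im1, im2, threshold=30):
    """Return a 0/1 mask that is 1 where IM1 differs from IM2 by more than threshold."""
    return [
        [1 if abs(p1 - p2) > threshold else 0 for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(im1, im2)
    ]


def composite(im1, mask, virtual_image):
    """Use IM1 pixels where the mask is 1 and virtual space pixels elsewhere."""
    return [
        [s if m else v for s, m, v in zip(srow, mrow, vrow)]
        for srow, mrow, vrow in zip(im1, mask, virtual_image)
    ]


# Toy 2x3 grayscale frames: the subject occupies the middle column of IM1.
im2 = [[10, 10, 10], [10, 10, 10]]      # background only (second image)
im1 = [[10, 200, 12], [11, 180, 10]]    # background plus subject (first image)
virtual = [[1, 2, 3], [4, 5, 6]]        # rendered virtual space image

mask = difference_mask(im1, im2)
result = composite(im1, mask, virtual)
print(mask)    # [[0, 1, 0], [0, 1, 0]]
print(result)  # [[1, 200, 3], [4, 180, 6]]
```

The threshold absorbs small sensor noise between the two captures; a real system would likely tune it per camera and lighting condition.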

The image generation system according to claim 1, wherein
the image composition unit
extracts an image of a housing on which the subject rides in the real space, and generates the composite image in which the image of the subject and the image of the housing are combined with the virtual space image.

The image generation system according to claim 2, wherein
the image composition unit
extracts the image of the housing by using a housing mask image that specifies an extraction range for the image of the housing.
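
One way to picture the role of the housing mask image MSC is the sketch below. It assumes, without support from the claim wording itself, that the housing mask and the subject mask are merged by a per-pixel logical OR so that both the subject and the housing are taken from the first image. The function names and toy data are hypothetical.

```python
def union_mask(subject_mask, housing_mask):
    """Per-pixel OR: a pixel is kept if it belongs to the subject or to the housing."""
    return [
        [s | h for s, h in zip(srow, hrow)]
        for srow, hrow in zip(subject_mask, housing_mask)
    ]


def extract(im1, mask, empty=None):
    """Keep IM1 pixels where the mask is 1; mark all other pixels as empty."""
    return [
        [p if m else empty for p, m in zip(prow, mrow)]
        for prow, mrow in zip(im1, mask)
    ]


subject = [[0, 1, 0], [0, 1, 0]]   # subject found by image differencing
housing = [[0, 0, 0], [1, 1, 1]]   # fixed extraction range for the housing
im1 = [[10, 200, 12], [90, 180, 95]]

combined = union_mask(subject, housing)
extracted = extract(im1, combined)
print(combined)   # [[0, 1, 0], [1, 1, 1]]
print(extracted)  # [[None, 200, None], [90, 180, 95]]
```

Because the housing is also present in the background-only second image, differencing alone would not pick it up, which is why a separate mask is a natural fit here.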

The image generation system according to claim 1, wherein
the image composition unit
sets an extraction range for the image of the subject based on tracking information from at least one tracking device worn by the subject, and extracts the image of the subject.

The image generation system according to claim 4, wherein
the image composition unit
sets the extraction range for the image of the subject based on the position of the tracking device and the position of an auxiliary point set at a position shifted by a predetermined distance from the position of the tracking device.
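
As a rough sketch of how an extraction range AR might be derived from tracking devices and auxiliary points PAX, the example below computes a padded bounding box. It assumes that the tracker positions have already been projected into image coordinates, and the tracker names, offsets, and margin are hypothetical values chosen for illustration only.

```python
def extraction_range(tracker_points, aux_offsets, margin=0):
    """Return (x_min, y_min, x_max, y_max) enclosing trackers and auxiliary points.

    tracker_points: {name: (x, y)} tracker positions in image coordinates
                    (assumed to be already projected from the real space).
    aux_offsets:    {name: (dx, dy)} shift that defines the auxiliary point
                    derived from the tracker of the same name (hypothetical).
    """
    points = list(tracker_points.values())
    for name, (dx, dy) in aux_offsets.items():
        x, y = tracker_points[name]
        points.append((x + dx, y + dy))  # auxiliary point shifted from the tracker
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


trackers = {"head": (100, 40), "left_hand": (60, 120), "right_hand": (140, 120)}
aux = {"head": (0, -30)}  # point above the head so the top of the head is covered

box = extraction_range(trackers, aux, margin=10)
print(box)  # (50, 0, 150, 130)
```

The auxiliary point extends the range to body parts that carry no tracker; pixels outside the box can then be excluded from the subject regardless of the difference image.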

The image generation system according to any one of claims 1 to 5, wherein
the image generation unit
generates, as a player virtual space image displayed to the player who is the subject, a virtual space image in which at least one of an image of the shooting virtual camera and an image of a photographer character is displayed at a position corresponding to the position of the shooting virtual camera in the virtual space.

The image generation system according to any one of claims 1 to 6, further comprising:
a head-mounted display device that is worn by the player who is the subject and displays a player virtual space image as seen from a player virtual camera in the virtual space; and
a gallery display device on which the composite image is displayed as a gallery image.

The image generation system according to any one of claims 1 to 7, wherein
the acquisition unit
acquires a depth image in which the background and the subject are captured by the camera, and
the image composition unit
extracts the image of the subject based on the difference image and the depth image.

The image generation system according to claim 8, wherein
the image composition unit
generates a difference mask image based on the difference image, generates, based on the depth image, a depth mask image that identifies pixels whose depth value is in a given depth range, generates a subject mask image that identifies the subject based on the difference mask image and the depth mask image, and extracts the image of the subject based on the subject mask image and the first image.
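
The mask pipeline recited above can be sketched as follows. The claim only states that the subject mask image MSSB is generated "based on" the difference mask image MSDF and the depth mask image MSDP; the sketch assumes a per-pixel logical AND, which is one natural reading. Function names, depth values, and the range limits are illustrative assumptions.

```python
def depth_mask(depth, z_near, z_far):
    """1 where the depth value lies inside the depth range [z_near, z_far], else 0."""
    return [[1 if z_near <= z <= z_far else 0 for z in row] for row in depth]


def subject_mask(diff_mask, dmask):
    """Assumed combination rule: a pixel is subject only if both masks agree."""
    return [[a & b for a, b in zip(arow, brow)] for arow, brow in zip(diff_mask, dmask)]


def extract(im1, mask, empty=None):
    """Keep first-image pixels where the subject mask is 1."""
    return [
        [p if m else empty for p, m in zip(prow, mrow)]
        for prow, mrow in zip(im1, mask)
    ]


im1 = [[10, 200, 90], [11, 180, 10]]
msdf = [[0, 1, 1], [0, 1, 0]]                 # difference mask (one false positive)
depth = [[3.0, 1.2, 3.1], [3.0, 1.1, 1.3]]    # depth values in meters (toy data)

msdp = depth_mask(depth, z_near=1.0, z_far=1.5)
mssb = subject_mask(msdf, msdp)
subject = extract(im1, mssb)
print(msdp)     # [[0, 1, 0], [0, 1, 1]]
print(mssb)     # [[0, 1, 0], [0, 1, 0]]
print(subject)  # [[None, 200, None], [None, 180, None]]
```

In this toy data each mask contains one spurious pixel that the other mask rejects, which shows why combining the two cues can be more robust than either one alone.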

The image generation system according to claim 9, wherein
the image composition unit
performs correction processing on the depth mask image, and generates the subject mask image based on the depth mask image after the correction processing and the difference mask image.

The image generation system according to claim 10, wherein
the image composition unit
generates a difference depth mask image between a first depth image, in which the background and the subject are captured by the camera, and a second depth image, in which the background is captured by the camera, and generates the depth mask image after the correction processing based on the difference depth mask image.

The image generation system according to claim 10 or 11, wherein the image composition unit
generates the depth mask image after the correction processing by performing at least one of morphology filter processing and time-series filter processing.
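
The claim does not name specific filters, so the following is only a sketch of what morphology and time-series filtering of a depth mask could look like. It assumes a 3x3 morphological closing (dilation followed by erosion) to fill small holes and a per-pixel majority vote over recent frames to suppress flicker; both choices and all function names are assumptions.

```python
def neighborhood(mask, y, x):
    """Values of the 3x3 neighborhood around (y, x); pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [
        mask[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0
        for ny in (y - 1, y, y + 1)
        for nx in (x - 1, x, x + 1)
    ]


def dilate(mask):
    """Set a pixel to 1 if any pixel in its 3x3 neighborhood is 1."""
    return [
        [1 if any(neighborhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def erode(mask):
    """Keep a pixel at 1 only if its whole 3x3 neighborhood is 1."""
    return [
        [1 if all(neighborhood(mask, y, x)) else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def closing(mask):
    """Morphological closing (dilate, then erode): fills small holes in the mask."""
    return erode(dilate(mask))


def temporal_majority(masks):
    """Per-pixel majority vote over the masks of the most recent frames."""
    n = len(masks)
    return [
        [1 if 2 * sum(m[y][x] for m in masks) > n else 0 for x in range(len(masks[0][0]))]
        for y in range(len(masks[0]))
    ]


holed = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],   # one-pixel hole in the middle of the subject
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
closed = closing(holed)
print(closed[2])  # [0, 1, 1, 1, 0]

voted = temporal_majority([[[1, 0]], [[1, 1]], [[0, 0]]])
print(voted)  # [[1, 0]]
```

Treating out-of-image pixels as 0 means that erosion also clears mask pixels on the image border; a production filter might pad the border differently.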

The image generation system according to any one of claims 10 to 12, wherein
the image composition unit
generates the depth mask image after the correction processing by setting, based on the difference image, the pixel values of pixels for which the depth value could not be acquired in the depth image.
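
A minimal sketch of this correction follows. It assumes that the depth camera reports a sentinel value (0 here) for pixels whose depth could not be measured, and that such pixels simply take over the value of the difference mask; the sentinel and the function name are assumptions, not details from the claim.

```python
def fill_missing_depth(dmask, depth, diff_mask, invalid=0):
    """Return a corrected depth mask.

    Where the depth value is the invalid sentinel, the mask value is taken from
    the difference mask; elsewhere the original depth mask value is kept.
    """
    return [
        [df if z == invalid else dm for dm, z, df in zip(dmrow, zrow, dfrow)]
        for dmrow, zrow, dfrow in zip(dmask, depth, diff_mask)
    ]


depth = [[1.2, 0, 3.0], [0, 1.1, 0]]   # 0 marks pixels with no depth reading
dmask = [[1, 0, 0], [0, 1, 0]]         # depth mask computed before correction
diff = [[1, 1, 1], [0, 1, 0]]          # difference mask from the color images

corrected = fill_missing_depth(dmask, depth, diff)
print(corrected)  # [[1, 1, 0], [0, 1, 0]]
```

The pixel at row 0, column 1 has no depth reading but differs from the background, so the corrected mask treats it as part of the subject instead of leaving a hole.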

The image generation system according to any one of claims 10 to 13, wherein
the image composition unit
generates the depth mask image after the correction processing by obtaining the area size of a pixel group whose depth values are in the depth range and performing filter processing according to the area size.
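
One plausible form of area-size filtering is to label connected pixel groups and discard those below a minimum area. The sketch below does this with a breadth-first flood fill over 4-connected neighbors; the connectivity, the `min_area` parameter, and the decision to remove small groups entirely are assumptions rather than details given in the claim.

```python
from collections import deque


def remove_small_regions(mask, min_area):
    """Keep only 4-connected groups of 1-pixels whose area is at least min_area."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Flood-fill one pixel group and collect its pixels.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # The area size of the group decides whether it survives.
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = 1
    return out


noisy = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],   # the isolated pixel on the right is treated as noise
    [0, 0, 0, 0],
]
cleaned = remove_small_regions(noisy, min_area=2)
print(cleaned)  # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
```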

The image generation system according to any one of claims 9 to 14, wherein
the image composition unit
sets a second depth range based on the depth values in a subject area determined to be the area of the subject, and generates, as the depth mask image, an image that identifies pixels whose depth value is in the second depth range.
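
The sketch below shows one way a second depth range RA2 could be derived. It assumes that the range is centered on the average depth value ZAV of the pixels in the subject area and extends by a fixed half-width on either side; the half-width, the use of millimeter integers, and the function names are hypothetical.

```python
def second_depth_range(depth, subject_area, half_width):
    """Return (z_near2, z_far2) centered on the mean depth inside the subject area."""
    values = [
        z
        for zrow, mrow in zip(depth, subject_area)
        for z, m in zip(zrow, mrow)
        if m
    ]
    if not values:
        raise ValueError("subject area contains no pixels")
    z_av = sum(values) / len(values)  # average depth value of the subject area
    return z_av - half_width, z_av + half_width


def depth_mask(depth, z_near, z_far):
    """1 where the depth value lies inside [z_near, z_far], else 0."""
    return [[1 if z_near <= z <= z_far else 0 for z in row] for row in depth]


depth_mm = [[1000, 1400, 3000], [1200, 1200, 3000]]   # toy depth image in millimeters
subject_area = [[0, 1, 0], [0, 1, 0]]                 # area already judged to be subject

zn2, zf2 = second_depth_range(depth_mm, subject_area, half_width=300)
mask2 = depth_mask(depth_mm, zn2, zf2)
print((zn2, zf2))  # (1000.0, 1600.0)
print(mask2)       # [[1, 1, 0], [1, 1, 0]]
```

Re-centering the range on the subject's measured depth lets the mask pick up subject pixels that the initial subject area missed while still rejecting the distant background.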

A program that causes a computer to function as:
an acquisition unit that acquires a first image, in which a background and a subject are captured by a camera arranged in a real space, and a second image, in which the background is captured by the camera;
an image generation unit that generates a virtual space image as seen from a shooting virtual camera arranged at a position in a virtual space corresponding to the position of the camera; and
an image composition unit that extracts an image of the subject by obtaining a difference image between the first image and the second image, and generates a composite image in which the image of the subject is combined with the virtual space image.