JP2014120093A

JP2014120093A - Device, method and program for generating virtual viewpoint image

Info

Publication number: JP2014120093A
Application number: JP2012276510A
Authority: JP
Inventors: Kentaro Yamada; 健太郎山田; Hiroshi Sanko; 浩嗣三功; Hitoshi Naito; 整内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-12-19
Filing date: 2012-12-19
Publication date: 2014-06-30
Anticipated expiration: 2032-12-19
Also published as: JP5969376B2

Abstract

[Problem] To provide a virtual viewpoint video generation device, a virtual viewpoint video generation method, and a program that can appropriately generate video at a virtual viewpoint.
[Solution] A virtual viewpoint video generation device 1 includes an object separation unit 10, an inter-camera interpolation unit 20, and an inter-frame interpolation unit 30. The object separation unit 10 separates each of the objects forming an occlusion and acquires a texture for each of them. The inter-camera interpolation unit 20 acquires the texture of an object that includes a portion hidden by the occlusion from the viewpoint videos other than the one in which the occlusion occurs. The inter-frame interpolation unit 30 acquires the texture of an object that includes a portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs.
[Selected drawing] FIG. 1

Description

The present invention relates to a virtual viewpoint video generation device, a virtual viewpoint video generation method, and a program for generating virtual viewpoint video.

Methods for generating video at a virtual viewpoint from a plurality of viewpoint videos have been proposed. For example, Patent Document 1 describes a method in which a texture is generated for each object from video of an event held in a large space and captured by multiple cameras, and the generated texture is mapped, object by object, onto a plane placed in three-dimensional space.

With the method of Patent Document 1, however, when an occlusion occurs in which multiple objects overlap, a single texture ends up containing multiple objects. As a result, video at the virtual viewpoint may not be generated appropriately.

To address this, Non-Patent Document 1, for example, describes a method that uses other viewpoint videos, whose viewpoints differ from that of the viewpoint video in which the occlusion occurs, to restore coarse three-dimensional information, separate the object in front, and acquire its texture. In this method, a plane parallel to the image in which the occlusion occurs is set near the object position in three-dimensional space. The region of the front object extracted from the other viewpoint videos is projected onto this plane to obtain the texture region of the front object, and the texture of the front object is then acquired. For the object in back, part of which is hidden by the front object, the texture is acquired from a viewpoint video with a nearby viewpoint.

JP 2011-170487 A

Y. Ohta, I. Kitahara, Y. Kameda, H. Ishikawa, and T. Koyama. Live 3D video in soccer stadium. IJCV, 75(1):173-187, 2007.

Non-Patent Document 1 also describes a method that combines object separation processing with inter-camera interpolation processing when an occlusion occurs during free-viewpoint video generation.

Separation processing is effective when the occlusion is small. When the occlusion is large, however, more of the texture is missing and visual quality degrades.

The method therefore applies separation processing to the frontmost object, on the judgment that no texture loss occurs for it. For objects behind the frontmost object, texture loss may occur, so inter-camera interpolation processing is applied uniformly instead of separation processing. Inter-camera interpolation processing acquires the texture from a viewpoint video with a nearby viewpoint.

However, when an occlusion also occurs in the nearby viewpoint video, inter-camera interpolation acquires the texture with the occlusion still present, so the visual quality of the virtual viewpoint video may degrade.

In addition, as the difference grows between the direction of the virtual viewpoint video and the direction of the viewpoint video from which the texture was acquired by inter-camera interpolation, the orientation of the acquired texture departs further from the true orientation of the object. Applying inter-camera interpolation when this difference is large may therefore also degrade the visual quality of the virtual viewpoint video.

The present invention has been made in view of the above problems, and its object is to provide a virtual viewpoint video generation device, a virtual viewpoint video generation method, and a program that can improve the visual quality of virtual viewpoint video.

To solve the above problems, the present invention proposes the following.
(1) The present invention proposes a virtual viewpoint video generation device (for example, corresponding to the virtual viewpoint video generation device 1 in FIG. 1) that generates video at a virtual viewpoint from a plurality of viewpoint videos, the device comprising: first texture acquisition means (for example, corresponding to the object separation unit 10 in FIG. 1) that, when an occlusion occurs in any of the plurality of viewpoint videos, separates each of the plurality of objects forming the occlusion and acquires a texture of each of the plurality of objects; second texture acquisition means (for example, corresponding to the inter-camera interpolation unit 20 in FIG. 1) that, when an occlusion occurs in any of the plurality of viewpoint videos, acquires the texture of an object including a portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs; third texture acquisition means (for example, corresponding to the inter-frame interpolation unit 30 in FIG. 1) that, when an occlusion occurs in any of the plurality of viewpoint videos, acquires the texture of an object including a portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs; determination means (for example, corresponding to the determination unit 40 in FIG. 1) that determines which of the texture acquired by the first texture acquisition means, the texture acquired by the second texture acquisition means, and the texture acquired by the third texture acquisition means is to be used as the texture of the object including the portion hidden by the occlusion; and video generation means (for example, corresponding to the video generation unit 50 in FIG. 1) that generates the video at the virtual viewpoint using the texture determined by the determination means.

According to this invention, a virtual viewpoint video generation device that generates video at a virtual viewpoint from a plurality of viewpoint videos is provided with first texture acquisition means, second texture acquisition means, third texture acquisition means, determination means, and video generation means. When an occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of the objects forming the occlusion and acquires a texture of each; the second texture acquisition means acquires the texture of an object including a portion hidden by the occlusion from the viewpoint videos other than the one in which the occlusion occurs; and the third texture acquisition means acquires the texture of an object including a portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs. The determination means determines which of the textures acquired by the first, second, and third texture acquisition means is to be used as the texture of the object including the portion hidden by the occlusion, and the video generation means generates the video at the virtual viewpoint using the texture so determined.

Thus the first texture acquisition means performs separation processing, which separates objects and acquires their textures; the second texture acquisition means performs inter-camera interpolation processing, which acquires a texture from another viewpoint video; and the third texture acquisition means performs inter-frame interpolation processing, which acquires a texture from another frame. Virtual viewpoint video can therefore be generated by combining inter-frame interpolation processing with separation processing and inter-camera interpolation processing, which improves the visual quality of the virtual viewpoint video.

(2) The present invention proposes the virtual viewpoint video generation device of (1), wherein the determination means determines the texture to be used as the texture of the object including the portion hidden by the occlusion in the following order of priority: the texture acquired by the first texture acquisition means, the texture acquired by the second texture acquisition means, and the texture acquired by the third texture acquisition means.

According to this invention, in the virtual viewpoint video generation device of (1), the determination means determines the texture to be used as the texture of the object including the portion hidden by the occlusion in the order of priority of the texture acquired by the first texture acquisition means, the texture acquired by the second texture acquisition means, and the texture acquired by the third texture acquisition means.

Thus, as the texture of the object including the portion hidden by the occlusion, the acquired textures can be used in the order of priority of the first, second, and third texture acquisition means.
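
The priority rule in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the publication: the `Candidate` type, its `usable` flag, and the function `pick_by_priority` are hypothetical names, and how a texture is judged usable is left to the determination means.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    source: str   # "separation", "inter_camera", or "inter_frame"
    usable: bool  # hypothetical flag assumed to be set by the determination means


# Priority order stated in (2): first, second, then third texture acquisition means.
PRIORITY = ("separation", "inter_camera", "inter_frame")


def pick_by_priority(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the usable candidate whose source ranks highest in PRIORITY."""
    by_source = {c.source: c for c in candidates}
    for source in PRIORITY:
        candidate = by_source.get(source)
        if candidate is not None and candidate.usable:
            return candidate
    return None  # no usable texture among the candidates
```

Keeping the order in a single tuple makes the priority explicit and independent of the order in which candidates arrive.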

(3) The present invention proposes the virtual viewpoint video generation device of (1) or (2), wherein the third texture acquisition means acquires the texture of the object from a frame before or after the frame in which the occlusion occurs, among the plurality of frames constituting the viewpoint video in which the occlusion occurs.

According to this invention, in the virtual viewpoint video generation device of (1) or (2), the third texture acquisition means acquires the texture of the object from a frame before or after the frame in which the occlusion occurs, among the plurality of frames constituting the viewpoint video in which the occlusion occurs.

Thus, when a texture is acquired by inter-frame interpolation processing, the other frame used is a frame of the same viewpoint video in which the occlusion occurs. Because the viewpoint is the same, the visual quality of the virtual viewpoint video can be improved further than when a frame of a different viewpoint video is used.
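
One way to picture the frame selection in (3) is a search, within a single viewpoint video, for a frame in which the object is not occluded. The sketch below is an assumption-laden illustration: the per-frame `occluded` flags and the function name are hypothetical, and choosing the nearest such frame (with ties going to the earlier frame) is a choice made for this example. The text itself only says that a frame before or after the occluded frame is used.

```python
from typing import Optional, Sequence


def nearest_unoccluded_frame(occluded: Sequence[bool], t: int) -> Optional[int]:
    """Index of the frame closest to t in which the object is not occluded.

    occluded[i] is a hypothetical per-frame flag for one object in one
    viewpoint video. Frame t itself is never returned. Ties between an
    earlier and a later frame go to the earlier one (arbitrary choice).
    """
    for distance in range(1, len(occluded)):
        before, after = t - distance, t + distance
        if before >= 0 and not occluded[before]:
            return before
        if after < len(occluded) and not occluded[after]:
            return after
    return None  # the object is occluded in every other frame
```

Because the search stays inside one viewpoint video, the returned frame shares the camera viewpoint of the occluded frame, which is the property (3) relies on.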

(4) The present invention proposes the virtual viewpoint video generation device of any one of (1) to (3), wherein the determination means performs a first determination procedure (for example, corresponding to the processing of step S9 in FIG. 2) that determines whether to use the texture acquired by the first texture acquisition means, and a second determination procedure (for example, corresponding to the processing of step S11 in FIG. 2) that determines which of the texture acquired by the second texture acquisition means and the texture acquired by the third texture acquisition means is to be used.

According to this invention, in the virtual viewpoint video generation device of any one of (1) to (3), the determination means performs the first determination procedure and the second determination procedure. The first determination procedure determines whether to use the texture acquired by the first texture acquisition means. The second determination procedure determines which of the texture acquired by the second texture acquisition means and the texture acquired by the third texture acquisition means is to be used.

Thus, by performing the first determination procedure and then the second, the second determination procedure can be skipped whenever the first determination procedure decides that the texture acquired by the first texture acquisition means is to be used. The processing load of the virtual viewpoint video generation device can therefore be reduced in some cases.

(5) The present invention proposes the virtual viewpoint video generation device of (4), wherein, in the first determination procedure, the determination means performs object detection on the texture acquired by the first texture acquisition means and makes the determination based on the result of the object detection.

As described above, Non-Patent Document 1 uniformly applies inter-camera interpolation processing, not separation processing, to objects behind the frontmost object. However, even for an object in back, when the occlusion is small the texture loss is also small. In that case, applying separation processing rather than inter-camera interpolation processing keeps the viewpoint consistent between frames and improves the visual quality of the virtual viewpoint video.

According to this invention, therefore, in the virtual viewpoint video generation device of (4), the determination means, in the first determination procedure, performs object detection on the texture acquired by the first texture acquisition means and makes the determination based on the result of the object detection.

Thus, object detection can be used to judge whether the texture acquired by separation processing is appropriate as the texture of the object including the portion hidden by the occlusion. This makes it possible to judge more appropriately whether to use the texture acquired by separation processing or a texture acquired by inter-camera interpolation processing or inter-frame interpolation processing. The visual quality of the virtual viewpoint video can therefore be improved further.

(6) The present invention proposes the virtual viewpoint video generation device of (5), wherein the determination means uses HOG features in the object detection.

According to this invention, in the virtual viewpoint video generation device of (5), the determination means uses HOG features in the object detection.

Thus, object detection can be performed using HOG features.
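
For readers unfamiliar with HOG (histogram of oriented gradients) features, the core building block is a histogram of gradient orientations computed over a small cell of pixels. The sketch below is a deliberately simplified illustration and not the detector used by the determination means: it handles one grayscale cell only, votes each pixel into a single bin, and omits the block-level normalization across neighbouring cells that a full HOG descriptor uses. The function name and the 9-bin default are assumptions made for this example.

```python
import math
from typing import List, Sequence


def cell_orientation_histogram(cell: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    `cell` is a small grayscale patch given as rows of pixel values.
    Border pixels are skipped so that central differences are always defined.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal central difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[min(int(angle // bin_width), bins - 1)] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalization of this cell
    return [v / norm for v in hist]
```

A detector would typically concatenate such histograms over many cells and pass the result to a trained classifier; which classifier is used is not stated in this section.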

(7) The present invention proposes the virtual viewpoint video generation device of any one of (4) to (6), wherein, in the second determination procedure, the determination means determines that the texture acquired by the second texture acquisition means is to be used when no occlusion occurs, for the object including the portion hidden by the occlusion, in the viewpoint video from which the second texture acquisition means acquired the texture, and the angle difference between the viewpoint of that viewpoint video and the virtual viewpoint is equal to or less than a predetermined threshold; and determines that the texture acquired by the third texture acquisition means is to be used in at least one of the case where an occlusion occurs, for the object including the portion hidden by the occlusion, in the viewpoint video from which the second texture acquisition means acquired the texture, and the case where the angle difference is greater than the threshold.

According to this invention, in the virtual viewpoint video generation device of any one of (4) to (6), the determination means, in the second determination procedure, determines that the texture acquired by the second texture acquisition means is to be used when both the first condition and the second condition below are satisfied, and determines that the texture acquired by the third texture acquisition means is to be used when at least one of the two conditions is not satisfied. The first condition is that, for the object including the portion hidden by the occlusion, no occlusion occurs in the viewpoint video from which the second texture acquisition means acquired the texture. The second condition is that the angle difference between the viewpoint of the viewpoint video from which the second texture acquisition means acquired the texture and the virtual viewpoint is equal to or less than a predetermined threshold.

Thus, when an occlusion also occurs in the viewpoint video from which the second texture acquisition means acquired the texture, or when the angle difference is greater than the threshold, the texture acquired by inter-frame interpolation processing is used instead of the texture acquired by inter-camera interpolation processing. The visual quality of the virtual viewpoint video can therefore be improved still further.
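
The two conditions of the second determination procedure translate directly into a boolean test. In the sketch below, representing each viewpoint by a viewing-direction vector and measuring the angle difference as the angle between those vectors is an assumption made for illustration; this section does not say how the angle difference is computed, and the threshold value is likewise left open. All function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two non-zero direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


def second_determination(source_view_occluded: bool,
                         source_dir: Sequence[float],
                         virtual_dir: Sequence[float],
                         threshold_deg: float) -> str:
    """Choose between the inter-camera and inter-frame textures.

    First condition: the source viewpoint video is free of occlusion.
    Second condition: the angle difference is at most the threshold.
    """
    angle_ok = angle_between_deg(source_dir, virtual_dir) <= threshold_deg
    if not source_view_occluded and angle_ok:
        return "inter_camera"
    return "inter_frame"
```

For example, `second_determination(False, (1, 0, 0), (0, 1, 0), 30.0)` falls back to the inter-frame texture because the two directions are 90 degrees apart.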

(8) The present invention proposes a virtual viewpoint video generation method for a virtual viewpoint video generation device (for example, corresponding to the virtual viewpoint video generation device 1 in FIG. 1) that includes first texture acquisition means (for example, corresponding to the object separation unit 10 in FIG. 1), second texture acquisition means (for example, corresponding to the inter-camera interpolation unit 20 in FIG. 1), third texture acquisition means (for example, corresponding to the inter-frame interpolation unit 30 in FIG. 1), determination means (for example, corresponding to the determination unit 40 in FIG. 1), and video generation means (for example, corresponding to the video generation unit 50 in FIG. 1), and that generates video at a virtual viewpoint from a plurality of viewpoint videos, the method comprising: a first step (for example, corresponding to the processing of step S6 in FIG. 2) in which the first texture acquisition means, when an occlusion occurs in any of the plurality of viewpoint videos, separates each of the plurality of objects forming the occlusion and acquires a texture of each of the plurality of objects; a second step (for example, corresponding to the processing of step S7 in FIG. 2) in which the second texture acquisition means, when an occlusion occurs in any of the plurality of viewpoint videos, acquires the texture of an object including a portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs; a third step (for example, corresponding to the processing of step S8 in FIG. 2) in which the third texture acquisition means, when an occlusion occurs in any of the plurality of viewpoint videos, acquires the texture of an object including a portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs; a fourth step (for example, corresponding to the processing of steps S9 and S11 in FIG. 2) in which the determination means determines which of the texture acquired in the first step, the texture acquired in the second step, and the texture acquired in the third step is to be used as the texture of the object including the portion hidden by the occlusion; and a fifth step (for example, corresponding to the processing of step S12 in FIG. 2) in which the video generation means generates the video at the virtual viewpoint using the texture determined in the fourth step.

According to this invention, when an occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of the objects forming the occlusion and acquires a texture of each; the second texture acquisition means acquires the texture of an object including a portion hidden by the occlusion from the viewpoint videos other than the one in which the occlusion occurs; and the third texture acquisition means acquires the texture of an object including a portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs. The determination means determines which of the textures acquired by the first, second, and third texture acquisition means is to be used as the texture of the object including the portion hidden by the occlusion, and the video generation means generates the video at the virtual viewpoint using the texture so determined. The same effects as those described above can therefore be obtained.
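
The first to fourth steps of the method can be sketched as a small selection routine. This is an illustration only: the callables stand in for the three texture acquisition means and the two determination procedures, `Texture` is a placeholder type, and running all three acquisitions before the determination is an assumption based on the step numbering (S6 to S8 before S9 and S11). The fifth step, video generation (S12), is not sketched.

```python
from typing import Callable, Dict

Texture = str  # stand-in type; a real texture would be image data


def choose_texture(separate: Callable[[], Texture],
                   inter_camera: Callable[[], Texture],
                   inter_frame: Callable[[], Texture],
                   separation_ok: Callable[[Texture], bool],
                   prefer_inter_camera: Callable[[], bool]) -> Dict[str, Texture]:
    """Pick the texture for an occluded object (all argument names are hypothetical)."""
    # Steps S6 to S8: the three acquisition means each produce a texture.
    t1, t2, t3 = separate(), inter_camera(), inter_frame()
    # Step S9: first determination procedure on the separated texture.
    if separation_ok(t1):
        return {"source": "separation", "texture": t1}
    # Step S11: second determination procedure, reached only when S9 rejects t1.
    if prefer_inter_camera():
        return {"source": "inter_camera", "texture": t2}
    return {"source": "inter_frame", "texture": t3}
```

Because the second determination is invoked only after the first one rejects the separated texture, it is skipped entirely whenever separation succeeds.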

(9) The present invention proposes a program for causing a computer to execute a virtual viewpoint video generation method in a virtual viewpoint video generation device (for example, corresponding to the virtual viewpoint video generation device 1 in FIG. 1) that generates a video at a virtual viewpoint from a plurality of viewpoint videos and that comprises first texture acquisition means (for example, corresponding to the object separation unit 10 in FIG. 1), second texture acquisition means (for example, corresponding to the inter-camera interpolation unit 20 in FIG. 1), third texture acquisition means (for example, corresponding to the inter-frame interpolation unit 30 in FIG. 1), determination means (for example, corresponding to the determination unit 40 in FIG. 1), and video generation means (for example, corresponding to the video generation unit 50 in FIG. 1). The program causes the computer to execute: a first step (for example, corresponding to the processing of step S6 in FIG. 2) in which, when occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of the plurality of objects forming the occlusion and acquires the texture of each of the plurality of objects; a second step (for example, corresponding to the processing of step S7 in FIG. 2) in which, when occlusion occurs in any of the plurality of viewpoint videos, the second texture acquisition means acquires the texture of the object that includes the portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs; a third step (for example, corresponding to the processing of step S8 in FIG. 2) in which, when occlusion occurs in any of the plurality of viewpoint videos, the third texture acquisition means acquires the texture of the object that includes the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs; a fourth step (for example, corresponding to the processing of steps S9 and S11 in FIG. 2) in which the determination means determines, from among the texture acquired in the first step, the texture acquired in the second step, and the texture acquired in the third step, which one to use as the texture of the object that includes the portion hidden by the occlusion; and a fifth step (for example, corresponding to the processing of step S12 in FIG. 2) in which the video generation means generates the video at the virtual viewpoint using the texture determined in the fourth step.

According to this invention, the program is executed on a computer so that, when occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of the plurality of objects forming the occlusion and acquires the texture of each of these objects. When occlusion occurs in any of the plurality of viewpoint videos, the second texture acquisition means acquires the texture of the object that includes the portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs. When occlusion occurs in any of the plurality of viewpoint videos, the third texture acquisition means acquires the texture of the object that includes the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs. The determination means determines, from among the textures acquired by the first, second, and third texture acquisition means, which one to use as the texture of the object that includes the portion hidden by the occlusion. The video generation means generates the video at the virtual viewpoint using the texture determined by the determination means. Effects similar to those described above can therefore be obtained.

According to the present invention, the visual quality of a virtual viewpoint video can be improved.

FIG. 1 is a block diagram of a virtual viewpoint video generation device according to an embodiment of the present invention.
FIG. 2 is a flowchart of the virtual viewpoint video generation processing performed by the virtual viewpoint video generation device.
FIG. 3 is a flowchart of the virtual viewpoint video generation processing performed by the virtual viewpoint video generation device.

Embodiments of the present invention are described below with reference to the drawings. The constituent elements in the following embodiments can be replaced with existing constituent elements as appropriate, and various variations are possible, including combinations with other existing constituent elements. The description of the following embodiments therefore does not limit the content of the invention set out in the claims.

[Configuration of Virtual Viewpoint Video Generation Device 1]
FIG. 1 is a block diagram of a virtual viewpoint video generation device 1 according to an embodiment of the present invention. The virtual viewpoint video generation device 1 receives as input a plurality of viewpoint videos, which are videos captured from mutually different viewpoints, and generates from them a virtual viewpoint video, which is a video at a virtual viewpoint. The virtual viewpoint video generation device 1 includes an object separation unit 10, an inter-camera interpolation unit 20, an inter-frame interpolation unit 30, a determination unit 40, a video generation unit 50, and a control unit 60.

The object separation unit 10 performs separation processing (see step S6 in FIG. 2, described later). As a result, when occlusion occurs in any of the plurality of viewpoint videos, each of the plurality of objects forming the occlusion can be separated and the texture of each of these objects can be acquired.

The inter-camera interpolation unit 20 performs inter-camera interpolation processing (see step S7 in FIG. 2, described later). As a result, when occlusion occurs in any of the plurality of viewpoint videos, the texture of the object that includes the portion hidden by the occlusion can be acquired from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs.

The inter-frame interpolation unit 30 performs inter-frame interpolation processing (see step S8 in FIG. 2, described later). As a result, when occlusion occurs in any of the plurality of viewpoint videos, the texture of the object that includes the portion hidden by the occlusion can be acquired from a frame before or after the frame in which the occlusion occurs.

The determination unit 40 determines, from among the texture acquired by the object separation unit 10, the texture acquired by the inter-camera interpolation unit 20, and the texture acquired by the inter-frame interpolation unit 30, which one to use as the texture of the object that includes the portion hidden by the occlusion.

The video generation unit 50 generates the virtual viewpoint video using the texture determined by the determination unit 40.

The control unit 60 performs various kinds of processing, as described later with reference to FIGS. 2 and 3.

[Operation of Virtual Viewpoint Video Generation Device 1]
The operation of the virtual viewpoint video generation device 1 configured as above is described below with reference to FIGS. 2 and 3.

FIG. 2 is a flowchart of the virtual viewpoint video generation processing performed by the virtual viewpoint video generation device 1.

In step S1, for each of the plurality of viewpoint videos, the control unit 60 estimates a planar projection matrix H that holds between the viewpoint video and the field coordinates, and a central projection matrix P that holds between the field coordinates and each of the viewpoint of the viewpoint video and the virtual viewpoint, and then proceeds to step S2. The planar projection matrix H is estimated by specifying four or more corresponding points of the viewpoint video on the field plane. The central projection matrix P is estimated by specifying four or more corresponding points of the viewpoint video on the field plane and, in addition, two or more corresponding points in the field space.
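As an illustrative, non-limiting sketch of the estimation of the planar projection matrix H in step S1, the following Python code solves for a 3x3 homography from point correspondences between a viewpoint video and the field plane. The function names, the choice of fixing the last element of H to 1, and the restriction to exactly four correspondences are assumptions made for this sketch; the description itself only states that four or more points are specified, and with more than four points a least-squares solution would be needed instead.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(image_pts, field_pts):
    """Estimate H (3x3, last element fixed to 1) mapping image to field points."""
    if len(image_pts) != 4 or len(field_pts) != 4:
        raise ValueError("this sketch uses exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, field_pts):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map an image point to field coordinates through H."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Once H has been estimated for a viewpoint video, `apply_homography` can be used to convert a pixel position, such as the foot position of a tracked object, into field coordinates.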

In step S2, the control unit 60 divides each frame image of each of the plurality of viewpoint videos into a foreground region and a background region, and proceeds to step S3. Specifically, a background image in which no foreground region appears is acquired, background subtraction is performed between the acquired background and the current frame image, and a mask image that separates the foreground region from the background region is generated. The foreground region is a region whose position and shape change over time; when the target space is, for example, a soccer field, it is the region of objects such as soccer players, referees, and the ball. The background region is the region other than the foreground region.
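The background subtraction of step S2 can be sketched, purely by way of example, as a per-pixel absolute difference followed by thresholding. The function name `foreground_mask`, the use of grayscale values held in nested lists, and the threshold value are assumptions of this sketch and are not specified in the description.

```python
def foreground_mask(background, frame, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    background, frame: 2-D lists of grayscale values of identical size.
    threshold: assumed difference above which a pixel counts as foreground.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```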

In step S3, the control unit 60 performs, for the current frame, object tracking processing with a particle filter for each object through time-series processing across frames, and proceeds to step S4. This time-series object tracking processing with a particle filter is described in detail below with reference to FIG. 3.

FIG. 3 is a flowchart of the time-series object tracking processing with a particle filter performed by the control unit 60.

In step S31, for a predetermined frame, the control unit 60 sets the circumscribed region of the mask image generated in step S2 as the particle filter bounding frame of each object, and proceeds to step S32. The circumscribed region of the mask image may, for example, be set by the user for an object in which occlusion occurs, after the user visually checks the mask image generated in step S2.

In step S32, the control unit 60 defines the state quantity c(t) of each particle present in the particle filter of each object at time t as in equation (1) below, and proceeds to step S33.

In equation (1), x(t) and y(t) denote the two-dimensional coordinates of the particle at time t, Δx(t) denotes the horizontal velocity of the particle at time t, and Δy(t) denotes the vertical velocity of the particle at time t. The state transition function is set as in equation (2) below using Gaussian noise ω(t), on the assumption that the particles move in uniform linear motion between frames.
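Equations (1) and (2) are not reproduced in this text. Based solely on the definitions given above (a state consisting of position and velocity, uniform linear motion between frames, and additive Gaussian noise), one plausible form is sketched below. The matrix notation and the unit time step between frames are assumptions of this reconstruction, not a quotation of the original equations.

```latex
% Assumed reconstruction of equation (1): particle state at time t
c(t) = \begin{pmatrix} x(t) & y(t) & \Delta x(t) & \Delta y(t) \end{pmatrix}^{\mathsf{T}}
\tag{1}

% Assumed reconstruction of equation (2): constant-velocity transition with Gaussian noise
c(t+1) =
\begin{pmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
c(t) + \omega(t)
\tag{2}
```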

In step S33, the control unit 60 obtains the likelihood of each particle, and proceeds to step S34. Specifically, it first determines whether the position of each particle lies in the foreground region or the background region of the mask image. For a particle determined to lie in the background region, the likelihood is set to zero; for a particle determined to lie in the foreground region, the likelihood is set based on prior information assigned to each object.

For example, when the object is a soccer player, whether the pixel value I at the particle position satisfies equation (3) below is determined using a threshold th_ref, based on the color information I_ref of the uniform. If the equation is satisfied, the likelihood is set to zero; if it is not satisfied, the likelihood is set to 1.

In step S34, the control unit 60 eliminates the particles whose likelihood was set to zero in step S33, relocates particles in the vicinity of the particles whose likelihood is 1, and proceeds to step S35.

In step S35, the control unit 60 calculates, for each object, the circumscribed region of the particle group relocated in step S34, updates the particle filter bounding frame of each object, and proceeds to step S36.
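One iteration of steps S32 to S35 for a single object might, as a hedged example, be written as follows. All names are hypothetical. The color test `abs(I - I_ref) > th_ref` is only an assumed form of equation (3), which is not reproduced in this text, and the image is simplified to grayscale values. The relocation rule (copying a randomly chosen surviving particle with small positional noise) is likewise one possible reading of "relocating particles in the vicinity" of surviving particles.

```python
import random


def track_step(particles, mask, image, ref_color, th_ref, rng, noise=1.0):
    """Run one predict / weight / relocate / bound iteration for one object.

    particles: list of (x, y, dx, dy) tuples.
    mask: 2-D list of 0/1 values (1 = foreground).
    image: 2-D list of grayscale pixel values (simplification of colour).
    rng: a random.Random instance, passed in for reproducibility.
    Returns (new_particles, bounding_box); bounding_box is None if all die.
    """
    height, width = len(mask), len(mask[0])

    # Prediction: uniform linear motion plus Gaussian noise.
    moved = [(x + dx + rng.gauss(0, noise), y + dy + rng.gauss(0, noise),
              dx + rng.gauss(0, noise), dy + rng.gauss(0, noise))
             for x, y, dx, dy in particles]

    # Likelihood (S33): zero outside the image, on background pixels,
    # or where the pixel differs too much from the reference colour.
    survivors = []
    for p in moved:
        col, row = int(round(p[0])), int(round(p[1]))
        if not (0 <= row < height and 0 <= col < width):
            continue
        if mask[row][col] == 0:
            continue
        if abs(image[row][col] - ref_color) > th_ref:  # assumed form of (3)
            continue
        survivors.append(p)  # likelihood 1
    if not survivors:
        return [], None

    # Relocation (S34): refill the particle set near surviving particles.
    resampled = list(survivors)
    while len(resampled) < len(particles):
        x, y, dx, dy = rng.choice(survivors)
        resampled.append((x + rng.gauss(0, noise), y + rng.gauss(0, noise),
                          dx, dy))

    # Bounding frame update (S35): circumscribed box of the particle group.
    xs = [p[0] for p in resampled]
    ys = [p[1] for p in resampled]
    return resampled, (min(xs), min(ys), max(xs), max(ys))
```

Passing the random number generator explicitly keeps the sketch reproducible; a real tracker would call this function once per object and per frame, as in the loop of steps S36 and S37.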

In step S36, the control unit 60 determines whether the processing of steps S32 to S35 has been performed up to the final frame. If it has not, the process proceeds to step S37; if it has, the process proceeds to step S4 in FIG. 2.

In step S37, the control unit 60 updates the frame to be processed, and proceeds to step S32.

Returning to FIG. 2, in step S4, the control unit 60 determines whether occlusion has occurred. If it determines that occlusion has occurred, the process proceeds to step S5; if it determines that occlusion has not occurred, the process proceeds to step S12.

In step S5, the control unit 60 detects an occlusion region based on the result of dividing the foreground and background regions in step S2 and the result for the current frame of the object tracking processing in step S3, and proceeds to step S6. Specifically, for every object tracked by the object tracking processing in step S3, the connected foreground region that contains the object is determined from the position of the object's particle filter bounding frame. A connected foreground region that contains a plurality of objects is thereby identified, and that foreground region is treated as the occlusion region.
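The detection of occlusion regions in step S5 can be illustrated, under stated assumptions, by labelling the connected foreground regions of the mask and counting how many tracked objects fall into each. In this sketch the names are hypothetical, 4-connectivity is assumed, and each object is assigned to the region under the centre of its bounding frame, which is a simplification of "the position of the bounding frame".

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a 0/1 mask (labels start at 1)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] == 1 and labels[nr][nc] == 0):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


def find_occlusions(mask, boxes):
    """Return {region_label: [object ids]} for regions holding 2+ objects.

    boxes: {object_id: (x_min, y_min, x_max, y_max)} in pixel coordinates.
    """
    labels = label_components(mask)
    members = {}
    for obj_id, (x0, y0, x1, y1) in boxes.items():
        cx, cy = int((x0 + x1) // 2), int((y0 + y1) // 2)  # box centre
        label = labels[cy][cx]
        if label:
            members.setdefault(label, []).append(obj_id)
    return {lab: ids for lab, ids in members.items() if len(ids) > 1}
```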

In step S6, the object separation unit 10 performs separation processing on the occlusion region detected in step S5, and proceeds to step S7. Specifically, for the occlusion region detected in step S5, it separates each of the plurality of objects forming the occlusion and acquires the texture of each of these objects. The objects can be separated, for example, by projecting textures from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs, that is, from the viewpoint videos at other viewpoints, or on the basis of the result of the object tracking processing in step S3.

In step S7, the inter-camera interpolation unit 20 performs inter-camera interpolation processing on the occlusion region detected in step S5, and proceeds to step S8. Specifically, for the occlusion region detected in step S5, it acquires the texture of the object that includes the portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs, that is, from the viewpoint videos at other viewpoints.

In the object tracking processing in step S3, the same ID (identification) is assigned to each object across the viewpoints of the plurality of viewpoint videos so that each object can be tracked across those viewpoints. By using this ID in step S7 as well, the object that includes the portion hidden by the occlusion can be identified in the viewpoint videos of the other viewpoints, and its texture can be acquired from those viewpoint videos.

In step S8, the inter-frame interpolation unit 30 performs inter-frame interpolation processing on the occlusion region detected in step S5, and proceeds to step S9. Specifically, for the occlusion region detected in step S5, it acquires the texture of the object that includes the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs.

In the object tracking processing in step S3, the same ID is assigned to each object across frames so that each object can be tracked from frame to frame. By using this ID in step S8 as well, the object that includes the portion hidden by the occlusion can be identified in other frames, and its texture can be acquired from those frames.

In step S9, the determination unit 40 determines whether it is appropriate to use the texture acquired by the separation processing in step S6 as the texture of the object that includes the portion hidden by the occlusion, and proceeds to step S10. For this determination, object detection is applied to each texture. If exactly one object is detected, in other words if the missing or excess region of the object is at or below a threshold, the texture is determined to be appropriate. If no object is detected or a plurality of objects are detected, in other words if the missing or excess region of the object exceeds the threshold, the texture is determined to be inappropriate.

In the separation processing, the object to be separated may end up being separated with part or all of another object still connected to it. The region of the other object that remains connected to the object to be separated is the excess region referred to above.

For the object detection described above, a classifier such as an SVM (Support Vector Machine) may be created by, for example, learning HOG (Histogram of Oriented Gradients) features of images of the object group as positive examples, according to the type of target object.

In step S10, the determination unit 40 proceeds to step S12 if the texture was determined to be appropriate in step S9, and proceeds to step S11 if it was determined to be inappropriate in step S9.

In step S11, the determination unit 40 determines which of the texture acquired by the inter-camera interpolation processing in step S7 and the texture acquired by the inter-frame interpolation processing in step S8 is appropriate for use as the texture of the object that includes the portion hidden by the occlusion, and proceeds to step S12. The following first and second conditions are used for this determination. The first condition is that, for the object that includes the portion hidden by the occlusion, no occlusion occurs in the viewpoint video from which the inter-camera interpolation unit 20 acquired the texture. The second condition is that the angular difference between the viewpoint of the viewpoint video from which the inter-camera interpolation unit 20 acquired the texture and the virtual viewpoint is at or below a predetermined threshold. If both the first and second conditions are satisfied, the texture acquired by the inter-camera interpolation processing in step S7 is determined to be appropriate. If at least one of the first and second conditions is not satisfied, the texture acquired by the inter-frame interpolation processing in step S8 is determined to be appropriate.

The angular difference between the viewpoint of the viewpoint video from which the inter-camera interpolation unit 20 acquired the texture and the virtual viewpoint can be evaluated through a similarity defined on the basis of the inner product of the direction of that viewpoint and the direction of the virtual viewpoint. If the similarity is at or above a threshold, the angular difference is determined to be at or below the predetermined threshold; if the similarity is below the threshold, the angular difference is determined to exceed the predetermined threshold. The inner product of the direction of the viewpoint of the viewpoint video from which the inter-camera interpolation unit 20 acquired the texture and the direction of the virtual viewpoint can be calculated from the vector of the virtual viewpoint and the optical-axis vector obtained from the central projection matrix P estimated in step S1.
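The selection performed across steps S9 to S11 can be summarized, as a minimal sketch, by the following Python code. The function and parameter names are hypothetical, and the similarity is taken here to be the cosine of the angle between the two viewing directions (the normalized inner product); the description only states that the similarity is defined on the basis of the inner product, so the normalization is an assumption.

```python
import math


def view_similarity(dir_a, dir_b):
    """Cosine of the angle between two 3-D viewing directions."""
    dot = sum(a * b for a, b in zip(dir_a, dir_b))
    norm_a = math.sqrt(sum(a * a for a in dir_a))
    norm_b = math.sqrt(sum(b * b for b in dir_b))
    return dot / (norm_a * norm_b)


def choose_texture(num_detected, other_view_occluded,
                   camera_dir, virtual_dir, sim_threshold):
    """Return 'separation', 'inter_camera' or 'inter_frame'.

    num_detected: number of objects found by object detection in the
        texture produced by the separation processing (step S9).
    other_view_occluded: True if the object is also occluded in the
        viewpoint video used for inter-camera interpolation (first condition).
    camera_dir, virtual_dir: viewing directions used for the second condition.
    """
    if num_detected == 1:  # S9: exactly one object detected
        return "separation"
    similar = view_similarity(camera_dir, virtual_dir) >= sim_threshold
    if not other_view_occluded and similar:  # S11: both conditions hold
        return "inter_camera"
    return "inter_frame"  # S11: otherwise fall back to another frame
```

A higher cosine corresponds to a smaller angle, which is why a similarity at or above the threshold is treated as an angular difference at or below its threshold.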

In step S12, the video generation unit 50 generates the virtual viewpoint video, and the processing of FIG. 2 ends. Specifically, based on the texture positions in the video and the planar projection matrix H estimated in step S1, each texture is placed on the three-dimensional field and the video at the virtual viewpoint is synthesized.

If the texture acquired by the separation processing in step S6 is determined to be appropriate in step S9, that texture is used to generate the virtual viewpoint video. If the texture acquired by the inter-camera interpolation processing in step S7 is determined to be appropriate in step S11, that texture is used to generate the virtual viewpoint video.

On the other hand, if the texture acquired by the inter-frame interpolation processing in step S8 is determined to be appropriate in step S11, that texture is used to generate the virtual viewpoint video. In this case, for the object that includes the portion hidden by the occlusion, the texture of the nearest frame in which the texture was properly acquired by the foreground extraction processing in step S2 or the separation processing in step S6 is found, and that texture is placed at the position estimated by the object tracking processing in step S3.
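The inter-frame fallback described above can be sketched as a lookup of the nearest frame that holds a properly acquired texture for the object, followed by placement at the tracked position. The data layout (a dictionary from frame index to texture, with `None` marking frames where no proper texture exists), the function names, and the tie-breaking rule that prefers the earlier frame are all assumptions made for this illustration.

```python
def nearest_valid_texture(history, frame_index):
    """Find the frame closest to frame_index that has a usable texture.

    history: {frame_index: texture or None} for one object ID.
    Returns (source_frame, texture), or None if no frame qualifies.
    """
    candidates = [i for i, tex in history.items()
                  if tex is not None and i != frame_index]
    if not candidates:
        return None
    # Closest frame first; on a tie, prefer the earlier frame (assumption).
    best = min(candidates, key=lambda i: (abs(i - frame_index), i))
    return best, history[best]


def place_texture(texture, tracked_position):
    """Pair a borrowed texture with the position estimated by tracking."""
    return {"texture": texture, "position": tracked_position}
```

For example, if an object is occluded in frames 4 to 6 and properly textured in frames 3 and 8, frame 5 borrows the texture of frame 3 and frame 6 borrows that of frame 8.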

The virtual viewpoint video generation device 1 described above provides the following effects.

The virtual viewpoint video generation device 1 performs separation processing, which separates objects and acquires their textures; inter-camera interpolation processing, which acquires textures from other viewpoint videos; and inter-frame interpolation processing, which acquires textures from other frames. The virtual viewpoint video can therefore be generated by combining not only the separation processing and the inter-camera interpolation processing but also the inter-frame interpolation processing, which improves the visual quality of the virtual viewpoint video.

In step S9 of FIG. 2, the virtual viewpoint video generation device 1 determines whether the texture acquired by the separation processing is appropriate and, if so, generates the virtual viewpoint video using that texture. If the texture is determined to be inappropriate, the device determines in step S11 of FIG. 2 whether the texture acquired by the inter-camera interpolation processing is appropriate. If it is, the virtual viewpoint video is generated using that texture; if it is not, the virtual viewpoint video is generated using the texture acquired by the inter-frame interpolation processing. The acquired textures can therefore be used in the following order of priority: the texture acquired by the separation processing, the texture acquired by the inter-camera interpolation processing, and the texture acquired by the inter-frame interpolation processing.

Furthermore, because the determination of step S11 in FIG. 2 (whether to use the texture acquired by the inter-camera interpolation processing or the one acquired by the inter-frame interpolation processing) is made only when the texture acquired by the separation processing is determined to be inappropriate in step S9, step S11 is not performed when the texture is determined to be appropriate in step S9. The processing load of the virtual viewpoint video generation device 1 can therefore be reduced in some cases.

In addition, when determining in step S9 of FIG. 2 whether the texture acquired by the separation processing is appropriate, the virtual viewpoint video generation device 1 uses object detection based on HOG features. It can therefore determine more appropriately whether to use the texture acquired by the separation processing or a texture acquired by the inter-camera or inter-frame interpolation processing, which further improves the visual quality of the virtual viewpoint video.

In addition, for an object that includes a portion hidden by occlusion, the virtual viewpoint video generation device 1 uses the texture acquired by the inter-frame interpolation processing instead of the texture acquired by the inter-camera interpolation processing in two cases: when occlusion also occurs in the viewpoint video from which the inter-camera interpolation unit 20 acquired the texture, and when the angular difference between the viewpoint of that viewpoint video and the virtual viewpoint exceeds the threshold. The visual quality of the virtual viewpoint video can therefore be improved still further.

The present invention can be realized by recording the processing of the virtual viewpoint video generation device 1 as a program on a computer-readable non-transitory recording medium and having the virtual viewpoint video generation device 1 read and execute the program recorded on that medium.

Examples of such a recording medium include nonvolatile memory such as EPROM or flash memory, magnetic disks such as hard disks, and CD-ROMs. The program recorded on the recording medium is read and executed by a processor provided in the virtual viewpoint video generation device 1.

The program may also be transmitted from the virtual viewpoint video generation device 1, which stores the program in a storage device or the like, to another computer system via a transmission medium or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line.

The program may implement only some of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the virtual viewpoint video generation device 1.

An embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and includes designs and the like within a scope that does not depart from the gist of the invention.

For example, in the embodiment described above, the inter-frame interpolation process in step S8 acquires the texture of an object including a portion hidden by occlusion from another frame. This other frame may be a frame in the same viewpoint video as the one containing the occluded object, or a frame in a different viewpoint video. Using a frame in the same viewpoint video keeps the viewpoint unchanged, so it can improve the visual quality of the virtual viewpoint video more than using a frame in a different viewpoint video.

In the embodiment described above, the processes are performed in the order of the separation process (step S6), the inter-camera interpolation process (step S7), and the inter-frame interpolation process (step S8). Alternatively, the inter-camera interpolation process and the inter-frame interpolation process may be deferred until just before step S11 of FIG. 2. In that case, when the texture is determined to be appropriate in step S9, not only step S11 but also the inter-camera interpolation process and the inter-frame interpolation process are skipped. The processing load of the virtual viewpoint video generation device 1 can therefore be reduced further in some cases.
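The deferred ordering can be sketched in Python by passing the two interpolation processes as callables, so that they run only after step S9 has rejected the separated texture. All names here are hypothetical and the sketch is not the embodiment's implementation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")  # stands in for whatever type represents a texture


def select_texture_deferred(
    separated: T,
    is_separated_ok: Callable[[T], bool],     # hypothetical check for step S9
    run_inter_camera: Callable[[], T],        # hypothetical wrapper for step S7
    run_inter_frame: Callable[[], T],         # hypothetical wrapper for step S8
    is_inter_camera_ok: Callable[[T], bool],  # hypothetical check for step S11
) -> T:
    """Run both interpolation processes only if step S9 fails."""
    # Step S9: if the separated texture is appropriate, nothing else runs.
    if is_separated_ok(separated):
        return separated
    # Both interpolation processes are performed here, just before step S11.
    inter_camera = run_inter_camera()
    inter_frame = run_inter_frame()
    # Step S11: choose between the two interpolated textures.
    return inter_camera if is_inter_camera_ok(inter_camera) else inter_frame
```

When the separated texture passes step S9, neither callable is invoked, which is the source of the additional saving described above.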

DESCRIPTION OF SYMBOLS
1 ... Virtual viewpoint video generation device
10 ... Object separation unit
20 ... Inter-camera interpolation unit
30 ... Inter-frame interpolation unit
40 ... Determination unit
50 ... Video generation unit
60 ... Control unit

Claims

A virtual viewpoint video generation device that generates a video at a virtual viewpoint from a plurality of viewpoint videos, comprising:
first texture acquisition means for, when occlusion occurs in any of the plurality of viewpoint videos, separating each of a plurality of objects forming the occlusion and acquiring a texture of each of the plurality of objects;
second texture acquisition means for, when occlusion occurs in any of the plurality of viewpoint videos, acquiring a texture of an object including a portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs;
third texture acquisition means for, when occlusion occurs in any of the plurality of viewpoint videos, acquiring a texture of the object including the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs;
determination means for determining which of the texture acquired by the first texture acquisition means, the texture acquired by the second texture acquisition means, and the texture acquired by the third texture acquisition means is to be used as the texture of the object including the portion hidden by the occlusion; and
video generation means for generating the video at the virtual viewpoint using the texture determined by the determination means.

The virtual viewpoint video generation device according to claim 1, wherein the determination means determines the texture to be used as the texture of the object including the portion hidden by the occlusion in the following order of priority: the texture acquired by the first texture acquisition means, the texture acquired by the second texture acquisition means, and the texture acquired by the third texture acquisition means.

The virtual viewpoint video generation device according to claim 1, wherein the third texture acquisition means acquires the texture of the object from a frame before or after the frame in which the occlusion occurs, among a plurality of frames constituting the viewpoint video in which the occlusion occurs.

The virtual viewpoint video generation device according to any one of claims 1 to 3, wherein the determination means performs:
a first determination procedure for determining whether to use the texture acquired by the first texture acquisition means; and
a second determination procedure for determining which of the texture acquired by the second texture acquisition means and the texture acquired by the third texture acquisition means is to be used.

The virtual viewpoint video generation device according to claim 4, wherein, in the first determination procedure, the determination means performs object detection on the texture acquired by the first texture acquisition means and makes the determination based on a result of the object detection.

The virtual viewpoint video generation device according to claim 5, wherein the determination means uses HOG features in the object detection.

The virtual viewpoint video generation device according to claim 4, wherein, in the second determination procedure, the determination means:
determines that the texture acquired by the second texture acquisition means is to be used when, for the object including the portion hidden by the occlusion, no occlusion occurs in the viewpoint video from which the texture was acquired by the second texture acquisition means and the angle difference between the viewpoint of that viewpoint video and the virtual viewpoint is equal to or less than a predetermined threshold; and
determines that the texture acquired by the third texture acquisition means is to be used in at least one of the case where, for the object including the portion hidden by the occlusion, occlusion occurs in the viewpoint video from which the texture was acquired by the second texture acquisition means, and the case where the angle difference is larger than the threshold.

A virtual viewpoint video generation method in a virtual viewpoint video generation device that comprises first texture acquisition means, second texture acquisition means, third texture acquisition means, determination means, and video generation means, and that generates a video at a virtual viewpoint from a plurality of viewpoint videos, the method comprising:
a first step in which, when occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of a plurality of objects forming the occlusion and acquires a texture of each of the plurality of objects;
a second step in which, when occlusion occurs in any of the plurality of viewpoint videos, the second texture acquisition means acquires a texture of an object including a portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs;
a third step in which, when occlusion occurs in any of the plurality of viewpoint videos, the third texture acquisition means acquires a texture of the object including the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs;
a fourth step in which the determination means determines which of the texture acquired in the first step, the texture acquired in the second step, and the texture acquired in the third step is to be used as the texture of the object including the portion hidden by the occlusion; and
a fifth step in which the video generation means generates the video at the virtual viewpoint using the texture determined in the fourth step.

A program for causing a computer to execute a virtual viewpoint video generation method in a virtual viewpoint video generation device that comprises first texture acquisition means, second texture acquisition means, third texture acquisition means, determination means, and video generation means, and that generates a video at a virtual viewpoint from a plurality of viewpoint videos, the program causing the computer to execute:
a first step in which, when occlusion occurs in any of the plurality of viewpoint videos, the first texture acquisition means separates each of a plurality of objects forming the occlusion and acquires a texture of each of the plurality of objects;
a second step in which, when occlusion occurs in any of the plurality of viewpoint videos, the second texture acquisition means acquires a texture of an object including a portion hidden by the occlusion from those of the plurality of viewpoint videos other than the viewpoint video in which the occlusion occurs;
a third step in which, when occlusion occurs in any of the plurality of viewpoint videos, the third texture acquisition means acquires a texture of the object including the portion hidden by the occlusion from a frame before or after the frame in which the occlusion occurs;
a fourth step in which the determination means determines which of the texture acquired in the first step, the texture acquired in the second step, and the texture acquired in the third step is to be used as the texture of the object including the portion hidden by the occlusion; and
a fifth step in which the video generation means generates the video at the virtual viewpoint using the texture determined in the fourth step.