JP6618117B2

JP6618117B2 - Billboard generating apparatus, method, and program for generating billboard of object

Info

Publication number: JP6618117B2
Application number: JP2016153935A
Authority: JP
Inventors: 敬介野中
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-08-04
Filing date: 2016-08-04
Publication date: 2019-12-11
Anticipated expiration: 2036-08-04
Also published as: JP2018022387A

Description

The present invention relates to a technique for generating a billboard of an object, for use in generating free viewpoint video.

Techniques have been proposed for generating video from a viewpoint that no camera captured (hereinafter, free viewpoint video), for sports and similar content. Specifically, video is captured by each of a plurality of cameras, and video at a viewpoint designated by the user is synthesized from the captured videos. Non-Patent Document 1 discloses synthesizing video at high speed using a simplified model called a billboard (hereinafter, the billboard method). A billboard is a model, without thickness, of an object in the video captured by a camera. An object here is a moving body contained in the video, such as a person. The billboard method assumes that the lowest point of the billboard touches the ground (hereinafter also called the field). When a viewpoint displaced horizontally from the viewpoint captured by a camera (hereinafter, the reference viewpoint) is designated, the billboard is rotated according to the horizontal distance between the designated viewpoint and the reference viewpoint to generate the video. When the designated viewpoint is instead displaced vertically from the reference viewpoint, the billboard at the reference viewpoint is used unchanged to generate the video.

Non-Patent Document 1: Hayashi, K. et al., "Synthesizing Free-Viewpoint Images from Multiple View Videos in Soccer Stadium", in Computer Graphics, Imaging and Visualisation, pp. 220-225, July 2006.
Non-Patent Document 2: Takeo Igarashi et al., "As-Rigid-As-Possible Shape Manipulation", ACM Transactions on Computer Graphics, Vol. 24, No. 3, ACM SIGGRAPH 2005, 2005.

The billboard method can synthesize video at high speed, but problems arise because the video is synthesized from a texture-only model with no thickness. For example, FIG. 1(A) is an image at a reference viewpoint, that is, an image captured by a camera, and FIG. 1(B) is an image at a designated viewpoint, synthesized using a billboard generated for the person in the image of FIG. 1(A). The designated viewpoint in FIG. 1(B) is vertically below the reference viewpoint. In FIG. 1(B), the person appears to be supported by the hands alone, with the feet floating above the ground. This is because the billboard method does not take the depth of the object (the person in FIG. 1) into account and assumes a single contact point between the object and the ground.

The present invention provides a billboard generation apparatus, method, and program that generate a billboard with which video at a designated viewpoint designated by a user can be generated in a more natural form.

According to one aspect of the present invention, a billboard generation apparatus that generates a billboard of an object for use in generating video at a designated viewpoint designated by a user comprises: holding means for holding model information indicating a three-dimensional model of the object; first generation means for determining the object in a frame captured by a camera and generating an object image indicating the determined object; second generation means for generating, based on the three-dimensional model, a reference image indicating the object as seen from a first viewpoint, where the viewpoint corresponding to the camera is a reference viewpoint and the first viewpoint has the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint; third generation means for generating a billboard of the object at the first viewpoint by deforming the object indicated by the object image based on the object image and the reference image; and fourth generation means for generating the billboard of the object used to generate the video at the designated viewpoint by rotating the billboard of the object at the first viewpoint horizontally based on the horizontal distance between the first viewpoint and the designated viewpoint.

According to the present invention, a billboard can be generated with which video at a designated viewpoint designated by a user can be generated in a more natural form.

FIG. 1 is an explanatory diagram of the problem addressed by the present invention.
FIG. 2 is a diagram showing how video is captured for free viewpoint video generation according to an embodiment.
FIG. 3 is a configuration diagram of a free viewpoint video generation apparatus according to an embodiment.
FIG. 4 is a configuration diagram of a billboard generation unit according to an embodiment.
FIG. 5 is a diagram showing the division of an object into small regions according to an embodiment.
FIG. 6 is an explanatory diagram of determining the region of the reference image corresponding to a small region according to an embodiment.
FIG. 7 is an explanatory diagram of determining the post-conversion coordinates of small regions according to an embodiment.
FIG. 8 is an explanatory diagram of determining the post-conversion coordinates of small regions according to an embodiment.
FIG. 9 is an explanatory diagram of determining the post-conversion coordinates of small regions according to an embodiment.

Exemplary embodiments of the present invention are described below with reference to the drawings. The following embodiments are examples and do not limit the present invention to their content. In the drawings, components that are not needed to describe the embodiments are omitted.

FIG. 2 shows an example of capturing video for free viewpoint video generation. Each arrow in FIG. 2 represents a camera. In the example of FIG. 2, a plurality of cameras 91 to 98 are arranged around the object, and each of them captures video. The number of cameras is a design matter, and the camera arrangement is not limited to the circle shown in FIG. 2; it suffices that the object is captured from a plurality of positions. The cameras 91 to 98 are adjusted to capture video in time synchronization, that is, so that the capture times of the frames of the videos captured by the cameras 91 to 98 substantially coincide. The cameras 91 to 98 are calibrated by a known method, and the camera parameters of each camera are assumed to be known for every frame.

FIG. 3 is a configuration diagram of the free viewpoint video generation apparatus according to the present embodiment. The following description assumes that the video captured as in FIG. 2 contains a single object; for example, when the object is a person, only one person appears in the video. When there are a plurality of objects, the processing described below is performed for each object.

Video data representing the video captured by each of the cameras 91 to 98 is input to the background difference unit 1. For each frame of each video, the background difference unit 1 determines the region of the object by a known background difference method and outputs object image data. For example, the background difference unit 1 separates background and foreground by background subtraction and treats a region of connected foreground pixels as the region of one object. Object image data is generated for each frame of each video and is binary image data indicating, for each pixel, whether the pixel corresponds to the object or to the background.
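As a rough illustration of the background difference step described above, the following sketch thresholds the per-pixel difference from a background image and groups connected foreground pixels into object regions. The function names, the grayscale list-of-lists image representation, the threshold value, and the use of 4-connectivity are all assumptions made for this example; the description only states that a known background difference method is used.

```python
from collections import deque

def foreground_mask(frame, background, threshold=30):
    """Binary mask: 1 where the frame differs from the background by more than threshold."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def connected_regions(mask):
    """Group 4-connected foreground pixels; each group is treated as one object region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, region = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions
```

Each returned region is a list of (row, column) pixels; setting those pixels to 1 in an otherwise empty image gives a binary image of the kind the description calls object image data.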

The model construction unit 2 constructs a three-dimensional model (3D model) of the object, for example by the known visual hull (volume intersection) method, based on the object image data showing the same object in the frames captured at the same time in the plurality of videos and on the camera parameters of the cameras 91 to 98. It outputs model data (model information) representing the constructed 3D model. The model data represents, for example, the positions in space occupied by the object as voxels. Model data is generated for each frame, but for any given frame time there is a single set of model data regardless of the number of cameras. The model data is held in the model construction unit 2.
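The visual hull idea can be sketched as follows: a voxel is kept only if it projects onto the object silhouette in every camera view. This is a minimal sketch under stated assumptions, not the implementation used in the embodiment. In particular, `project` is a hypothetical callable standing in for the projection defined by each camera's parameters, and the voxel list and mask layout are choices made for this example.

```python
def carve_visual_hull(voxels, views):
    """Keep the voxels whose projection falls on the silhouette in every view.

    voxels: iterable of (x, y, z) voxel centres.
    views:  list of (project, silhouette) pairs; project is a hypothetical
            callable mapping (x, y, z) to an integer pixel (row, col), and
            silhouette is that camera's binary mask (1 = object).
    """
    hull = []
    for v in voxels:
        inside_all = True
        for project, silhouette in views:
            r, c = project(v)
            in_image = 0 <= r < len(silhouette) and 0 <= c < len(silhouette[0])
            if not (in_image and silhouette[r][c]):
                inside_all = False
                break
        if inside_all:
            hull.append(v)
    return hull
```

With real cameras, `project` would apply the calibrated intrinsic and extrinsic parameters; the toy orthographic projections typically used to try this out are only for checking the logic.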

Viewpoint information indicating the viewpoint designated by the user (the designated viewpoint) and the model data from the model construction unit 2 are input to the reference image generation unit 3. Based on the 3D model represented by the model data, the reference image generation unit 3 calculates how the object appears from the designated viewpoint and outputs reference image data. The reference image data is binary image data indicating, for each pixel of the image seen from the designated viewpoint, whether the pixel corresponds to the object or to the background.
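One simple way to obtain such a binary reference image from voxel model data is to project every occupied voxel into the image plane of the target viewpoint and mark the pixels that are hit. The sketch below assumes a hypothetical `project` callable for that viewpoint and ignores gaps that can appear between projected voxels at high image resolutions.

```python
def render_reference_image(voxels, project, height, width):
    """Rasterise occupied voxels into a binary reference image (1 = object).

    project is a hypothetical callable mapping a voxel centre (x, y, z) to an
    integer pixel (row, col) for the viewpoint being rendered.
    """
    image = [[0] * width for _ in range(height)]
    for v in voxels:
        r, c = project(v)
        if 0 <= r < height and 0 <= c < width:
            image[r][c] = 1
    return image
```

Because only a silhouette is needed, no depth ordering or occlusion handling is required here.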

The billboard generation unit 4 generates billboard data representing the billboard of the object, based on the object image data from the background difference unit 1 and the reference image data from the reference image generation unit 3. Billboard data is generated for each object in every frame. The processing in the billboard generation unit 4 is described in detail below. To begin with, the designated viewpoint is assumed to be a position displaced from the reference viewpoint of the camera 91 in the vertical direction only.

FIG. 4 is a configuration diagram of the billboard generation unit 4. The edge region determination unit 41 determines the smallest rectangular region 51 circumscribing the object, based on the object image data from a frame captured by the camera 91. FIG. 5 shows the object 50 in the object image data generated from a frame captured by the camera 91; the rectangular region 51 is the smallest rectangle circumscribing the object 50. The edge region determination unit 41 then divides the rectangular region 51 into small regions of a predetermined size. The size of a small region (its number of pixels in the vertical and horizontal directions) is decided in advance. In the present embodiment the small regions are quadrilaterals, but other polygons may be used. In the example of FIG. 5, the rectangle is divided into 15 small regions. Next, among the small regions that contain an edge, that is, a boundary between the object and the background, the edge region determination unit 41 determines those that satisfy a predetermined condition. In the present embodiment, the condition is that the ratio of the number of object pixels in the small region to the total number of pixels in the small region lies within a predetermined range, whose minimum is greater than 0 and whose maximum is less than 1. As an example, small regions whose ratio lies between 0.3 and 0.7 are taken to be small regions that contain an edge and satisfy the condition. If the object extracted by background subtraction were ideal, any small region whose ratio is neither 0 nor 1 would contain an edge. However, the extracted object contains noise, and small regions that are mostly object or mostly background are unsuitable for the processing that follows. The present embodiment therefore requires the ratio to lie within the predetermined range, so that only small regions containing an edge of reasonable size are selected.
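The selection of edge small regions described above can be sketched as follows. The function names and the (top, left, bottom, right) rectangle convention are assumptions for this example, the mask is assumed to contain at least one object pixel, and tiles at the right or bottom border are simply clipped to the image.

```python
def bounding_rect(mask):
    """Smallest axis-aligned rectangle (top, left, bottom, right), inclusive, around the object."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

def select_edge_tiles(mask, tile_h, tile_w, lo=0.3, hi=0.7):
    """Tile the bounding rectangle and keep tiles whose object-pixel ratio is within [lo, hi]."""
    top, left, bottom, right = bounding_rect(mask)
    selected = []
    for y0 in range(top, bottom + 1, tile_h):
        for x0 in range(left, right + 1, tile_w):
            pixels = [mask[y][x]
                      for y in range(y0, min(y0 + tile_h, len(mask)))
                      for x in range(x0, min(x0 + tile_w, len(mask[0])))]
            ratio = sum(pixels) / len(pixels)
            if lo <= ratio <= hi:
                selected.append((y0, x0, ratio))
    return selected
```

The default bounds 0.3 and 0.7 follow the example range given in the description; tiles that are almost entirely object or almost entirely background are skipped.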

The corresponding small region determination unit 42 determines, in the image indicated by the reference image data, the small region corresponding to each small region determined by the edge region determination unit 41. Suppose, for example, that one of the small regions determined by the edge region determination unit 41 is the one shown in FIG. 6(A). In FIG. 6(A), each square is a pixel; shaded pixels belong to the object and white pixels belong to the background. The corresponding small region determination unit 42 first determines the region of the image indicated by the reference image data (hereinafter, the reference image) that is at the same position as the small region of FIG. 6(A). FIG. 6(B) shows the reference image, with the same shading convention as FIG. 6(A). The thick square frame in FIG. 6(B) marks the region at the same position as the small region of FIG. 6(A). The image indicated by the object image data (hereinafter, the object image) shows the object at the reference viewpoint, whereas the reference image shows the object at the designated viewpoint, so the object as shown in the two images is not identical. The corresponding small region determination unit 42 therefore searches, starting from the position of the thick frame in FIG. 6(B), for the position whose image differs least from the small region of FIG. 6(A). FIG. 6(C) shows the thick frame of FIG. 6(B) moved by one pixel in each of the horizontal and vertical directions; in this state the image inside the thick frame matches the image of FIG. 6(A). The search range, that is, the number of pixels searched in each of the horizontal and vertical directions, is decided in advance. Alternatively, the search range may be determined from the vertical distance between the reference viewpoint and the designated viewpoint. In FIG. 6(C) the image inside the thick frame matches FIG. 6(A) exactly, but in general the region within the search range whose difference from the small region of FIG. 6(A) is smallest is taken as the corresponding region of the reference image. The difference between images is measured as the number of corresponding pixels whose values differ. In this way, the corresponding small region determination unit 42 determines the small region of the reference image corresponding to each small region of the object image determined by the edge region determination unit 41.
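The search described above is a small block-matching problem on binary images. The sketch below counts differing pixels for every window within the search range and returns the best position. The function names, the tie-breaking rule (prefer the smaller displacement), and the default search range are assumptions for this example; the starting position is assumed to lie inside the reference image.

```python
def count_differences(tile, image, top, left):
    """Number of pixels where the tile and the same-sized window of image at (top, left) differ."""
    return sum(tile[dy][dx] != image[top + dy][left + dx]
               for dy in range(len(tile)) for dx in range(len(tile[0])))

def find_corresponding_tile(tile, reference, top, left, search=2):
    """Search +/- `search` pixels around (top, left) for the window with the fewest differing pixels.

    Returns ((row, col) of the best window, number of differing pixels).
    """
    th, tw = len(tile), len(tile[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(reference) - th and 0 <= x <= len(reference[0]) - tw:
                diff = count_differences(tile, reference, y, x)
                # Prefer fewer differences, then the smaller displacement.
                key = (diff, abs(dy) + abs(dx))
                if best is None or key < best[0]:
                    best = (key, (y, x))
    return best[1], best[0][0]
```

In the situation of FIG. 6, the best window would be the one shifted by one pixel horizontally and vertically, with zero differing pixels.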

FIG. 7(A) shows small regions 61 to 64 determined by the edge region determination unit 41, and FIG. 7(B) shows the small regions 71 to 74 of the reference image that the corresponding small region determination unit 42 determined to correspond to the small regions 61 to 64. The reference image shows the object at the designated viewpoint, generated from the 3D model, whereas the object image data shows the object as seen from the reference viewpoint of the camera 91. In the present embodiment, therefore, the small regions 61 to 64 are deformed based on the small regions 71 to 74 of the reference image in order to generate a billboard at the designated viewpoint. To this end, the post-conversion coordinate determination unit 43 obtains the post-conversion coordinates of the ten vertices A1 to J1 of the small regions 61 to 64. As shown in FIG. 7(A), vertex C1 is shared by small regions 61 and 62, vertex D1 by small regions 61 to 63, vertex G1 by small regions 62 to 64, and vertex H1 by small regions 63 and 64. In the present embodiment, the post-conversion coordinates of the vertices of the small regions 61 to 64 are obtained under the constraint that these sharing relationships are preserved. The converted versions of vertices A1 to J1 are referred to below as vertices A3 to J3.

As is clear from FIGS. 7(A) and 7(B), vertex A1 corresponds to vertex A2, vertex B1 to vertex B2, vertex C1 to vertices C21 and C22, vertex D1 to vertices D21 to D23, vertex E1 to vertex E2, vertex F1 to vertex F2, vertex G1 to vertices G21 to G23, vertex H1 to vertices H21 and H22, vertex I1 to vertex I2, and vertex J1 to vertex J2. In the present embodiment, the post-conversion coordinate determination unit 43 obtains the coordinates of the converted vertices A3 to J3 so that the sum of the distances between each converted vertex and its corresponding vertices in the reference image is minimized. Specifically, the coordinates of vertices A3 to J3 are chosen to minimize the sum of: the distance between A3 and A2; the distance between B3 and B2; the distances between C3 and each of C21 and C22; the distances between D3 and each of D21 to D23; the distance between E3 and E2; the distance between F3 and F2; the distances between G3 and each of G21 to G23; the distances between H3 and each of H21 and H22; the distance between I3 and I2; and the distance between J3 and J2. FIG. 8 shows an example of vertices A3 to J3, and FIG. 9 shows, in solid lines, the small regions 81 to 84 obtained by deforming the small regions 61 to 64.
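Written as a formula, the criterion described above might look like the following. The notation is introduced here for illustration and does not appear in the original: V = {A, ..., J} is the set of vertices, p_v is the converted position of vertex v (A3 to J3), q_{v,k} is the k-th reference-image vertex corresponding to v (for example q_{C,1} = C21 and q_{C,2} = C22), and K_v is the number of such vertices.

```latex
\min_{\{\mathbf{p}_v\}_{v \in V}} \; \sum_{v \in V} \sum_{k=1}^{K_v}
  \left\lVert \mathbf{p}_v - \mathbf{q}_{v,k} \right\rVert_2 ,
\qquad
K_v =
\begin{cases}
1, & v \in \{A, B, E, F, I, J\} \\
2, & v \in \{C, H\} \\
3, & v \in \{D, G\}
\end{cases}
```

Because each shared vertex is represented by a single position p_v, the sharing relationships between small regions are preserved by construction. For a vertex with K_v = 1 the minimum is attained at p_v = q_{v,1}; for shared vertices the minimizer is a compromise between the corresponding reference vertices.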

For the small regions that were not selected by the edge region determination unit 41, the converted regions are calculated by solving an optimization problem, as described in Non-Patent Document 2, based on their adjacency to the converted versions, shown in FIG. 9, of the small regions that were selected by the edge region determination unit 41. In this way, the edge region determination unit 41 ultimately determines a converted small region for every small region that contains the object. The billboard conversion unit 44 converts the object 50 of FIG. 5 based on the converted small regions determined by the post-conversion coordinate determination unit 43 and generates an object image showing the converted object 50. It then determines the color value of each pixel of the converted object 50 from the frame from which the pre-conversion object image was derived, and thereby generates and outputs the billboard data of the object.

Returning to FIG. 3, the free viewpoint video generation unit 5 synthesizes and outputs video at the designated viewpoint by the known billboard method, based on the viewpoint information input by the user and on the billboard data output by the billboard generation unit 4, which is generated by converting the object extracted from the video of the camera 91 based on the reference image. In the conventional billboard method, when the designated viewpoint is displaced vertically from the reference viewpoint of the camera 91, the billboard based on the object extracted from the video of the camera 91 is used unchanged to generate the free viewpoint video. In the present embodiment, by contrast, a 3D model is created, the object as seen from the designated viewpoint is generated from the 3D model as a reference image, the object extracted from the video of the camera 91 is converted based on the reference image, and the billboard is generated from the converted object. The free viewpoint video generation unit 5 then generates the free viewpoint video using the billboard converted based on the reference image rather than the billboard based directly on the object extracted from the video of the camera 91. This makes it possible to generate free viewpoint video with a light processing load, retaining the advantage of the billboard method, while suppressing the artifacts caused by the depth of the object and the like.

The description above assumed that the designated viewpoint is displaced vertically from the reference viewpoint of the camera 91. The cases where the designated viewpoint is at other positions are described next. First, when the designated viewpoint coincides with the reference viewpoint of one of the cameras 91 to 98, the video from the corresponding camera can be used as is. When the vertical position of the designated viewpoint is the same as that of the reference viewpoints of the cameras 91 to 98 and its horizontal position lies between the reference viewpoints of the cameras, the conventional billboard method is used unchanged. That is, the billboard generated from the video of the camera with the nearest reference viewpoint is rotated according to the horizontal distance between that reference viewpoint and the designated viewpoint. When, on the other hand, the vertical position of the designated viewpoint differs from that of the reference viewpoints of the cameras 91 to 98 and its horizontal position lies between the reference viewpoints of the cameras, the camera whose reference viewpoint is nearest to the designated viewpoint is selected first. The designated viewpoint is then moved horizontally to determine a temporary designated viewpoint whose horizontal position is the same as that of the reference viewpoint of the selected camera. In other words, the temporary designated viewpoint is the position with the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint. A reference image showing the object as seen from the temporary designated viewpoint is generated from the 3D model, and the object image extracted from the video of the selected camera is converted based on this reference image to generate a billboard. Finally, the converted billboard is rotated according to the horizontal distance between the temporary designated viewpoint and the designated viewpoint to generate the free viewpoint video.
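The case analysis above can be summarized in a small dispatch function. This is only a sketch under assumptions that are not in the original: viewpoints are represented as hypothetical (azimuth in degrees, height) pairs around the object, the "nearest" camera is chosen by azimuth alone, and the strategy names are invented labels.

```python
def plan_billboard(designated, cameras, tol=1e-9):
    """Decide how to obtain the billboard for a designated viewpoint.

    designated: hypothetical (azimuth_deg, height) pair.
    cameras:    list of (azimuth_deg, height) reference viewpoints.
    Returns (strategy, camera_index, horizontal_offset_deg).
    """
    az, h = designated

    def az_gap(cam):
        # Signed shortest angular difference from the camera to the viewpoint, in (-180, 180].
        d = (az - cam[0]) % 360.0
        return d - 360.0 if d > 180.0 else d

    idx = min(range(len(cameras)), key=lambda i: abs(az_gap(cameras[i])))
    offset = az_gap(cameras[idx])
    same_height = abs(h - cameras[idx][1]) <= tol
    if same_height and abs(offset) <= tol:
        return ("use_camera_frame", idx, 0.0)
    if same_height:
        return ("rotate_camera_billboard", idx, offset)
    # Temporary viewpoint: the camera's horizontal position at the designated height.
    # Deform the object image toward the reference image there, then rotate by offset.
    return ("deform_then_rotate", idx, offset)
```

A purely vertical displacement, as in the first part of the embodiment, falls into the last branch with an offset of zero, so the rotation step has no effect.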

In the embodiment above, the post-conversion coordinate determination unit 43 obtains the converted vertices so that the sum of the distances between the vertices of the reference image and the corresponding converted vertices is minimized. The converted vertices may instead be obtained by other criteria, for example by minimizing the sum of squared distances or the sum of the absolute values of the coordinate differences. Likewise, the corresponding small region determination unit 42 determines the small region of the reference image corresponding to a small region determined by the edge region determination unit 41 by pattern matching on binary images. Alternatively, the color of each voxel of the 3D model may be determined from the video of each camera, a color image rather than a binary image may be generated as the reference image, the object image may also be taken as a color image from the original video, and the corresponding small region in the reference image may be determined by comparing color values.
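For the two alternative criteria mentioned above, the optimal position of a converted vertex has a simple closed form when each vertex is treated independently (an assumption of this sketch; it ignores any additional coupling between vertices). The sum of squared distances is minimized by the centroid of the corresponding reference vertices, and the sum of absolute coordinate differences is minimized by the per-axis median. The function names are hypothetical.

```python
from statistics import mean, median

def vertex_least_squares(targets):
    """Position minimising the sum of squared distances to the targets: their centroid."""
    xs, ys = zip(*targets)
    return (mean(xs), mean(ys))

def vertex_least_absolute(targets):
    """Position minimising the sum of absolute coordinate differences: the per-axis median."""
    xs, ys = zip(*targets)
    return (median(xs), median(ys))
```

For a vertex such as D1, `targets` would hold the coordinates of D21, D22, and D23; for an unshared vertex such as A1 it holds the single point A2, and both functions return that point.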

In the embodiment above, a reference image at the designated viewpoint is generated and the object shown in the object image is converted in a single pass. The object may instead be converted in several passes. Suppose, for instance, that the designated viewpoint is a position displaced downward from the reference viewpoint by a distance of 10. In that case, the processing described above is first performed for a temporary designated viewpoint displaced downward from the reference viewpoint by a distance of 1, generating an object image that shows the converted object (first pass). The temporary designated viewpoint of the first pass is then moved a further distance of 1 downward and the processing is repeated, further converting the object produced by the first pass and generating an object image that shows the result (second pass). Performing this processing ten times in total yields the converted object at the designated viewpoint. When the vertical distance between the designated viewpoint and the reference viewpoint is large, the difference between the reference image and the object image is also large, but converting the object image gradually in units of a predetermined distance allows the billboard at the designated viewpoint to be generated accurately.
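The multi-pass variant can be sketched as a loop over intermediate viewpoint heights. Both function names are hypothetical, `deform_once` stands in for the single-pass conversion described earlier (reference image generation, region matching, and vertex fitting for one temporary viewpoint), and representing a viewpoint by its height alone is a simplification made for this example.

```python
def intermediate_heights(reference_height, designated_height, step=1.0):
    """Heights of the temporary viewpoints visited when moving in increments of about `step`."""
    total = designated_height - reference_height
    count = max(1, round(abs(total) / step))
    return [reference_height + total * i / count for i in range(1, count + 1)]

def deform_incrementally(object_image, heights, deform_once):
    """Apply the hypothetical single-pass deformation once per temporary viewpoint."""
    for h in heights:
        object_image = deform_once(object_image, h)
    return object_image
```

For a designated viewpoint a distance of 10 below the reference viewpoint and a step of 1, this visits ten temporary viewpoints, matching the ten passes in the example above.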

FIG. 3 shows a free viewpoint video generation device that generates a free viewpoint video. However, the present invention can also be realized as a billboard generation device that generates billboards used for generating a free viewpoint video, for example by omitting the free viewpoint video generation unit 5 of FIG. 3.

The free viewpoint video generation device or billboard generation device according to the present invention can be realized by a program that causes a computer to operate as the free viewpoint video generation device or billboard generation device. These computer programs can be stored in a computer-readable storage medium or distributed via a network.

1: background difference unit, 2: model construction unit, 3: reference image generation unit, 4: billboard generation unit

Claims

A billboard generation device for generating a billboard of an object used for generating a video at a designated viewpoint designated by a user, the billboard generation device comprising:
holding means for holding model information indicating a three-dimensional model of the object;
first generation means for determining the object in a frame captured by a camera and generating an object image indicating the determined object;
second generation means for generating, based on the three-dimensional model, a reference image indicating the object as viewed from a first viewpoint, where the viewpoint corresponding to the camera is a reference viewpoint and the first viewpoint is the viewpoint having the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint;
third generation means for generating a billboard of the object at the first viewpoint by deforming the object indicated by the object image based on the object image and the reference image; and
fourth generation means for generating the billboard of the object used for generating the video at the designated viewpoint by rotating the billboard of the object at the first viewpoint in the horizontal direction based on the horizontal distance between the first viewpoint and the designated viewpoint.

The billboard generation device according to claim 1, wherein the third generation means includes:
first determination means for dividing the object image into a plurality of regions and determining, among the plurality of regions, a plurality of first regions that include an edge of the object;
second determination means for determining, for each of the plurality of first regions, a second region that is the corresponding region of the reference image; and
fifth generation means for generating the billboard of the object at the first viewpoint by deforming the object of the object image based on the correspondence between the first regions and the second regions.

The billboard generation device according to claim 2, wherein the plurality of first regions and the second region corresponding to each of the plurality of first regions are polygons, and
the fifth generation means generates the billboard of the object at the first viewpoint by deforming the object of the object image, determining the deformed position of each vertex of the plurality of first regions based on the corresponding vertex of the second region while maintaining the sharing relationship of any vertex shared by two or more first regions.

The billboard generation device according to claim 3, wherein the fifth generation means calculates the deformed vertices of each of the plurality of first regions so as to minimize either the sum of the distances between the deformed vertices of the plurality of first regions and the corresponding second regions, or the sum of the squares of those distances.

The billboard generation device according to claim 2, wherein the first determination means determines, as a first region, a region in which the ratio of pixels indicating the object to all pixels in the region is within a predetermined range.

The billboard generation device according to any one of claims 1 to 5, further comprising sixth generation means for generating the model information indicating the three-dimensional model of the object from frames of the videos captured by each of a plurality of cameras, and storing the model information in the holding means.

The billboard generation device according to claim 6, wherein the first generation means generates the object image from a frame captured by the camera closest to the designated viewpoint among the plurality of cameras.

A billboard generation device for generating a billboard of an object used for generating a video at a designated viewpoint designated by a user, the billboard generation device comprising:
holding means for holding model information indicating a three-dimensional model of the object;
first generation means for determining the object in a frame captured by a camera and generating a first object image indicating the determined object;
second generation means for generating a billboard of the object at an nth viewpoint (n is an integer of 2 or more) by deforming the object of the first object image, where the viewpoint corresponding to the camera is a reference viewpoint and the nth viewpoint is the viewpoint having the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint; and
third generation means for generating the billboard of the object used for generating the video at the designated viewpoint by rotating the billboard of the object at the nth viewpoint in the horizontal direction based on the horizontal distance between the nth viewpoint and the designated viewpoint,
wherein the second generation means includes:
fourth generation means for setting a first viewpoint through an (n-1)th viewpoint, in order from the reference viewpoint side, on the line from the reference viewpoint to the nth viewpoint, and generating, based on the three-dimensional model, a first reference image through an nth reference image indicating the object as viewed from the first viewpoint through the nth viewpoint, respectively; and
fifth generation means for generating an (n+1)th object image by repeating, in order for m = 1 to n (m is an integer of 1 to n), generation of an (m+1)th object image by deforming the object indicated by the mth object image based on the mth object image and the mth reference image, and generating the billboard of the object at the nth viewpoint based on the object indicated by the (n+1)th object image.

A billboard generation method in a billboard generation device for generating, from a frame captured by a camera, a billboard of an object used for generating a video at a designated viewpoint designated by a user, wherein
the billboard generation device holds model information indicating a three-dimensional model of the object, and
the billboard generation method comprises:
a first generation step of determining the object in the frame captured by the camera and generating an object image indicating the determined object;
a second generation step of generating, based on the three-dimensional model, a reference image indicating the object as viewed from a first viewpoint, where the viewpoint corresponding to the camera is a reference viewpoint and the first viewpoint is the viewpoint having the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint;
a third generation step of generating a billboard of the object at the first viewpoint by deforming the object indicated by the object image based on the object image and the reference image; and
a fourth generation step of generating the billboard of the object used for generating the video at the designated viewpoint by rotating the billboard of the object at the first viewpoint in the horizontal direction based on the horizontal distance between the first viewpoint and the designated viewpoint.

A billboard generation method in a billboard generation device for generating, from a frame captured by a camera, a billboard of an object used for generating a video at a designated viewpoint designated by a user, wherein
the billboard generation device holds model information indicating a three-dimensional model of the object, and
the billboard generation method comprises:
a first generation step of determining the object in the frame captured by the camera and generating a first object image indicating the determined object;
a second generation step of generating a billboard of the object at an nth viewpoint (n is an integer of 2 or more) by deforming the object indicated by the first object image, where the viewpoint corresponding to the camera is a reference viewpoint and the nth viewpoint is the viewpoint having the same horizontal position as the reference viewpoint and the same vertical position as the designated viewpoint; and
a third generation step of generating the billboard of the object used for generating the video at the designated viewpoint by rotating the billboard of the object at the nth viewpoint in the horizontal direction based on the horizontal distance between the nth viewpoint and the designated viewpoint,
wherein the second generation step includes:
a fourth generation step of setting a first viewpoint through an (n-1)th viewpoint, in order from the reference viewpoint side, on the line from the reference viewpoint to the nth viewpoint, and generating, based on the three-dimensional model, a first reference image through an nth reference image indicating the object as viewed from the first viewpoint through the nth viewpoint, respectively; and
a fifth generation step of generating an (n+1)th object image by repeating, in order for m = 1 to n (m is an integer of 1 to n), generation of an (m+1)th object image by deforming the object indicated by the mth object image based on the mth object image and the mth reference image, and generating the billboard of the object at the nth viewpoint based on the object indicated by the (n+1)th object image.

A program for causing a computer to function as the billboard generating apparatus according to claim 1.