JP2020112928A - Background model generation device, background model generation method, and background model generation program

Info

Publication number: JP2020112928A
Application number: JP2019001928A
Authority: JP
Inventors: 恵近野; Megumi Konno
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-01-09
Filing date: 2019-01-09
Publication date: 2020-07-27
Anticipated expiration: 2039-01-09
Also published as: JP7275583B2

Abstract

To reduce the depth discrepancy between a background model and a dynamic background. SOLUTION: A server device 10 has an acquisition unit that acquires a camera image captured by a camera from a prescribed imaging position; a calculation unit that calculates a depth image corresponding to the imaging position; a separation unit that separates the pixels included in the camera image into a foreground and a background; a correction unit that corrects the depth value of each pixel in the depth image by using the depth values of the depth-image pixels corresponding to the pixels separated into the background in the camera image; and a background generation unit that generates a background model of the background by using a new depth image based on the depth values corrected by the correction unit. SELECTED DRAWING: Figure 6

Description

The present invention relates to a background model generation device, a background model generation method, and a background model generation program.

A technique called free-viewpoint video generation is known. For example, when a free-viewpoint video is generated, the foreground and the background are separated in each of the images captured from a plurality of viewpoints, and a three-dimensional model is then reconstructed for the foreground portion and for the background portion. The three-dimensional space reproduced by the foreground model and the background model is then provided as video viewed from a designated virtual viewpoint.

Of these three-dimensional models, the model of the foreground portion is generated using Visual Hull. The three-dimensional model of the background portion, on the other hand, is generated in advance using computer graphics, three-dimensional distance measurement, or the like. At rendering time, among the camera images captured from the plurality of viewpoints, the camera image corresponding to the designated virtual viewpoint is projected onto the foreground model and the background model. The virtual viewpoint is not limited to the viewpoint of a camera; any viewpoint in the three-dimensional space can be designated.

Here, the three-dimensional background model prepared in advance models only the stationary subjects in the background, for example a structure such as a stadium where a sporting event is held and facilities such as its spectator seats. Therefore, when the scene includes a dynamic background, such as spectators watching the event from the seats, the image quality of the free-viewpoint image deteriorates.

This is because, when a dynamic background is included, the depth from the virtual viewpoint to the background model differs from the depth from the virtual viewpoint to the dynamic background. Partly because of this depth discrepancy, pixels at incorrect texture coordinates in the camera image used as the texture are mapped, and the image quality of the free-viewpoint image deteriorates.

To handle such dynamic backgrounds, the following free-viewpoint video generation device has been proposed. This device first generates a provisional free-viewpoint image for each frame from a reference image and a depth map. It then extracts, from the reference image and the depth map, a background image and its depth values as a background region to be stored in a curved background buffer. Finally, it complements the provisional free-viewpoint image with the background image and depth values stored in the curved background buffer.

Japanese Laid-open Patent Publication No. 2006-146846 (JP 2006-146846 A)

However, the above technique may still fail to reduce the depth discrepancy between the background model and the dynamic background.

Specifically, the above free-viewpoint video generation device extracts the background region by smoothing the depth distribution of the reference image with a Gaussian and using the depth corresponding to a local minimum of the smoothed distribution as the boundary between foreground and background. When depth values are used to extract the background region, however, separating the two becomes harder as the depth values of the foreground and the background approach each other. As a result, a subject that belongs to the foreground may be erroneously stored in the curved background buffer as background, which increases the depth discrepancy.

In one aspect, an object of the present invention is to provide a background model generation device, a background model generation method, and a background model generation program capable of reducing the depth discrepancy between a background model and a dynamic background.

In one aspect, a background model generation device includes: an acquisition unit that acquires a camera image captured by a camera from a predetermined imaging position; a calculation unit that calculates a depth image corresponding to the imaging position; a separation unit that separates a plurality of pixels included in the camera image into a foreground and a background; a correction unit that corrects the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and a background generation unit that generates a background model of the background by using a new depth image based on the depth values of the pixels of the depth image corrected by the correction unit.

The depth discrepancy between the background model and the dynamic background can be reduced.

FIG. 1 is a diagram illustrating a configuration example of a video generation system according to the first embodiment.
FIG. 2A is a diagram illustrating an example of a camera image.
FIG. 2B is a diagram illustrating an example of a silhouette image.
FIG. 3 is a diagram illustrating an example of Visual Hull.
FIG. 4 is a diagram illustrating an example of rendering.
FIG. 5 is a diagram illustrating an example of a cross-sectional view of a stadium.
FIG. 6 is a block diagram illustrating the functional configuration of the server device according to the first embodiment.
FIG. 7 is a diagram illustrating an example of data exchanged between the functional units according to the first embodiment.
FIG. 8A is a diagram illustrating an example of a silhouette image.
FIG. 8B is a diagram illustrating an example of a depth image.
FIG. 9A is a diagram illustrating an example of the image ID.
FIG. 9B is a diagram illustrating an example of a filter convolution operation.
FIG. 9C is a diagram illustrating an example of a filter convolution operation.
FIG. 10A is a diagram illustrating an example of a pixel of interest in temporal filtering.
FIG. 10B is a diagram illustrating an example of a filter convolution operation.
FIG. 11 is a flowchart illustrating the procedure of the video generation process according to the first embodiment.
FIG. 12 is a diagram illustrating an example of data exchanged between the functional units in application example 1.
FIG. 13 is a diagram illustrating an example of a graph of the evaluation value against depth.
FIG. 14 is a flowchart illustrating the procedure of the video generation process according to application example 1.
FIG. 15 is a diagram illustrating a hardware configuration example of a computer that executes the background model generation program according to the first and second embodiments.

A background model generation device, a background model generation method, and a background model generation program according to the present application are described below with reference to the accompanying drawings. The embodiments do not limit the disclosed technology, and they can be combined as appropriate as long as their processing contents do not contradict each other.

[System configuration]
FIG. 1 is a diagram illustrating a configuration example of a video generation system according to the first embodiment. As one aspect, the video generation system 1 illustrated in FIG. 1 provides a video generation service that generates a free-viewpoint video by combining multi-viewpoint camera images captured by a plurality of cameras 5A to 5N having different viewpoints.

As illustrated in FIG. 1, the video generation system 1 includes the cameras 5A to 5N, a server device 10, and a client terminal 30. Hereinafter, the cameras 5A to 5N may be referred to collectively as "camera 5". Although FIG. 1 shows a single client terminal 30 as an example, the video generation system 1 may include any number of client terminals 30.

The server device 10 and the client terminal 30 are connected via a predetermined network NW. For example, the network NW can be built from any type of communication network, wired or wireless, such as the Internet, a LAN (Local Area Network), or a VPN (Virtual Private Network). FIG. 1 illustrates the case where the free-viewpoint video is provided via the network NW, but this is only one example of how the video may be provided, and communication between the server device 10 and the client terminal 30 need not be bidirectional. For example, the free-viewpoint video may be provided to the client terminal 30 via broadcast waves rather than via the network NW.

The camera 5 is an imaging device equipped with an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor.

For example, the cameras 5 are installed so that their combined shooting ranges cover the entire three-dimensional space for which the free-viewpoint video is to be generated. Furthermore, so that the three-dimensional shape of a subject 3 in the three-dimensional space can be calculated from camera images captured by two or more cameras 5, each camera 5 is arranged so that part of its shooting range overlaps that of another camera 5. With this arrangement, the cameras 5 capture images in frame-by-frame synchronization, so that a set of images captured at the same timing from different viewpoints, that is, multi-viewpoint camera images, is obtained for each frame.

The server device 10 is an example of a computer that provides the above video generation service. The server device 10 is also an example of a correction device. A server device is used here merely as an example of a computer; the term is a label assigned for classifying functions, and neither the hardware configuration nor the type of installed software is limited. Any type of computer may be used.

In one embodiment, the server device 10 can be implemented by installing, on a desired computer, a video processing program that realizes the functions of the above video generation service, delivered as packaged software or online software. For example, the server device 10 may be implemented on-premises as a server that provides the video generation service, or as a cloud that provides the video generation service through outsourcing.

The client terminal 30 is an example of a computer that receives the above video generation service.

In one embodiment, any computer used by a user who receives the above video generation service corresponds to the client terminal 30. For example, the client terminal 30 may be a desktop computer such as a personal computer or a workstation. The client terminal 30 is not limited to a desktop computer and may be any computer, such as a laptop computer, a mobile terminal device, or a wearable terminal.

[Video generation]
As one aspect, the above free-viewpoint video is generated by executing four processes: (1) foreground/background separation, (2) foreground model generation, (3) background model generation, and (4) rendering.

(1) Foreground/background separation
"Foreground/background separation" refers to the process of separating the foreground and the background in the camera image of each viewpoint. This separation can be executed in parallel for the camera images of the same frame, or sequentially for a predetermined number of images at a time.

The "foreground" here refers to the subjects of interest among the objects that exist in the three-dimensional space within the shooting range of the camera 5. In a sporting event, for example, subjects such as players and the ball correspond to the foreground. In motor sports, the vehicles used by the competitors, such as automobiles and motorcycles, are also included in the foreground.

The "background", on the other hand, refers to the subjects that exist behind the foreground subjects. Some background subjects do not change in position or posture; hereinafter, such subjects may be referred to as the "static background". In a sporting event, for example, structures such as the stadium and facilities such as its spectator seats correspond to the static background. The background may also include subjects whose position or posture changes; hereinafter, such subjects may be referred to as the "dynamic background". For example, spectators watching from the stadium seats correspond to the dynamic background, because their position and posture change as they sit in their seats, lean forward, or stand up to watch.

An example of foreground/background separation is described with reference to FIGS. 2A and 2B. FIG. 2A is a diagram illustrating an example of a camera image, and FIG. 2B is a diagram illustrating an example of a silhouette image. FIG. 2A shows a camera image 200 corresponding to a viewpoint p1, and FIG. 2B shows a silhouette image 210 generated from the camera image 200 of the viewpoint p1. As one example, foreground/background separation applies so-called background subtraction to the camera image 200, or applies a two-dimensional graph cut to the camera image 200. By applying any algorithm, including background subtraction and graph cuts, a silhouette image 210 in which each pixel is assigned a binary foreground or background label is generated from the camera image 200, in which each pixel has a pixel value. In the silhouette image 210, as illustrated in FIG. 2B, the silhouette of the subject 3fg included in the camera image 200 is extracted separately from the background.
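The background subtraction mentioned above can be sketched roughly as follows. This is a minimal illustration, not the method of the embodiment: it assumes grayscale images held as nested lists, a previously captured empty-scene image, and a hypothetical `threshold` parameter; the function name is likewise made up for this example.

```python
def silhouette_by_background_difference(image, background, threshold=30):
    """Label each pixel 1 (foreground) or 0 (background).

    image, background: 2D lists of grayscale values with the same shape.
    threshold: hypothetical tuning parameter, not taken from the source.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


# Toy data: an empty-scene image and a frame containing a bright subject.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
camera_image = [[12, 11, 10, 9],
                [10, 200, 210, 10],
                [10, 190, 205, 11]]

silhouette = silhouette_by_background_difference(camera_image, background)
for row in silhouette:
    print(row)
```

Small differences caused by noise stay below the threshold and are labeled background, while the four bright pixels are labeled foreground.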

(2) Foreground model generation
As one example, "foreground model generation" uses a technique called Visual Hull. In Visual Hull, a cone (visual volume) is formed by connecting the optical center of each camera 5 with the silhouette in its silhouette image, and the region of three-dimensional space where the cones overlap is calculated as the three-dimensional shape of the subject 3fg.

FIG. 3 is a diagram illustrating an example of Visual Hull. FIG. 3 shows an example in which the silhouette images 210A to 210C of three cameras 5A to 5C are used to calculate the Visual Hull. As illustrated in FIG. 3, the silhouettes SA to SC of the silhouette images 210A to 210C, which correspond to the viewpoints of the cameras 5A to 5C, are projected into the three-dimensional space. When the silhouette SA is projected, a visual volume CA connecting the optical center of the camera 5A and the silhouette SA on the silhouette image 210A is obtained. Likewise, projecting the silhouette SB yields a visual volume CB connecting the optical center of the camera 5B and the silhouette SB on the silhouette image 210B, and projecting the silhouette SC yields a visual volume CC connecting the optical center of the camera 5C and the silhouette SC on the silhouette image 210C. The Visual Hull region where the visual volumes CA to CC overlap, that is, the three-dimensional model filled in black in FIG. 3, is calculated as the three-dimensional shape of the subject 3.
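One common way to approximate the intersection of visual volumes is voxel carving: a voxel is kept only if it projects inside the silhouette in every view. The sketch below illustrates that idea under simplifying assumptions that are not in the source: the scene is a tiny integer voxel grid, and each view uses an orthographic projection along an axis instead of a perspective cone through an optical center. All names are hypothetical.

```python
def carve_visual_hull(voxels, views):
    """Keep the voxels whose projection lies inside the silhouette in every view.

    voxels: iterable of (x, y, z) integer voxel centres.
    views: list of (project, silhouette) pairs, where project maps a voxel
           to (row, col) and silhouette is a 2D list of 0/1 labels.
    """
    hull = []
    for voxel in voxels:
        inside_all = True
        for project, silhouette in views:
            row, col = project(voxel)
            in_bounds = 0 <= row < len(silhouette) and 0 <= col < len(silhouette[0])
            if not in_bounds or silhouette[row][col] == 0:
                inside_all = False
                break
        if inside_all:
            hull.append(voxel)
    return hull


# A 3x3x3 voxel grid.
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]

# Simplified "front" view looking along z: image row = y, column = x.
front = (lambda v: (v[1], v[0]), [[0, 0, 0],
                                  [0, 1, 0],
                                  [0, 1, 0]])
# Simplified "side" view looking along x: image row = y, column = z.
side = (lambda v: (v[1], v[2]), [[0, 0, 0],
                                 [0, 1, 1],
                                 [0, 0, 0]])

hull = carve_visual_hull(grid, [front, side])
print(hull)
```

Replacing the two lambdas with perspective projections built from real camera parameters would turn the same loop into an intersection of cones, at the cost of more setup.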

(3) Background model generation
As one example, "background model generation" uses computer graphics, three-dimensional distance measurement, or the like. For example, a background model is generated by modeling the static background with 3DCG (3-Dimensional Computer Graphics). Alternatively, for each viewpoint corresponding to a camera 5, the depth corresponding to each pixel of the camera image 200 is measured by a three-dimensional laser sensor. This yields, for each viewpoint of the cameras 5, a depth image in which a depth is associated with each pixel.

(4) Rendering
"Rendering" refers to the process of generating a camera image corresponding to a virtual viewpoint, that is, the free-viewpoint video, from the multi-viewpoint camera images. The "virtual viewpoint" here refers to the viewpoint given to a virtual camera, for example the position and orientation at which the virtual camera is placed in the three-dimensional space. The virtual viewpoint may be designated by user input received from the client terminal 30, by a user setting made via the client terminal 30, or by a system setting registered in the server device 10.

FIG. 4 is a diagram illustrating an example of rendering. FIG. 4 shows an example in which the virtual camera Vc is positioned between the camera 5B and the camera 5C, and a straight line passing through the optical center of the virtual camera Vc and one of its pixels intersects the foreground model 3Mfg corresponding to the subject 3fg. As illustrated in FIG. 4, the three-dimensional position of the intersection between that straight line and the foreground model 3Mfg is obtained (S1). Next, the intersection is projected onto the camera image of each viewpoint according to camera parameters, which consist of external parameters such as the position and orientation of the camera 5 and internal parameters such as its angle of view and lens distortion. Here, as an example, the intersection is projected onto the camera images of a predetermined number of cameras 5 closest to the virtual camera Vc, namely the two camera images 200B and 200C of the camera 5B and the camera 5C (S2B and S2C). The pixel of the camera 5B and the pixel of the camera 5C that correspond to the pixel of the virtual camera Vc are thereby identified as texture coordinates.

Then, in the camera image 200B captured by the camera 5B, the pixel value of the pixel corresponding to the pixel of the virtual camera Vc is looked up (S3B). Likewise, in the camera image 200C captured by the camera 5C, the pixel value of the pixel corresponding to the pixel of the virtual camera Vc is looked up (S3C). The pixel values looked up in S3B and S3C are mapped to the pixel of the virtual camera Vc. For example, a statistic of the pixel value in the camera image 200B and the pixel value in the camera image 200C, such as the arithmetic mean or a weighted average based on the distance from the virtual camera Vc, is used as the pixel value of the pixel of the virtual camera Vc.
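The distance-based weighted average can be illustrated with a short sketch. The source only states that the distance from the virtual camera is used for weighting; the inverse-distance weights below, the function name, and the sample values are assumptions made for illustration.

```python
def blend_pixel(samples):
    """Blend pixel values looked up in several camera images.

    samples: list of (pixel_value, distance) pairs, where pixel_value is a
             tuple of channel values and distance (> 0) is the distance
             between that camera and the virtual camera.
    Weights are inverse distances (an assumption): closer cameras count more.
    """
    weights = [1.0 / distance for _, distance in samples]
    total = sum(weights)
    channels = len(samples[0][0])
    return tuple(
        sum(w * value[c] for (value, _), w in zip(samples, weights)) / total
        for c in range(channels)
    )


# Pixel values referenced in camera images 200B and 200C (made-up numbers).
value_b, distance_b = (100, 100, 100), 1.0
value_c, distance_c = (200, 200, 200), 3.0

blended = blend_pixel([(value_b, distance_b), (value_c, distance_c)])
print(blended)
```

With equal distances this reduces to the arithmetic mean, which is the other statistic the text mentions.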

By mapping textures from the camera image 200B, the camera image 200C, and so on to each pixel of the virtual camera Vc in this way, the free-viewpoint video corresponding to the virtual viewpoint is rendered. Although the case where the free-viewpoint video is rendered from the camera images of a plurality of cameras 5 has been described as an example, rendering may also be restricted to the camera image of the single camera 5 closest to the virtual camera Vc.

[One aspect of the problem]
As described in the background section above, a background model that models only the static background cannot handle camera images that include a dynamic background when it is used for rendering free-viewpoint video. This is because, when a dynamic background is included, the depth from the virtual viewpoint to the background model differs from the depth from the virtual viewpoint to the dynamic background. Partly because of this depth discrepancy, the pixel values of pixels at incorrect texture coordinates in the camera image used as the texture are mapped, and the image quality of the free-viewpoint image deteriorates.

FIG. 5 is a diagram illustrating an example of a cross-sectional view of a stadium. FIG. 5 shows a cross section cut along the direction from the center of the stadium toward the outside, that is, along the row direction of the stands. The cross section shows, as an example of the static background, a background model 3Mbgs that models the stand portion of the stadium. It also shows, as an example of a subject belonging to the dynamic background, a spectator 3bgd watching the event from the stands.

As illustrated in FIG. 5, there is a discrepancy between the depth from the virtual viewpoint Vc to the background model 3Mbgs (the solid arrow) and the depth from the virtual viewpoint Vc to the spectator 3bgd (the dash-dot line). Nevertheless, when the background model 3Mbgs of the static background is used for rendering, texture mapping uses the texture coordinates corresponding to the three-dimensional position on the background model 3Mbgs rather than the three-dimensional position of the spectator 3bgd in the dynamic background. That is, what is projected onto textures such as the camera images 200B and 200C is not the three-dimensional position of the intersection O2, where the ray through the optical center of the virtual viewpoint Vc meets the spectator 3bgd, but the three-dimensional position of the intersection O1, where the same ray meets the background model 3Mbgs. Because pixels at incorrect texture coordinates in the camera images 200B and 200C are thus used for texture mapping, the image quality of the free-viewpoint video deteriorates.
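A small numeric sketch shows how projecting O1 instead of O2 shifts the texture coordinate. It assumes an ideal pinhole camera without lens distortion, with the real camera at the origin looking along +z and the virtual viewpoint 1 m to its side; the focal length, principal point, and depths are all made-up values, not figures from the source.

```python
def project_to_camera(point, focal_px, cx, cy):
    """Project a 3D point, given in the camera's own coordinate frame with z
    pointing forward, to pixel coordinates using an ideal pinhole model."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Hypothetical intrinsics of the real camera (pixels).
FOCAL_PX, CX, CY = 1000.0, 960.0, 540.0

# The ray from the virtual viewpoint at (1, 0, 0) points straight ahead (+z).
o2_spectator = (1.0, 0.0, 20.0)  # intersection with the spectator (dynamic background)
o1_model = (1.0, 0.0, 25.0)      # intersection with the static background model

uv_correct = project_to_camera(o2_spectator, FOCAL_PX, CX, CY)
uv_used = project_to_camera(o1_model, FOCAL_PX, CX, CY)
shift_px = uv_correct[0] - uv_used[0]

print(uv_correct, uv_used, shift_px)
```

In this toy setup a 5 m depth error moves the sampled texture coordinate by 10 pixels horizontally, so the virtual pixel receives the color of a neighboring part of the scene.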

To handle such dynamic backgrounds, the free-viewpoint video generation device mentioned in the background section has been proposed. This device first generates a provisional free-viewpoint image for each frame from a reference image and a depth map. It then extracts, from the reference image and the depth map, a background image and its depth values as a background region to be stored in a curved background buffer. Finally, it complements the provisional free-viewpoint image with the background image and depth values stored in the curved background buffer.

However, even the above free-viewpoint video generation device may fail to reduce the depth discrepancy between the background model and the dynamic background.

[One Aspect of the Approach to Solving the Problem]
To handle dynamic backgrounds, the server device 10 according to the present embodiment therefore calculates, for a given frame, a depth image corresponding to each viewpoint. For example, the depth image may be calculated by stereo matching from two or more camera images, or may be measured by a depth camera such as a three-dimensional laser sensor.

The server device 10 according to the present embodiment then corrects the depth of each pixel of the depth image using the depths of the pixels classified as background by foreground/background separation of the camera image, and generates the background model from the corrected depth image.

Using the foreground/background separation result in this way makes it possible to correct the depth image while distinguishing the foreground subject from the background subject, even when their depths are close. Furthermore, even at the boundary between a foreground subject and a background subject, depth variation in the depth image can be corrected without mixing the two. Because the background model is generated from a depth image corrected in this manner, the accuracy of the background model can be improved.

Therefore, the server device 10 according to the present embodiment can reduce the depth discrepancy between the background model and the dynamic background.

[Configuration of Server Device 10]
Next, the functional configuration of the server device 10 according to the present embodiment will be described. FIG. 6 is a block diagram illustrating the functional configuration of the server device 10 according to the first embodiment. As shown in FIG. 6, the server device 10 includes a communication I/F (InterFace) unit 11, a storage unit 13, and a control unit 15. Note that FIG. 11 shows only an excerpt of the functional units related to the video generation service described above; this does not preclude the server device 10 from including functional units other than those illustrated, for example functional units that an existing computer provides by default or as an option. For example, when multi-view camera images are transmitted from the cameras 5 to the server device 10 via broadcast waves or satellite waves, the server device 10 may further include a receiving unit for broadcast waves or satellite waves.

The communication I/F unit 11 is an interface that controls communication with other devices.

In one embodiment, the communication I/F unit 11 corresponds to a network interface card such as a LAN (Local Area Network) card. For example, the communication I/F unit 11 receives camera images from each camera 5 and transmits imaging-control instructions to the cameras 5, such as power ON/OFF, pan, and tilt instructions.

The storage unit 13 corresponds to hardware that stores data used by various programs, including the OS (Operating System) executed by the control unit 15 and the video generation program described above.

In one embodiment, the storage unit 13 corresponds to an auxiliary storage device of the server device 10. For example, an HDD (Hard Disk Drive), an optical disc, or an SSD (Solid State Drive) corresponds to the auxiliary storage device. In addition, a flash memory such as an EPROM (Erasable Programmable Read Only Memory) also corresponds to the auxiliary storage device.

As examples of data used by the programs executed by the control unit 15, the storage unit 13 stores silhouette images 210 and corrected depth images 230. Besides these, the storage unit 13 can store various data related to free-viewpoint video technology. For example, for each viewpoint the storage unit 13 can store camera parameters, which include extrinsic parameters such as the position and orientation of the camera 5 and intrinsic parameters such as the angle of view and lens distortion of the camera 5, as well as time-series data of the camera images transmitted from the camera 5. The silhouette images 210 and the corrected depth images 230 are described together with the control unit 15, which registers and refers to each of them.

The control unit 15 is a processing unit that performs overall control of the server device 10.

In one embodiment, the control unit 15 can be implemented by a hardware processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). Although a CPU and an MPU are given here as examples, the control unit 15 can be implemented by any processor, whether general-purpose or specialized. Alternatively, the control unit 15 may be realized by hard-wired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The control unit 15 virtually realizes the processing units described below by loading the video generation program described above into a work area of a RAM, such as a DRAM (Dynamic Random Access Memory), implemented as a main storage device (not shown). Although the example given here executes a video generation program in which the functions of the video generation service are packaged together, the embodiment is not limited to this. For example, among the functions provided by the video generation service, a program module may be executed, or a library referenced, in units such as a background model generation function that generates a background model from corrected depth images obtained by correcting the depth image of each viewpoint.

As shown in FIG. 6, the control unit 15 includes an acquisition unit 15A, a calculation unit 15B, a separation unit 15C, a correction unit 15D, a foreground generation unit 15E, a background generation unit 15F, and a rendering unit 15G.

The acquisition unit 15A is a processing unit that acquires the camera image of each viewpoint.

In one embodiment, the acquisition unit 15A can acquire, frame by frame, the camera image of each viewpoint transmitted from the cameras 5A to 5N. The source from which the acquisition unit 15A acquires camera images may be any information source and is not limited to the cameras 5. For example, the acquisition unit 15A can also acquire multi-view camera images from an auxiliary storage device, such as a hard disk or an optical disc that stores the camera images of each viewpoint, or from removable media such as a memory card or a USB (Universal Serial Bus) memory. The acquisition unit 15A can also acquire the camera image of each viewpoint from an external device other than the cameras 5 via the network NW.

After the camera image of each viewpoint has been acquired in this way, a silhouette image used to generate the foreground model and a corrected depth image used to generate the background model are generated. For each viewpoint of the cameras 5, these images can be generated by taking the camera image corresponding to that viewpoint as input and executing the processing of the calculation unit 15B, the separation unit 15C, and the correction unit 15D.

The following description takes, purely as an example, the case in which the calculation of the depth image by the calculation unit 15B, the foreground/background separation by the separation unit 15C, and the correction of the depth image by the correction unit 15D are executed in a single thread, but the embodiment is not limited to this. For example, these three processes may be executed in parallel by multiple threads, one per viewpoint of the cameras 5. In that case, the calculation unit 15B, the separation unit 15C, and the correction unit 15D can operate in parallel in up to as many threads as there are camera viewpoints.
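The per-viewpoint parallelism described above could be sketched as follows. This is only an illustration using Python's standard thread pool; `process_all_viewpoints` and the `process_viewpoint` callable (standing in for the chain of calculation, separation, and correction) are hypothetical names, not part of the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def process_all_viewpoints(camera_images, process_viewpoint, max_workers=None):
    """Run process_viewpoint(viewpoint_id, image) once per viewpoint, in parallel.

    camera_images: dict mapping a viewpoint id to its camera image.
    Returns a dict mapping each viewpoint id to the callable's result.
    """
    if max_workers is None:
        # At most one thread per viewpoint, as in the description above.
        max_workers = max(1, len(camera_images))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            vid: pool.submit(process_viewpoint, vid, image)
            for vid, image in camera_images.items()
        }
        # result() blocks until the corresponding viewpoint has finished.
        return {vid: future.result() for vid, future in futures.items()}
```

Whether threads actually speed up the work depends on the implementation of the per-viewpoint processing; the sketch only shows the structure of dispatching one task per viewpoint.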

FIG. 7 is a diagram illustrating an example of data exchanged between the functional units according to the first embodiment. As an example, FIG. 7 shows the data exchanged among the calculation unit 15B, the separation unit 15C, and the correction unit 15D when the depth image calculation, the foreground/background separation, and the depth image correction are performed for the viewpoint of the camera 5A among the N viewpoints of the cameras 5A to 5N.

Hereinafter, the viewpoint selected from among the N viewpoints as the target of depth image calculation, foreground/background separation, and depth image correction may be referred to as the "reference viewpoint". The following description takes, purely as an example, the case in which the reference viewpoint is the viewpoint of the camera 5A; when the viewpoint of another camera 5 is selected as the reference viewpoint, only the camera image changes and the processing itself is the same.

As shown in FIG. 7, of the camera images 200A to 200N acquired by the acquisition unit 15A, the camera image 200A corresponding to the reference viewpoint is input to the calculation unit 15B. In addition, purely as an example, because the depth image for the reference viewpoint is calculated by stereo matching, a viewpoint that yields parallax with respect to the camera image 200A, for example the viewpoint of the camera 5B adjacent to the reference viewpoint, is selected as a comparison viewpoint. The camera image 200B corresponding to the comparison viewpoint thus selected is also input to the calculation unit 15B.

When the camera images 200A and 200B are input, the calculation unit 15B calculates the depth image 220A corresponding to the reference viewpoint by stereo matching. For example, the calculation unit 15B converts the parallax map of the camera image 200B relative to the camera image 200A into the depth image 220A corresponding to the reference viewpoint in accordance with the camera parameters of the cameras 5A and 5B. The depth image 220A thus obtained is input from the calculation unit 15B to the correction unit 15D.
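As an illustration of converting a parallax (disparity) map into a depth image, the sketch below uses the standard relation for a rectified stereo pair, depth = focal length × baseline ÷ disparity. The patent does not state the conversion formula, so the rectified-pair assumption, the function name `disparity_to_depth`, and the parameters `focal_px` and `baseline_m` are all assumptions made for this example.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) for a rectified pair.

    Pixels with non-positive disparity have no valid match and are set to NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical values: 1000 px focal length, 0.1 m between cameras 5A and 5B.
example_depth = disparity_to_depth([[10.0, 0.0], [20.0, 5.0]], 1000.0, 0.1)
```

In this example a disparity of 10 pixels maps to a depth of 10 m, and the pixel with zero disparity is left as NaN.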

Although the example given here calculates the depth image 220A corresponding to the reference viewpoint by stereo matching, the embodiment is not limited to this. For example, the depth image 220A corresponding to the reference viewpoint may be acquired by measurement with a depth camera such as a three-dimensional laser sensor.

Meanwhile, the camera image 200A corresponding to the reference viewpoint is input not only to the calculation unit 15B but also to the separation unit 15C. When the camera image 200A is input, the separation unit 15C separates the subjects included in the camera image 200A into foreground and background.

Purely as an example, the separation unit 15C can extract the silhouette corresponding to the foreground from the camera image 200A by so-called background subtraction. To do so, among the camera images 200A acquired in time series, an image of a frame in which the foreground is unlikely to be observed is stored as a background image. For example, an image of a frame in which no inter-frame difference has been detected over a predetermined number of frames can be used as the background image. With such a background image stored, the separation unit 15C assigns a foreground or background label to each pixel according to whether the difference in pixel value between the camera image 200A of the latest frame and the background image is equal to or greater than a predetermined threshold. This yields a silhouette image 210A in which each pixel carries a foreground or background label. Although background subtraction is used here as an example, the foreground/background separation may instead be performed by graph cut, for example two-dimensional graph cut.
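The thresholding step of background subtraction can be sketched as follows. The label convention (background = 1, foreground = 0) follows the one used later for the silhouette image; the function name, the use of the maximum channel difference for color images, and the threshold value are assumptions for illustration.

```python
import numpy as np

def silhouette_from_background(frame, background, threshold):
    """Label each pixel as background (1) or foreground (0).

    A pixel is foreground when its absolute difference from the stored
    background image is equal to or greater than the threshold.
    """
    diff = np.abs(np.asarray(frame, dtype=np.int32)
                  - np.asarray(background, dtype=np.int32))
    if diff.ndim == 3:
        # Assumption: for color images, use the largest per-channel difference.
        diff = diff.max(axis=2)
    return np.where(diff >= threshold, 0, 1).astype(np.uint8)

background_image = np.full((2, 2), 100, dtype=np.uint8)
latest_frame = np.array([[100, 130], [95, 100]], dtype=np.uint8)
labels = silhouette_from_background(latest_frame, background_image, threshold=20)
```

Here only the pixel whose value changed from 100 to 130 exceeds the assumed threshold of 20 and is labeled foreground.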

The silhouette image 210A thus obtained is input from the separation unit 15C to the correction unit 15D, because it is used to correct the depth image 220A, and is also stored in the storage unit 13, because it is used to generate the foreground model.

When the depth image 220A and the silhouette image 210A are input, the correction unit 15D corrects the depth image 220A using the silhouette image 210A. In this correction, the pixel values of those pixels of the depth image 220A that are labeled as background in the silhouette image 210A are treated as valid, and (1) spatial filtering and (2) temporal filtering can be performed. It suffices to execute at least one of the two; both need not be executed.

Although FIG. 1 and FIG. 7 show examples in which the silhouette image and the corrected depth image are stored in the storage unit 13, these images need not necessarily be stored in storage such as the storage unit 13.

(1) Spatial Filtering
For each pixel included in the depth image 220A, the correction unit 15D applies a filter that convolves the depth value of that pixel with the depth values of its surrounding pixels. Examples of such filters include smoothing filters such as a Gaussian filter and edge-preserving filters that refer to the edges of the input image, for example a bilateral filter.

When applying the filter, the correction unit 15D treats as valid the depth values of those pixels of the depth image 220A that are labeled as background in the silhouette image 210A, and executes the filter convolution on them.

FIG. 8A is a diagram showing an example of the silhouette image 210A. FIG. 8B is a diagram showing an example of the depth image 220A. Purely as an example, FIGS. 8A and 8B show a case in which a Gaussian filter with a filter size of 3×3 is applied to three pixels: pixel (a), pixel (b), and pixel (c). A 3×3 filter is used here for convenience of explanation, but the filter may of course be of any size.

Hereinafter, the pixel that is aligned with the origin of the filter when the filter is applied may be referred to as the "pixel of interest", and the pixels located around the pixel of interest, for example its 8-neighborhood, may be referred to as "peripheral pixels".

FIG. 9A is a diagram showing an example of pixel IDs (IDentification). As shown in FIG. 9A, in the filter convolution the pixel of interest is identified, purely as an example, as "p_4". Among the peripheral pixels of the pixel of interest, the upper-left pixel is identified as "p_0", the pixel directly above as "p_1", the upper-right pixel as "p_2", the left pixel as "p_3", the right pixel as "p_5", the lower-left pixel as "p_6", the pixel directly below as "p_7", and the lower-right pixel as "p_8".

With the pixels identified in this way, the correction unit 15D calculates the corrected depth D_i of the pixel of interest according to Expression (1) or Expression (2) below. Here, "i" in Expressions (1) and (2) denotes the pixel ID and covers, for example, the nine pixels p_0 to p_8. "l_i" in Expression (1) denotes the foreground or background label value assigned to the pixel p_i of the silhouette image 210A. In the following description, purely as an example, the background label is "1" and the foreground label is "0". "k_i" in Expression (1) denotes the filter coefficient applied to the pixel p_i in the 3×3 array of filter coefficients. "d_i" in Expression (1) denotes the depth value of the pixel p_i of the depth image 220A.

$$D_i = \frac{\sum_i l_i \, k_i \, d_i}{\sum_i l_i \, k_i} \quad \text{if } l_4 = 1 \tag{1}$$

$$D_i = \text{foreground} \quad \text{if } l_4 = 0 \tag{2}$$

That is, when the label l_4 of the pixel of interest is "1", in other words when the pixel of interest carries the background label, the correction unit 15D calculates the corrected depth using Expression (1). When the label l_4 of the pixel of interest is "0", in other words when the pixel of interest carries the foreground label, the pixel of interest is identified as foreground by Expression (2). In this case, to prevent the depth of the pixel of interest from being used to generate the background model, the correction unit 15D invalidates it by setting the corrected depth D_i of the pixel of interest to an invalid value, for example a NULL value.

For example, when the Gaussian filter is applied to pixel (a), the correction unit 15D executes the convolution operation shown in FIG. 9B. FIG. 9B is a diagram showing an example of the filter convolution operation. In FIG. 9B, the pixels whose depth is treated as valid in the convolution are hatched. As shown in the silhouette image 210A of FIG. 8A, the pixel of interest (a) and its eight neighboring pixels all carry the background label. In this case, as shown in FIG. 9B, all label values l_0 to l_8 of the label matrix are set to "1", so that all depth values d_0 to d_8 of the depth matrix are used in the convolution. The kernel coefficients are as follows: k_0 = 1/16 for the upper-left pixel p_0, k_1 = 2/16 for the pixel p_1 directly above, k_2 = 1/16 for the upper-right pixel p_2, k_3 = 2/16 for the left pixel p_3, k_4 = 4/16 for the pixel of interest p_4, k_5 = 2/16 for the right pixel p_5, k_6 = 1/16 for the lower-left pixel p_6, k_7 = 2/16 for the pixel p_7 directly below, and k_8 = 1/16 for the lower-right pixel p_8.

Given this label matrix, kernel, and depth matrix, the corrected depth D_i of the pixel of interest (a) is calculated according to Expression (1). The term for the upper-left pixel p_0 is 1×(1/16)×d_0; for the pixel p_1 directly above, 1×(2/16)×d_1; for the upper-right pixel p_2, 1×(1/16)×d_2; for the left pixel p_3, 1×(2/16)×d_3; for the pixel of interest p_4, 1×(4/16)×d_4; for the right pixel p_5, 1×(2/16)×d_5; for the lower-left pixel p_6, 1×(1/16)×d_6; for the pixel p_7 directly below, 1×(2/16)×d_7; and for the lower-right pixel p_8, 1×(1/16)×d_8. Because all labels are "1", the denominator of Expression (1) is the sum of all kernel coefficients, which equals 1, so the sum of these terms is the corrected depth D_i of the pixel of interest (a).

Next, when the Gaussian filter is applied to pixel (b), the correction unit 15D executes the convolution operation shown in FIG. 9C. FIG. 9C is a diagram showing an example of the filter convolution operation. In FIG. 9C as well, the pixels whose depth is treated as valid in the convolution are hatched, while the pixels whose depth is treated as invalid are left plain. As shown in the silhouette image 210A of FIG. 8A, the pixel of interest (b) carries the background label, but some of its eight neighboring pixels, namely the left pixel and the lower-left pixel, carry the foreground label. In this case, as shown in FIG. 9C, the label value l_3 of the left pixel and the label value l_6 of the lower-left pixel in the label matrix are set to "0". With this label matrix, among the depth values d_0 to d_8 of the depth matrix, the depth value d_3 of the left pixel and the depth value d_6 of the lower-left pixel are invalidated.

Given this label matrix, kernel, and depth matrix, the corrected depth D_i of the pixel of interest (b) is calculated according to Expression (1). The term for the upper-left pixel p_0 is 1×(1/16)×d_0; for the pixel p_1 directly above, 1×(2/16)×d_1; for the upper-right pixel p_2, 1×(1/16)×d_2; for the left pixel p_3, 0×(2/16)×d_3; for the pixel of interest p_4, 1×(4/16)×d_4; for the right pixel p_5, 1×(2/16)×d_5; for the lower-left pixel p_6, 0×(1/16)×d_6; for the pixel p_7 directly below, 1×(2/16)×d_7; and for the lower-right pixel p_8, 1×(1/16)×d_8. The sum of these terms, divided in accordance with the denominator of Expression (1) by the sum of the coefficients of the valid pixels (13/16 here), is calculated as the corrected depth D_i of the pixel of interest (b).

When the Gaussian filter is applied to pixel (c), pixel (c) carries the foreground label, so the correction unit 15D sets the corrected depth D_i of the pixel of interest (c) to a NULL value.

In this way, for each pixel of the depth image 220A, the correction unit 15D performs spatial filtering in which the filter is applied with the depth values of the pixels labeled as background, among the pixel of interest and its peripheral pixels, treated as valid and the depth values of the pixels labeled as foreground treated as invalid. As a result, even for pixels at the boundary between a foreground subject and a background subject, depth variation in the depth image can be corrected without mixing the depths of the two. This makes it possible to suppress depth variation between pixels of the depth image and to interpolate depth values even when the depth values of pixels labeled as background are missing.
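The masked convolution of Expressions (1) and (2) can be sketched as follows, using the 3×3 Gaussian kernel from the example. This is a minimal illustration, not the patent's implementation: the function name, the use of NaN as the invalid (NULL) value, and the treatment of pixels outside the image as invalid are assumptions, and the depth values are assumed to be finite.

```python
import numpy as np

# 3x3 Gaussian kernel with the coefficients used in the example (sums to 1).
GAUSS_3X3 = np.array([[1, 2, 1],
                      [2, 4, 2],
                      [1, 2, 1]], dtype=np.float64) / 16.0

def masked_spatial_filter(depth, labels, kernel=GAUSS_3X3):
    """Apply Expressions (1)/(2): filter depth using background pixels only.

    labels: 1 = background, 0 = foreground. Foreground pixels become NaN.
    """
    depth = np.asarray(depth, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    h, w = depth.shape
    r = kernel.shape[0] // 2
    # Assumption: pixels outside the image are padded with label 0 (invalid).
    d_pad = np.pad(depth, r, mode="constant", constant_values=0.0)
    l_pad = np.pad(labels, r, mode="constant", constant_values=0.0)
    out = np.full((h, w), np.nan)
    for y in range(h):
        for x in range(w):
            if labels[y, x] == 0:
                continue  # Expression (2): foreground, keep the invalid value
            d_win = d_pad[y:y + 2 * r + 1, x:x + 2 * r + 1]
            l_win = l_pad[y:y + 2 * r + 1, x:x + 2 * r + 1]
            weights = l_win * kernel  # l_i * k_i
            # Expression (1): normalize by the weights of the valid pixels.
            out[y, x] = (weights * d_win).sum() / weights.sum()
    return out

# A case like pixel (b): the left and lower-left neighbors are foreground.
depth_image = [[10, 10, 10], [2, 12, 10], [2, 10, 10]]
label_image = [[1, 1, 1], [0, 1, 1], [0, 1, 1]]
corrected = masked_spatial_filter(depth_image, label_image)
```

For the center pixel the valid weights sum to 13/16 and the weighted depths sum to 138/16, so the corrected depth is 138/13 (about 10.6); the foreground depths of 2 do not leak into the result.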

(2) Temporal Filtering
For each pixel included in the depth image 220A, the correction unit 15D applies a filter that convolves the depth value of that pixel with the depth values of the pixels at the same position in a predetermined number of past frames.

In temporal filtering as well, when applying the filter, the correction unit 15D treats as valid the depth values of those pixels of the depth image 220A that are labeled as background in the silhouette image 210A, and executes the filter convolution on them.

FIG. 10A is a diagram showing an example of the pixel of interest in temporal filtering. As shown in FIG. 10A, in the filter convolution the pixel of interest is identified, purely as an example, as "i". The index t is used to identify frames of the depth image 220A, and the frame of interest, for which the corrected depth D_i of the pixel of interest i is calculated, is identified as "t = T". Among the past frames of the frame of interest T, the frame immediately before it is identified as "t = T−1", and the frame N frames before it is identified as "t = T−N".

例えば、補正部１５Ｄは、注目フレームＴの注目画素ｐ_ｉ，Ｔの補正デプスＤ_ｉ，Ｔを下記の式（３）または下記の式（４）にしたがって算出する。ここで、式（３）及び式（４）における「Ｎ」とは、カーネルのサイズを指す。また、式（３）における「ｌ_ｉ，ｔ」とは、フレームｔのシルエット画像２１０Ａの画素ｐ_ｉ，ｔに付与される前景または背景のラベル値を指す。ここでは、あくまで一例として、背景のラベルには、「１」が付与される一方で、前景のラベルには、「０」が付与されることとして以下の説明を行う。また、式（３）における「ｋ_ｔ」とは、カーネルのうちフレームｔに適用されるフィルタ係数を指す。また、式（３）における「ｄ_ｉ，ｔ」とは、フレームｔのデプス画像２２０Ａの画素ｐ_ｉ，ｔが有するデプス値を指す。 For example, the correction unit 15D calculates the correction depth D _i,T of the target pixel p _i,T of the target frame T according to the following formula (3) or the following formula (4). Here, “N” in Expression (3) and Expression (4) indicates the size of the kernel. In addition, “l _i,t ” in Expression (3) refers to the label value of the foreground or background that is given to the pixel p _i,t of the silhouette image 210A of the frame t. Here, as an example, the following description will be made assuming that “1” is given to the background label and “0” is given to the foreground label. Further, “k _t ”in Expression (3) refers to a filter coefficient applied to frame t in the kernel. In addition, “d _i,t ” in Expression (3) refers to the depth value of the pixel p _i,t of the depth image 220A of the frame t.
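
Equation (3) is not reproduced in this text. A form consistent with the symbol definitions above and with the worked example given for FIG. 10B, offered only as an assumed reconstruction, is a label-masked weighted sum over the frames covered by the kernel. The summation bounds in particular are an assumption:

```latex
D_{i,T} = \sum_{t=T-N+1}^{T} l_{i,t}\, k_t\, d_{i,t}
\quad \text{if } l_{i,T} = 1 \qquad (3)
```

Whether the original equation additionally normalizes by the sum of the valid coefficients cannot be determined from the text; the worked example sums the products without normalization.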

D_{i,T} = \text{foreground} \quad \text{if } l_{i,T} = 0 \qquad (4)

That is, when the label l_{i,T} of the pixel of interest p_{i,T} in the frame of interest T is "1", that is, when the pixel of interest in the frame of interest is labeled as background, the correction unit 15D calculates the corrected depth using Equation (3). On the other hand, when the label l_{i,T} is "0", that is, when the pixel of interest is labeled as foreground, the pixel of interest is identified as foreground by Equation (4). In this case, to prevent the depth of the pixel of interest p_{i,T} in the frame of interest T from being used to generate the background model, the correction unit 15D invalidates the corrected depth D_{i,T} by setting it to a NULL value.

FIG. 10B is a diagram showing an example of the convolution of the filter. In FIG. 10B, the pixel of interest p_{i,t} of a frame t whose depth is treated as valid in the convolution is shown hatched, while the pixel of interest p_{i,t} of a frame t whose depth is treated as invalid is shown unfilled. Although FIG. 10B shows an example in which the kernel size N is "4", the kernel size N may be any value of 2 or more.

As shown in FIG. 10B, for the pixel of interest i, the background label is assigned in the frame of interest T, in the immediately preceding frame T-1, and in the frame T-3 three frames earlier, whereas the foreground label is assigned in the frame T-2 two frames earlier. In this case, as shown in FIG. 10B, "0" is set as the label value l_{i,T-2} of the frame T-2 in the label matrix. With this label matrix, of the depth values d_{i,T} to d_{i,T-4} in the depth matrix, the depth value d_{i,T-2} of the frame T-2 is invalidated. In the kernel, "20/64" is used as the filter coefficient k_T of the frame of interest T, "15/64" as the filter coefficient k_{T-1} of the frame T-1, "6/64" as the filter coefficient k_{T-2} of the frame T-2, and "1/64" as the filter coefficient k_{T-3} of the frame T-3.

Given this label matrix, kernel, and depth matrix, the corrected depth D_{i,T} of the pixel of interest p_{i,T} is calculated according to Equation (3). The term for the pixel p_{i,T} of the frame of interest T is 1 × (20/64) × d_{i,T}. The term for the pixel p_{i,T-1} of the frame T-1 is 1 × (15/64) × d_{i,T-1}. The term for the pixel p_{i,T-2} of the frame T-2 is 0 × (6/64) × d_{i,T-2}. The term for the pixel p_{i,T-3} of the frame T-3 is 1 × (1/64) × d_{i,T-3}. The sum of these terms is calculated as the corrected depth D_{i,T} of the pixel of interest p_{i,T}.
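
The worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `temporal_filter`, the newest-first ordering of the lists, the use of `None` for the NULL value, and the depth values are all hypothetical; the labels and filter coefficients are the ones given for FIG. 10B.

```python
from fractions import Fraction

def temporal_filter(depths, labels, kernel):
    """Label-masked temporal filter for one pixel.

    All three lists are ordered newest first: index 0 is the frame of
    interest T, index 1 is T-1, and so on (an assumed convention).
    Labels: 1 = background (valid), 0 = foreground (invalid).
    """
    if labels[0] == 0:
        # Equation (4): foreground at frame T, so the corrected depth is NULL.
        return None
    # Masked weighted sum, following the terms of the worked example.
    return sum(l * k * d for l, k, d in zip(labels, kernel, depths))

# Coefficients and labels from the FIG. 10B example; depths are made up.
kernel = [Fraction(20, 64), Fraction(15, 64), Fraction(6, 64), Fraction(1, 64)]
labels = [1, 1, 0, 1]
depths = [64, 64, 10, 64]
corrected = temporal_filter(depths, labels, kernel)
```

With these inputs the foreground depth 10 at frame T-2 contributes nothing. Note that the four coefficients sum to 42/64 and a masked term simply drops out, so the result is not rescaled; the text does not say whether the original equation renormalizes.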

As described above, for each pixel of the depth image 220A, the correction unit 15D applies the filter while treating as valid the depth values of those pixels of interest, in the frame of interest and the past frames, that are labeled as background, and treating as invalid the depth values of those labeled as foreground. With this temporal filtering, even when the label of the pixel of interest varies between foreground and background across past frames, depth variation between frames of the depth image can be corrected without mixing the depths of the two. Depth variation between frames of the depth image can therefore be suppressed.

The corrected depth image 230A obtained by the spatial filtering and the temporal filtering is stored in the storage unit 13 so that it can be used for generating the background model.

Returning to the description of FIG. 6, the foreground generation unit 15E is a processing unit that generates a foreground model.

In one embodiment, the foreground generation unit 15E can generate the foreground model 3Mfg using the silhouette images 210 stored in the storage unit 13 for each viewpoint of the cameras 5. The Visual Hull method described above with reference to FIG. 3 can be applied to generate this foreground model. In Visual Hull, a cone connecting the optical center of a camera 5 and the silhouette on its silhouette image is generated for each camera, and the region of three-dimensional space where the cones overlap is calculated as the three-dimensional shape of the subject 3fg corresponding to the foreground. For example, as shown in FIG. 3, the foreground generation unit 15E projects the silhouettes SA to SC into three-dimensional space for the respective silhouette images 210A to 210C corresponding to the viewpoints of the cameras 5A to 5C. Projecting the silhouette SA yields the visual volume CA connecting the optical center of the camera 5A and the silhouette SA on the silhouette image 210A. Likewise, projecting the silhouette SB yields the visual volume CB connecting the optical center of the camera 5B and the silhouette SB on the silhouette image 210B, and projecting the silhouette SC yields the visual volume CC connecting the optical center of the camera 5C and the silhouette SC on the silhouette image 210C. The Visual Hull region in which the visual volumes CA to CC overlap, that is, the solid black three-dimensional shape shown in FIG. 3, is calculated as the foreground model 3Mfg.
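
One common way to approximate the intersection of visual volumes is voxel carving: a voxel is kept only if its projection falls inside the silhouette of every view. The sketch below illustrates that idea on a toy grid. The voxel discretization, the function name `carve_visual_hull`, and the three orthographic "cameras" are assumptions for illustration; the text describes cone intersection with real camera parameters and does not prescribe this implementation.

```python
def carve_visual_hull(voxels, views):
    """Keep the voxels whose projection lies inside every silhouette.

    views: list of (project, silhouette) pairs, where project maps a
    voxel (x, y, z) to a pixel (u, v) and silhouette is the set of
    pixels labeled as foreground in that view.
    """
    return [v for v in voxels
            if all(project(v) in silhouette for project, silhouette in views)]

# Toy setup (hypothetical): a 3x3x3 grid seen by three orthographic views.
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
views = [
    (lambda p: (p[0], p[1]), {(1, 1)}),          # looking along the z axis
    (lambda p: (p[1], p[2]), {(1, 1), (1, 2)}),  # looking along the x axis
    (lambda p: (p[0], p[2]), {(1, 1), (1, 2)}),  # looking along the y axis
]
hull = carve_visual_hull(grid, views)
```

In a real system each `project` would apply the perspective projection given by the camera parameters of the cameras 5A to 5C rather than dropping a coordinate.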

The background generation unit 15F is a processing unit that generates a background model.

In one embodiment, the background generation unit 15F can generate the background model 3Mbg using the corrected depth images 230 stored in the storage unit 13 for each viewpoint of the cameras 5. For example, the background generation unit 15F generates the background model 3Mbg by combining the corrected depth images 230 of the respective viewpoints. Although a three-dimensional background model is generated here by combining the corrected depth images, purely as an example, a three-dimensional background model need not necessarily be generated. For example, the corrected depth image of each viewpoint may be input to the rendering unit 15G as it is, without combining the corrected depth images obtained for the respective viewpoints of the cameras 5.

The rendering unit 15G is a processing unit that renders free-viewpoint video.

In one embodiment, the rendering unit 15G can have a virtual viewpoint specified by accepting user input from the client terminal 30. Alternatively, the virtual viewpoint can be specified by a user setting made via the client terminal 30 or by a system setting registered in the server device 10. Once the virtual viewpoint has been specified, the rendering unit 15G renders the free-viewpoint video corresponding to the virtual viewpoint, as described with reference to FIG. 4. That is, the rendering unit 15G calculates the three-dimensional position of the intersection between the straight line passing through the optical center of the virtual camera Vc and a pixel, and the foreground model 3Mfg or the background model 3Mbg (S1). Next, the rendering unit 15G projects this intersection onto the camera image of each viewpoint according to camera parameters, which comprise extrinsic parameters such as the position and orientation of the camera 5 and intrinsic parameters such as the angle of view and lens distortion of the camera 5. In the example shown in FIG. 4, the intersection is projected onto the camera images of a predetermined number of cameras 5 closest to the virtual camera Vc, namely the two camera images 200B and 200C of the cameras 5B and 5C (S2B and S2C). The pixel of the camera 5B and the pixel of the camera 5C that correspond to the pixel of the virtual camera Vc are thereby identified as texture coordinates. The rendering unit 15G then refers to the pixel value of the pixel in the camera image 200B, captured by the camera 5B, that corresponds to the pixel of the virtual camera Vc (S3B), and likewise to the pixel value of the corresponding pixel in the camera image 200C captured by the camera 5C (S3C). The rendering unit 15G then maps the pixel values referred to in S3B and S3C to the pixel of the virtual camera Vc. For example, a statistic of the pixel values of the corresponding pixels on the camera images 200B and 200C, such as the arithmetic mean or a weighted mean based on the distance from the virtual camera Vc, is determined as the pixel value of the pixel of the virtual camera Vc.
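
The final mapping step, in which sampled pixel values are combined into one value for the virtual camera pixel, can be sketched as follows. The function name `blend_pixel` and the choice of inverse-distance weights are assumptions; the text only says that an arithmetic mean or a distance-based weighted mean may be used.

```python
def blend_pixel(values, distances=None):
    """Combine RGB samples from nearby cameras into one pixel value.

    values:    list of (r, g, b) tuples, one per contributing camera.
    distances: optional distances from each camera to the virtual
               camera; if omitted, a plain arithmetic mean is used.
    """
    if distances is None:
        weights = [1.0] * len(values)
    else:
        # Assumed weighting: closer cameras contribute more (1 / distance).
        weights = [1.0 / d for d in distances]
    total = sum(weights)
    return tuple(sum(w * v[c] for w, v in zip(weights, values)) / total
                 for c in range(3))

# Samples standing in for the pixels referenced in S3B and S3C.
mean_value = blend_pixel([(100, 0, 0), (200, 0, 0)])
weighted_value = blend_pixel([(100, 0, 0), (200, 0, 0)], [1.0, 4.0])
```

Here the second camera is four times farther from the virtual camera, so its sample pulls the result less than it does in the plain mean.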

[Process flow]
FIG. 11 is a flowchart showing the procedure of the video generation process according to the first embodiment. As an example, this process is executed when a camera image is acquired from each camera 5, that is, when multi-view camera images are obtained.

As shown in FIG. 11, when the camera images of the respective viewpoints are acquired from the cameras 5A to 5N (step S101), the calculation unit 15B selects, as the base viewpoint (the viewpoint for which a depth image is to be calculated), a viewpoint not yet selected from among the N viewpoints of the cameras 5A to 5N (step S102). Next, the calculation unit 15B selects, as the reference viewpoint, a viewpoint with which parallax can be obtained relative to the camera image of the base viewpoint, for example the viewpoint of a camera 5 adjacent to the base viewpoint (step S103).

The calculation unit 15B then calculates, by stereo matching, the depth image of the base viewpoint from the camera image of the base viewpoint selected in step S102 and the camera image of the reference viewpoint selected in step S103 (step S104).

In addition, the separation unit 15C separates the subjects in the camera image of the base viewpoint selected in step S102 into foreground and background (step S105). This separation yields a silhouette image in which a foreground or background label is assigned to each pixel.

The correction unit 15D then corrects the depth image obtained in step S104 using the silhouette image obtained in step S105 (step S106). This correction yields a corrected depth image.

The processing from step S102 to step S106 is repeated until all viewpoints have been selected (No in step S107).

When all viewpoints have been selected (Yes in step S107), the foreground generation unit 15E generates the foreground model using the silhouette images of the respective viewpoints obtained by repeating step S105 (step S108). The background generation unit 15F generates the background model using the corrected depth images of the respective viewpoints obtained by repeating step S106 (step S109).

The rendering unit 15G then generates the camera image corresponding to the virtual viewpoint, that is, the free-viewpoint video, using the camera images of the respective viewpoints acquired in step S101 and the foreground model and background model generated in steps S108 and S109 (step S110), and the process ends.
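
The control flow of steps S101 to S110 can be summarized as a short driver function. This is a sketch only: every callable passed in (`pick_reference`, `stereo_depth`, `separate`, `correct`, `build_foreground`, `build_background`, `render`) is a hypothetical placeholder for the corresponding processing unit, and the dictionary keyed by viewpoint is an assumed data layout.

```python
def generate_free_viewpoint_video(images, pick_reference, stereo_depth,
                                  separate, correct,
                                  build_foreground, build_background, render):
    """images: dict mapping a viewpoint id to its camera image (S101)."""
    silhouettes, corrected = {}, {}
    for base in images:                                      # S102; loop closed by S107
        ref = pick_reference(base)                           # S103
        depth = stereo_depth(images[base], images[ref])      # S104
        silhouettes[base] = separate(images[base])           # S105
        corrected[base] = correct(depth, silhouettes[base])  # S106
    foreground = build_foreground(silhouettes)               # S108
    background = build_background(corrected)                 # S109
    return render(images, foreground, background)            # S110
```

Because S105 depends only on the camera image of the selected viewpoint, the `separate` call could equally be moved ahead of `stereo_depth` or run concurrently with it.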

Although the flowchart of FIG. 11 shows an example in which the foreground/background separation of step S105 is executed after the processing of step S104, the separation of step S105 can start as soon as the base viewpoint has been selected in step S102. The separation of step S105 may therefore be executed before steps S103 and S104, or in parallel with them. Even when the order is changed or the steps are run in parallel in this way, the content of the foreground/background separation of step S105 does not change.

[One aspect of the effects]
As described above, the server device 10 according to the present embodiment corrects the depth image of each viewpoint using the foreground/background separation result that is obtained in order to generate the foreground model, and generates the background model from the corrected depth images. By using the separation result in this way, the depth image can be corrected while distinguishing foreground and background subjects even when their depths are close. Furthermore, depth variation in the depth image can be corrected at the boundary between a foreground subject and a background subject without mixing the two. Because the background model is generated from depth images corrected in this way, the accuracy of the background model can be improved. The server device 10 according to the present embodiment can therefore reduce the depth discrepancy between the background model and the dynamic background.

Although an embodiment of the disclosed device has been described above, the present invention may be implemented in various forms other than the embodiment described above. Other embodiments included in the present invention are therefore described below.

[Application example 1 of foreground/background separation]
For example, the server device 10 can perform foreground/background separation by additionally using the depth image obtained by stereo matching or the like.

That is, when foreground/background separation is realized by background subtraction, as in the example given in the first embodiment, foreground and background are separated on the basis of the color information represented by pixel values. In that case, if the foreground and background subjects are similar in color, a foreground subject may be separated as background or a background subject as foreground, so that sufficient separation accuracy may not be achieved. For example, when a sporting event is captured as camera images, the players in the foreground and the spectators forming the dynamic background are all people, so it is difficult to separate foreground from background in the camera image using color information alone. Although background subtraction is used as the example here, the issue is not limited to that case. The same problem arises, for example, when a color histogram for the foreground and a color histogram for the background are generated in advance and the pixel colors of the camera image acquired by the acquisition unit 15A are separated on the basis of those histograms.

For this reason, Application Example 1 performs foreground/background separation using depth information in addition to color information, thereby achieving separation that is robust even when the foreground and background subjects are similar in color and improving the accuracy of separating foreground from background.

To realize such foreground/background separation, Application Example 1 describes an example that uses a two-dimensional graph cut. For example, the labeling problem of assigning a foreground or background label to each pixel of the camera image is formulated as the problem of minimizing the energy function shown in Equation (5) below.

E = \sum E_d(p) + \lambda \sum E_s(p, q) \qquad (5)

The energy function E in Equation (5) consists of a "data term", the first term on the right-hand side, and a "smoothing term", the second term on the right-hand side. "λ" in Equation (5) is the weight coefficient applied to the smoothing term. "p" in Equation (5) denotes a pixel to which a foreground or background label is to be assigned. "q" in Equation (5) denotes a pixel adjacent to the pixel p; for example, the 8-neighborhood or the 4-neighborhood of the pixel p can be used as the adjacent pixels.

Here, as shown in Equation (6) below, the data term is formulated from an energy E_{color}, based on a first foreground likelihood and a first background likelihood obtained from color information, and an energy E_{depth}, based on a second foreground likelihood and a second background likelihood obtained from depth values. "w_{color}" in Equation (6) is the weight coefficient applied to E_{color}, and "w_{depth}" is the weight coefficient applied to E_{depth}.

E_d(p) = w_{color} \times E_{color} + w_{depth} \times E_{depth} \qquad (6)

For the smoothing term, a penalty function that smooths labels between adjacent pixels is defined as in Equation (7) below. "C_p" in Equation (7) denotes the pixel value at the pixel p, and "C_q" denotes the pixel value at the adjacent pixel q.

\sum E_s(p, q) = \exp(|C_p - C_q|) \qquad (7)

The data term makes the label assignment preserve the tendencies of the first foreground and background likelihoods and of the second foreground and background likelihoods. The smoothing term makes the label assignment suppress label variation from pixel to pixel.

By calculating, according to the max-flow min-cut theorem, the set of labels that minimizes the energy function E comprising the data term and the smoothing term, a foreground or background label can be assigned to each pixel.
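
To make the role of the two terms concrete, the sketch below evaluates an energy of the form of Equation (5) on a four-pixel row and finds the minimizing labeling by exhaustive search. Several things here are assumptions rather than the method of the text: exhaustive search stands in for max-flow/min-cut, the neighborhood is one-dimensional, the pairwise penalty is charged only where neighboring labels differ, and the penalty uses a decaying exponential with a hypothetical scale of 50 (Equation (7) as printed has neither a minus sign nor a label dependence). The data costs are made-up values standing in for Equation (6).

```python
import math
from itertools import product

def energy(labels, data_cost, colors, lam, pair_cost):
    """Data term plus weighted smoothing term for a 1-D row of pixels.

    data_cost[p][l] is the data energy of giving pixel p the label l
    (0 = foreground, 1 = background).
    """
    data = sum(data_cost[p][l] for p, l in enumerate(labels))
    smooth = sum(pair_cost(colors[p], colors[p + 1])
                 for p in range(len(labels) - 1)
                 if labels[p] != labels[p + 1])  # assumed: only across label changes
    return data + lam * smooth

def best_labeling(data_cost, colors, lam, pair_cost):
    """Exhaustive minimization; feasible only for tiny examples."""
    return min(product((0, 1), repeat=len(data_cost)),
               key=lambda ls: energy(ls, data_cost, colors, lam, pair_cost))

colors = [10, 11, 200, 201]
data_cost = [(0.2, 1.0), (0.7, 0.6), (1.0, 0.1), (0.9, 0.3)]
pair_cost = lambda cp, cq: math.exp(-abs(cp - cq) / 50.0)  # assumed form
labels = best_labeling(data_cost, colors, 1.0, pair_cost)
```

Taken alone, the data costs would label the second pixel as background, but cutting between two nearly identical colors is expensive, so the minimizer places the label boundary at the large color step instead.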

To realize the graph cut described above, Application Example 1 adds functional units such as a first likelihood calculation unit 21 and a second likelihood calculation unit 22. In addition, in place of the separation unit 15C of the first embodiment, Application Example 1 provides a separation unit 23 that realizes foreground/background separation by graph cut.

FIG. 12 is a diagram showing an example of the data exchanged between the functional units in Application Example 1. As an example, FIG. 12 shows the data exchanged between the functional units when the viewpoint of the camera 5A is selected as the base viewpoint (the viewpoint for which a depth image is calculated) from among the N viewpoints of the cameras 5A to 5N.

As shown in FIG. 12, of the camera images 200A to 200N acquired by the acquisition unit 15A, the camera image 200A of the base viewpoint is input to the calculation unit 15B. Furthermore, purely as an example, in order to calculate the depth image of the base viewpoint by stereo matching, a viewpoint with which parallax can be obtained relative to the camera image 200A, for example the viewpoint of the camera 5B adjacent to the base viewpoint, is selected as the reference viewpoint. The camera image 200B of the reference viewpoint selected in this way is also input to the calculation unit 15B.

When the camera images 200A and 200B are input, the calculation unit 15B calculates the depth image 220A of the base viewpoint by stereo matching. For example, the calculation unit 15B converts the disparity map of the camera image 200B relative to the camera image 200A into the depth image 220A of the base viewpoint according to the camera parameters of the cameras 5A and 5B.
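
For a rectified stereo pair, the conversion from disparity to depth is commonly written Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity. The sketch below applies that standard relation. Rectification, the function name `disparity_to_depth`, and the use of `None` for pixels without a valid disparity are assumptions; the text only says that the conversion follows the camera parameters.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (list of rows, in pixels) to depth in meters.

    Assumes a rectified stereo pair. Pixels with zero or negative
    disparity have no valid match and are returned as None.
    """
    return [[focal_px * baseline_m / d if d > 0 else None for d in row]
            for row in disparity]

# Hypothetical values: 1000 px focal length, 0.5 m baseline.
depth = disparity_to_depth([[10.0, 0.0], [20.0, 5.0]], 1000.0, 0.5)
```

The `None` entries correspond to the kind of missing depth values that the correction unit 15D is described as interpolating.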

Up to this point there is no difference between FIG. 12 and FIG. 7; the differences begin here. That is, the depth image 220A obtained by stereo matching or the like is input not only from the calculation unit 15B to the correction unit 15D but also from the calculation unit 15B to the second likelihood calculation unit 22.

Meanwhile, the camera image 200A of the base viewpoint is input to the first likelihood calculation unit 21 as well as to the calculation unit 15B. When the camera image 200A is input, the first likelihood calculation unit 21 calculates, for each pixel of the camera image 200A, a first foreground likelihood and a first background likelihood using the pixel value of that pixel. These likelihoods can be calculated as follows. For each of the foreground and background labels, a frequency distribution of the colors belonging to that label, such as a histogram, or a probability distribution, such as a Gaussian mixture, is calculated in advance. Here, purely as an example, a Gaussian mixture comprising K Gaussian distributions is prepared for each of the foreground and background labels. The first foreground likelihood and the first background likelihood of the pixel p are calculated by comparing these Gaussian mixtures with the pixel value I_p of the pixel p of the camera image 200A. For example, the first likelihood calculation unit 21 obtains the first foreground likelihood or the first background likelihood from the pixel value I_p according to Equation (8) below. Here, "w_k" in Equation (8) denotes the weight of the k-th Gaussian distribution, and "N(I_p | μ_k, Σ_k)" denotes the k-th Gaussian distribution. According to Equation (8), one Gaussian distribution is selected from among the K Gaussian distributions for each of the foreground and background labels. The first foreground likelihood and the first background likelihood calculated in this way are input from the first likelihood calculation unit 21 to the separation unit 23.

P_{color}(p \mid l) = \sum_{k=1}^{K} w_k \cdot N(I_p \mid \mu_k, \Sigma_k) \qquad (8)
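
Equation (8) can be evaluated directly once a mixture has been fitted for each label. The sketch below does so for a scalar pixel value, which is a simplifying assumption: for a color vector, each component would use a mean vector μ_k and covariance matrix Σ_k instead of a scalar mean and variance. The function names and the mixture parameters are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def color_likelihood(value, components):
    """Weighted sum of Gaussian densities, as in Equation (8).

    components: list of (w_k, mu_k, var_k) tuples for one label.
    """
    return sum(w * gaussian_pdf(value, mu, var) for w, mu, var in components)

# Made-up mixtures for the two labels (scalar intensities, 0-255).
foreground_gmm = [(0.6, 200.0, 100.0), (0.4, 120.0, 400.0)]
background_gmm = [(1.0, 50.0, 225.0)]
p_fg = color_likelihood(190.0, foreground_gmm)
p_bg = color_likelihood(190.0, background_gmm)
```

A pixel value near one of the foreground components yields a much larger foreground likelihood than background likelihood, which is the tendency the color part of the data term is meant to preserve.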

また、デプス画像２２０Ａが入力された第２尤度算出部２２は、デプス画像２２０Ａの画素ごとに当該画素のデプス値を用いて第２の前景尤度および第２の背景尤度を算出する。これら第２の前景尤度および第２の背景尤度は、次のようにして算出することができる。まず、３次元空間上で前景の存在領域および背景の存在領域が事前に設定される。例えば、スポーツ観戦を例に挙げれば、スタジアム内で選手が競技を行うフィールドの面および選手やボールが移動しうる高さなどが前景の存在領域として設定される。また、スタジアム内で前景の存在領域以外の領域が背景の存在領域として設定される。 Further, the second likelihood calculation unit 22 to which the depth image 220A is input calculates the second foreground likelihood and the second background likelihood for each pixel of the depth image 220A using the depth value of the pixel. The second foreground likelihood and the second background likelihood can be calculated as follows. First, the presence area of the foreground and the presence area of the background are set in advance in the three-dimensional space. For example, taking sports watching as an example, a field surface in which a player competes in the stadium, a height at which the player or the ball can move, and the like are set as the foreground existence region. In addition, an area other than the area where the foreground exists in the stadium is set as the area where the background exists.

With the foreground and background existence regions set in this way, the second likelihood calculation unit 22 calculates evaluation values along the depth direction for the pixel p of the depth image 220A corresponding to the base viewpoint. For example, the second likelihood calculation unit 22 projects the ray that runs from the optical center of the base-viewpoint camera 5A through the pixel p, for which the likelihood is to be calculated, onto the depth image 220B of the reference-viewpoint camera 5B. This draws an epipolar line on the depth image 220B. The second likelihood calculation unit 22 then calculates an evaluation value, for example the SAD (Sum of Absolute Differences), between the pixel p of the depth image 220A and each pixel lying on the epipolar line in the depth image 220B.

FIG. 13 is a diagram showing an example of a graph of the evaluation value against depth. In the graph of FIG. 13, the vertical axis is the SAD and the horizontal axis is the depth. "Depth" here means the distance in the depth direction measured from the optical center of the camera 5A as the origin. FIG. 13 also shows the foreground existence region and the background existence region superimposed on the graph. As shown in FIG. 13, the second likelihood calculation unit 22 extracts the minimum point j1, at which the smallest SAD is observed among the SAD values corresponding to depths in the foreground existence region. It then takes the SAD measured at the minimum point j1 as the representative evaluation value r(l) and converts r(l) into the second foreground likelihood according to Expression (9) below. Likewise, the second likelihood calculation unit 22 extracts the minimum point j2, at which the smallest SAD is observed among the SAD values corresponding to depths in the background existence region, takes the SAD measured at the minimum point j2 as the representative evaluation value r(l), and converts r(l) into the second background likelihood according to Expression (9). The second foreground likelihood and the second background likelihood calculated in this way are input from the second likelihood calculation unit 22 to the separation unit 23.

$$P_{\mathrm{depth}}(p \mid l) = \exp(-r(l)) \tag{9}$$
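
The conversion from the SAD curve of FIG. 13 to the second likelihoods via Expression (9) might be sketched as follows. The sampled depths, SAD values, and region bounds are invented for illustration, the function name `depth_likelihood` is hypothetical, and returning 0.0 when no sample falls inside a region is an assumption, since the source does not say how that case is handled.

```python
import math
from typing import Sequence, Tuple


def depth_likelihood(depths: Sequence[float],
                     sad_values: Sequence[float],
                     region: Tuple[float, float]) -> float:
    """Return exp(-r(l)), where r(l) is the smallest SAD observed at a depth
    inside the existence region (near, far) of label l, as in Expression (9)."""
    near, far = region
    in_region = [sad for depth, sad in zip(depths, sad_values) if near <= depth <= far]
    if not in_region:
        return 0.0  # assumption: no sample in the region means zero likelihood
    return math.exp(-min(in_region))


# Hypothetical SAD curve sampled along the ray through pixel p.
depths = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
sad_values = [0.9, 0.2, 0.7, 1.5, 1.1, 1.8]

foreground_region = (10.0, 30.0)  # e.g. the field and the space above it
background_region = (40.0, 60.0)  # e.g. the stands behind the field

p_fg = depth_likelihood(depths, sad_values, foreground_region)  # uses minimum point j1
p_bg = depth_likelihood(depths, sad_values, background_region)  # uses minimum point j2
```

Because the likelihood is exp(-r(l)), a smaller minimum SAD inside a region gives a likelihood closer to 1. The scale of the SAD values matters here, so a real implementation would probably normalize them, which the source does not specify.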

The separation unit 23 then calculates the set of labels that minimizes the energy function shown in Expression (5) above, in accordance with the max-flow min-cut theorem. This two-dimensional graph cut yields a silhouette image 210A′ in which a foreground or background label is assigned to each pixel.

The silhouette image 210A′ obtained in this way is input from the separation unit 23 to the correction unit 15D for use in correcting the depth image 220A, and is also saved in the storage unit 13 for use in generating the foreground model.

When the depth image 220A and the silhouette image 210A′ are input, the correction unit 15D corrects the depth image 220A using the silhouette image 210A′. In this correction, the correction unit 15D treats as valid the pixel values of those pixels of the depth image 220A to which the background label is assigned in the silhouette image 210A′, and executes at least one of spatial filtering and temporal filtering. This yields a corrected depth image 230A′, which is the depth image 220A after correction. Because the silhouette image 210A′ comes from a foreground/background separation that is robust even when the foreground and background subjects have similar colors, using it to correct the depth image also improves the accuracy of the background model.
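
One possible reading of the spatial filtering performed by the correction unit 15D is a box filter that averages only over background-labelled neighbours. The sketch below assumes exactly that: the uniform filter coefficients, the window radius, the use of `None` as the invalid value for foreground pixels, and the function name `correct_depth` are all assumptions, not details given in the source.

```python
from typing import List, Optional


def correct_depth(depth: List[List[float]],
                  is_background: List[List[bool]],
                  radius: int = 1) -> List[List[Optional[float]]]:
    """Smooth each background pixel using only background-labelled neighbours.

    Foreground pixels are left as None (an invalid value), so their depth
    never leaks into the corrected background depth.
    """
    height, width = len(depth), len(depth[0])
    corrected: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not is_background[y][x]:
                continue  # foreground pixel: keep the invalid value
            total, count = 0.0, 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if is_background[ny][nx]:
                        total += depth[ny][nx]
                        count += 1
            corrected[y][x] = total / count  # count >= 1: the pixel itself is background
    return corrected


# Hypothetical 3x3 depth image: a near foreground object (2.0) in front of a
# background at about 10.0, with one noisy background sample (13.0).
depth = [[13.0, 10.0, 10.0],
         [10.0,  2.0, 10.0],
         [10.0, 10.0, 10.0]]
mask = [[True, True, True],
        [True, False, True],
        [True, True, True]]
corrected = correct_depth(depth, mask)
```

In this toy input the noisy corner value 13.0 is pulled towards its background neighbours, while the foreground depth 2.0 contributes to none of the averages. A temporal variant would average the same pixel over past frames in which it was labelled background.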

Meanwhile, the silhouette image 210A′ saved in the storage unit 13 is used, together with the other silhouette images 210, by the foreground generation unit 15E to generate the foreground model. Using the silhouette image 210A′ for foreground model generation in this way also improves the accuracy of the foreground model.

FIG. 14 is a flowchart showing the procedure of the video generation process according to Application Example 1. As an example, this process is executed when a camera image is acquired from each camera 5, that is, when multi-view camera images are obtained.

As shown in FIG. 14, when the camera images of the respective viewpoints are acquired from the cameras 5A to 5N (step S101), the calculation unit 15B selects, as the base viewpoint, a viewpoint not yet selected from among the N viewpoints of the cameras 5A to 5N (step S102). The calculation unit 15B then selects, as the reference viewpoint, a viewpoint from which parallax can be obtained with respect to the camera image of the base viewpoint, for example the viewpoint of a camera 5 adjacent to the base viewpoint (step S103).

Next, the first likelihood calculation unit 21 calculates the first foreground likelihood and the first background likelihood of each pixel based on the color information of the camera image corresponding to the base viewpoint (step S201). The second likelihood calculation unit 22 calculates the second foreground likelihood and the second background likelihood of each pixel using the depth image calculated in step S104 (step S202).

The separation unit 23 then calculates, in accordance with the max-flow min-cut theorem, the set of labels that minimizes an energy function whose data term incorporates the first foreground and background likelihoods and the second foreground and background likelihoods (step S203). This two-dimensional graph cut yields a silhouette image 210A in which a foreground or background label is assigned to each pixel.

After that, the correction unit 15D corrects the depth image obtained in step S104 using the silhouette image obtained in step S203 (step S106). This correction yields a corrected depth image.

After that, when all the pixels included in the camera image have been selected (Yes in step S107), the foreground generation unit 15E generates a foreground model using the silhouette images of the respective viewpoints obtained by repeating step S203 (step S108). The background generation unit 15F generates a background model using the corrected depth images of the respective viewpoints obtained by repeating step S106 (step S109).

Although the flowchart of FIG. 14 shows an example in which the calculation of the first foreground likelihood and the first background likelihood in step S201 is executed after the processing of step S104, the processing of step S201 can be started as soon as the base viewpoint has been selected in step S102. The foreground/background separation of step S203 may therefore be executed before the processing of steps S103 and S104, or in parallel with it. Even when the order is changed or the processing is parallelized in this way, the content of the processing in step S201 is unchanged. Similarly, although an example is shown in which the calculation of the second foreground likelihood and the second background likelihood in step S202 is executed after the processing of step S201, the processing of step S202 can be started as soon as the depth image has been calculated in step S104. The processing of step S202 may therefore be executed before the processing of step S201, or in parallel with it. Here too, the content of the processing in step S202 is unchanged.
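
The dependencies described in this paragraph (step S201 needs only the base-viewpoint image, while step S202 needs the depth image from step S104) could be exploited as in the following sketch. The three functions are placeholder stubs with arbitrary arithmetic standing in for steps S103 to S104, S201, and S202, and their names are hypothetical. Only the ordering and the use of Python's standard `concurrent.futures` module are the point.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def compute_depth(base_image: List[int], reference_image: List[int]) -> List[int]:
    """Placeholder for steps S103 to S104 (not a real stereo matcher)."""
    return [abs(a - b) for a, b in zip(base_image, reference_image)]


def color_likelihoods(base_image: List[int]) -> List[float]:
    """Placeholder for step S201: depends only on the base-viewpoint image."""
    return [value / 255.0 for value in base_image]


def depth_likelihoods(depth_image: List[int]) -> List[float]:
    """Placeholder for step S202: depends on the depth image from step S104."""
    return [1.0 / (1.0 + d) for d in depth_image]


def run_pipeline(base_image: List[int],
                 reference_image: List[int]) -> Tuple[List[float], List[float]]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # S201 and S103-S104 have no mutual dependency, so both are submitted at once.
        color_future = pool.submit(color_likelihoods, base_image)
        depth_future = pool.submit(compute_depth, base_image, reference_image)
        # S202 can start as soon as the depth image is available.
        second = depth_likelihoods(depth_future.result())
        first = color_future.result()
    return first, second


first, second = run_pipeline([255, 0], [250, 10])
```

Both likelihood sets must be available before the graph cut of step S203 can run, which is why `run_pipeline` returns them together.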

[Application example 2 of foreground/background separation]
In the first embodiment described above, the pixels included in the camera image are separated into at least two categories, foreground and background, but they may also be separated into three or more categories. For example, the separation unit 15C and the separation unit 23 can further separate the pixels classified as background into a group of background subcategories into which the background category is subdivided. Taking sports spectating as an example, the background category can be subdivided into background subcategory 1, "spectators", and background subcategory 2, "field". When three or more categories exist in this way, the graph cut uses, instead of the binary foreground and background labels, multi-valued labels corresponding to the foreground category, background subcategory 1, and background subcategory 2, and the first foreground and background likelihoods and the second foreground and background likelihoods are calculated for each of these labels. For example, to calculate the first foreground likelihood and the first background likelihood, a Gaussian mixture may be prepared for each of the foreground category, background subcategory 1, and background subcategory 2. To calculate the second foreground likelihood and the second background likelihood, an existence region may be set for each of the foreground category, background subcategory 1, and background subcategory 2. The separation unit 15C and the separation unit 23 then assign the multi-valued labels of the foreground category, background subcategory 1, and background subcategory 2 to the pixels by a multi-valued graph cut. The correction unit 15D may then correct the depth of each pixel included in the depth image using the depths of pixels separated into the same background subcategory as that pixel. For example, the convolution may be restricted to the depth values of peripheral pixels, and the depth values of past frames, that were separated into the same background subcategory as the pixel of interest.
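
The subcategory-restricted correction described in this application example might look like the following sketch, which averages each background pixel only with neighbours that carry the same label. The integer label values, the uniform averaging window, the use of `None` for foreground pixels, and the function name `correct_depth_by_subcategory` are assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical label values for the multi-valued graph-cut result.
FOREGROUND, SPECTATORS, FIELD = 0, 1, 2


def correct_depth_by_subcategory(depth: List[List[float]],
                                 labels: List[List[int]],
                                 radius: int = 1) -> List[List[Optional[float]]]:
    """Average each background pixel with neighbours of the same subcategory only."""
    height, width = len(depth), len(depth[0])
    corrected: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            label = labels[y][x]
            if label == FOREGROUND:
                continue  # foreground pixels stay invalid (None)
            samples = [depth[ny][nx]
                       for ny in range(max(0, y - radius), min(height, y + radius + 1))
                       for nx in range(max(0, x - radius), min(width, x + radius + 1))
                       if labels[ny][nx] == label]
            corrected[y][x] = sum(samples) / len(samples)
    return corrected


# One image row crossing the boundary between the stands and the field.
depth = [[50.0, 54.0, 20.0, 22.0]]
labels = [[SPECTATORS, SPECTATORS, FIELD, FIELD]]
corrected = correct_depth_by_subcategory(depth, labels)
```

In this example the spectator pixels are averaged to 52.0 and the field pixels to 21.0, so the depth discontinuity between the two subcategories is not blurred. Depth values from past frames could be added to `samples` under the same label test.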

[Distribution and integration]
The components of each illustrated device do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the acquisition unit 15A, the calculation unit 15B, the separation unit 15C, the correction unit 15D, the foreground generation unit 15E, the background generation unit 15F, or the rendering unit 15G may be connected via a network as a device external to the server device 10. Alternatively, the acquisition unit 15A, the calculation unit 15B, the separation unit 15C, the correction unit 15D, the foreground generation unit 15E, the background generation unit 15F, or the rendering unit 15G may each be held by a separate device, and these devices may be connected via a network and cooperate to realize the functions of the server device 10 described above.

[Background model generation program]
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a background model generation program having the same functions as the above embodiments is therefore described below with reference to FIG. 15.

FIG. 15 is a diagram showing an example of the hardware configuration of a computer that executes the background model generation program according to the first and second embodiments. As shown in FIG. 15, the computer 100 has an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. The computer 100 further has a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

As shown in FIG. 15, the HDD 170 stores a background model generation program 170a that provides the same functions as the acquisition unit 15A, the calculation unit 15B, the separation unit 15C, the correction unit 15D, the foreground generation unit 15E, the background generation unit 15F, and the rendering unit 15G described in the first embodiment. Like the acquisition unit 15A, the calculation unit 15B, the separation unit 15C, the correction unit 15D, the foreground generation unit 15E, the background generation unit 15F, and the rendering unit 15G shown in FIG. 6, the background model generation program 170a may be integrated or separated. That is, the HDD 170 does not necessarily have to store all the data described in the first embodiment; it suffices that the data used for processing is stored in the HDD 170.

In this environment, the CPU 150 reads the background model generation program 170a from the HDD 170 and loads it into the RAM 180. As a result, the background model generation program 170a functions as a background model generation process 180a, as shown in FIG. 15. The background model generation process 180a loads various data read from the HDD 170 into the area of the RAM 180 allocated to the process, and executes various kinds of processing using the loaded data. Examples of the processing executed by the background model generation process 180a include the processing shown in FIG. 11 and FIG. 14. Not all the processing units described in the first embodiment need to operate on the CPU 150; it suffices that the processing units corresponding to the processing to be executed are virtually realized.

The background model generation program 170a does not necessarily have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, the background model generation program 170a may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (a so-called FD), a CD-ROM, a DVD, a magneto-optical disk, or an IC card, and the computer 100 may acquire the program from the portable physical medium and execute it. Alternatively, the background model generation program 170a may be stored on another computer or a server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may acquire the program from there and execute it.

The following supplementary notes are further disclosed regarding the embodiments including the above examples.

(Supplementary Note 1) A background model generation device comprising:
an acquisition unit that acquires a camera image captured by a camera from a predetermined imaging position;
a calculation unit that calculates a depth image corresponding to the imaging position;
a separation unit that separates a plurality of pixels included in the camera image into a foreground and a background;
a correction unit that corrects the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and
a background generation unit that generates a background model of the background by using a new depth image based on the depth values of the pixels of the depth image corrected by the correction unit.

(Supplementary Note 2) The background model generation device according to Supplementary Note 1, wherein the correction unit sets an invalid value as the depth value of each pixel separated into the foreground among the plurality of pixels included in the depth image.

(Supplementary Note 3) The background model generation device according to Supplementary Note 2, wherein, for each of the plurality of pixels included in the depth image, the correction unit corrects the depth value of each pixel separated into the background by convolving, based on filter coefficients set in a predetermined filter, those of the pixel's depth value and the depth values of its peripheral pixels that belong to pixels separated into the background.

(Supplementary Note 4) The background model generation device according to Supplementary Note 2, wherein, for each of the plurality of pixels included in the depth image, the correction unit corrects the depth value of each pixel separated into the background by convolving, based on filter coefficients set in a predetermined filter, those of the depth value in the frame of interest to be corrected and the depth values in past frames that belong to pixels separated into the background.

(Supplementary Note 5) The background model generation device according to Supplementary Note 1, wherein the separation unit separates the plurality of pixels included in the camera image into the foreground and the background based on the depth image, and
the correction unit corrects the depth value of each pixel of the depth image by using the depth values of the pixels separated into the background based on the depth image.

(Supplementary Note 6) The background model generation device according to Supplementary Note 1, wherein the separation unit separates each pixel separated into the background, among the plurality of pixels included in the camera image, into a group of background subcategories into which the background category is subdivided, and
the correction unit corrects the depth value of a first pixel included in the depth image by using the depth value of a second pixel separated into the same background subcategory as the first pixel.

(Supplementary Note 7) The background model generation device according to Supplementary Note 1, wherein the calculation unit calculates the depth image based on the parallax between the camera image and another camera image captured from a different imaging position.

(Supplementary Note 8) A background model generation method in which a computer executes processing comprising:
acquiring a camera image captured by a camera from a predetermined imaging position;
calculating a depth image corresponding to the imaging position;
separating a plurality of pixels included in the camera image into a foreground and a background;
correcting the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and
generating a background model of the background by using a new depth image based on the corrected depth values of the pixels of the depth image.

(Supplementary Note 9) The background model generation method according to Supplementary Note 8, wherein the correcting sets an invalid value as the depth value of each pixel separated into the foreground among the plurality of pixels included in the depth image.

(Supplementary Note 10) The background model generation method according to Supplementary Note 9, wherein, for each of the plurality of pixels included in the depth image, the correcting corrects the depth value of each pixel separated into the background by convolving, based on filter coefficients set in a predetermined filter, those of the pixel's depth value and the depth values of its peripheral pixels that belong to pixels separated into the background.

(Supplementary Note 11) The background model generation method according to Supplementary Note 9, wherein, for each of the plurality of pixels included in the depth image, the correcting corrects the depth value of each pixel separated into the background by convolving, based on filter coefficients set in a predetermined filter, those of the depth value in the frame of interest to be corrected and the depth values in past frames that belong to pixels separated into the background.

(Supplementary Note 12) The background model generation method according to Supplementary Note 8, wherein the separating separates the plurality of pixels included in the camera image into the foreground and the background based on the depth image, and
the correcting corrects the depth value of each pixel of the depth image by using the depth values of the pixels separated into the background based on the depth image.

(Supplementary Note 13) The background model generation method according to Supplementary Note 8, wherein the separating separates each pixel separated into the background, among the plurality of pixels included in the camera image, into a group of background subcategories into which the background category is subdivided, and
the correcting corrects the depth value of a first pixel included in the depth image by using the depth value of a second pixel separated into the same background subcategory as the first pixel.

(Supplementary Note 14) The background model generation method according to Supplementary Note 8, wherein the calculating calculates the depth image based on the parallax between the camera image and another camera image captured from a different imaging position.

（付記１５）所定の撮像位置からカメラにより撮像されたカメラ画像を取得し、
前記撮像位置に対応するデプス画像を算出し、
前記カメラ画像に含まれる複数の画素を前景と背景に分離し、
前記カメラ画像において前記背景に分離された各画素に対応する前記デプス画像の各画素のデプス値を用いて、前記デプス画像の各画素のデプス値を補正し、
補正された前記デプス画像の各画素のデプス値に基づく新たなデプス画像を用いて、前記背景に係る背景モデルを生成する、
処理をコンピュータに実行させることを特徴とする背景モデル生成プログラム。 (Supplementary Note 15) A camera image captured by a camera is acquired from a predetermined image capturing position,
Calculating a depth image corresponding to the imaging position,
Separating a plurality of pixels included in the camera image into a foreground and a background,
Using the depth value of each pixel of the depth image corresponding to each pixel separated into the background in the camera image, the depth value of each pixel of the depth image is corrected,
A background model relating to the background is generated using a new depth image based on the depth value of each pixel of the corrected depth image,
A background model generation program characterized by causing a computer to execute processing.

（付記１６）前記補正する処理は、前記デプス画像に含まれる複数の画素のうち前記前景に分離された各画素のデプス値に無効値を設定することを特徴とする付記１５に記載の背景モデル生成プログラム。 (Supplementary note 16) The background model according to Supplementary note 15, wherein in the correction process, an invalid value is set as a depth value of each pixel separated into the foreground among a plurality of pixels included in the depth image. Generator.

（付記１７）前記補正する処理は、前記デプス画像に含まれる複数の画素ごとに、デプス値および周辺画素のデプス値のうち前記背景に分離された画素のデプス値を所定のフィルタに設定されたフィルタ係数に基づいて畳み込むことにより、前記背景に分離された各画素のデプス値を補正することを特徴とする付記１６に記載の背景モデル生成プログラム。 (Supplementary Note 17) In the correction process, for each of a plurality of pixels included in the depth image, the depth value of the pixels separated into the background among the depth value and the depth values of the peripheral pixels is set in a predetermined filter. 17. The background model generation program according to appendix 16, wherein the depth value of each pixel separated into the background is corrected by convolving based on a filter coefficient.

（付記１８）前記補正する処理は、前記デプス画像に含まれる複数の画素ごとに、補正対象とする注目フレームにおけるデプス値および過去のフレームにおけるデプス値のうち前記背景に分離された画素のデプス値を所定のフィルタに設定されたフィルタ係数に基づいて畳み込むことにより、前記背景に分離された各画素のデプス値を補正することを特徴とする付記１６に記載の背景モデル生成プログラム。 (Supplementary Note 18) In the correction process, for each of a plurality of pixels included in the depth image, the depth value of the pixel separated into the background among the depth value in the target frame to be corrected and the depth values in the past frames. 17. The background model generation program according to appendix 16, wherein the depth value of each pixel separated into the background is corrected by convoluting the above with a filter coefficient set in a predetermined filter.

（付記１９）前記分離する処理は、前記デプス画像を基づいて前記カメラ画像に含まれる複数の画素を前景と背景に分離し、
前記補正する処理は、前記デプス画像に基づいて前記背景に分離された各画素のデプス値を用いて、前記デプス画像の各画素のデプス値を補正することを特徴とする付記１５に記載の背景モデル生成プログラム。 (Supplementary Note 19) The background model generation program according to Supplementary Note 15, wherein the separating process separates the plurality of pixels included in the camera image into the foreground and the background based on the depth image, and the correcting process corrects the depth value of each pixel of the depth image by using the depth values of the pixels separated into the background based on the depth image.
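
Supplementary Note 19 states only that the separation is based on the depth image and does not give the rule. As one hypothetical possibility, a pixel could be treated as foreground when it is clearly nearer to the camera than a reference depth of the empty scene. The reference image, the `margin` parameter, and the function name in this sketch are assumptions, not details taken from the publication.

```python
from typing import List


def separate_by_depth(depth: List[List[float]],
                      reference_depth: List[List[float]],
                      margin: float = 0.3) -> List[List[bool]]:
    """Return a mask that is True where a pixel is classified as background.

    Assumption: a pixel is foreground when it is more than `margin` nearer
    to the camera than the reference (empty-scene) depth at that position.
    """
    return [
        [(ref - d) <= margin for d, ref in zip(depth_row, ref_row)]
        for depth_row, ref_row in zip(depth, reference_depth)
    ]


depth = [[5.0, 2.0],
         [4.9, 5.1]]
reference_depth = [[5.0, 5.0],
                   [5.0, 5.0]]
is_background = separate_by_depth(depth, reference_depth)
```

Only the pixel at 2.0 m is classified as foreground in this example; the small deviations around 5.0 m remain background.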

（付記２０）前記分離する処理は、前記カメラ画像に含まれる複数の画素のうち前記背景に分離される各画素を、前記背景のカテゴリがさらに区分された背景のサブカテゴリ群に分離し、
前記補正する処理は、前記デプス画像に含まれる第一の画素のデプス値を、前記第一の画素が分離された背景のサブカテゴリと同一の背景のサブカテゴリに分離された第二の画素のデプス値を用いて補正することを特徴とする付記１５に記載の背景モデル生成プログラム。 (Supplementary Note 20) The background model generation program according to Supplementary Note 15, wherein the separating process separates each pixel separated into the background, among the plurality of pixels included in the camera image, into a group of background subcategories into which the background category is further divided, and the correcting process corrects the depth value of a first pixel included in the depth image by using the depth value of a second pixel separated into the same background subcategory as the first pixel.
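
The subcategory-aware correction in Supplementary Note 20 can be illustrated with the sketch below, in which each background pixel is replaced by the mean depth of the window pixels that share its subcategory. The label strings (`"static"`, `"dynamic"`, `"foreground"`), the plain mean, and the window radius are illustrative assumptions; the note only requires that the second pixel belong to the same background subcategory as the first.

```python
from typing import List

FOREGROUND = "foreground"  # hypothetical label for non-background pixels


def correct_within_subcategory(depth: List[List[float]],
                               labels: List[List[str]],
                               radius: int = 1) -> List[List[float]]:
    """Average each background pixel with same-subcategory neighbours only."""
    height, width = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(height):
        for x in range(width):
            label = labels[y][x]
            if label == FOREGROUND:
                continue  # foreground depths are not corrected here
            same = [
                depth[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
                if labels[ny][nx] == label
            ]
            # `same` always contains the pixel itself, so it is never empty.
            out[y][x] = sum(same) / len(same)
    return out


depth = [[4.0, 6.0, 10.0, 1.0]]
labels = [["static", "static", "dynamic", FOREGROUND]]
corrected = correct_within_subcategory(depth, labels)
```

Restricting the average to one subcategory keeps, for example, a moving background region from distorting the depth of an adjacent static region.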

１映像生成システム
３ｆｇ，３ｂｇｓ，３ｂｇｄ被写体
５Ａ〜５Ｎカメラ
１０サーバ装置
１１通信Ｉ／Ｆ部
１３記憶部
１５制御部
１５Ａ取得部
１５Ｂ算出部
１５Ｃ分離部
１５Ｄ補正部
１５Ｅ前景生成部
１５Ｆ背景生成部
１５Ｇレンダリング部
３０クライアント端末

1 Image generation system
3fg, 3bgs, 3bgd Subject
5A to 5N Camera
10 Server device
11 Communication I/F unit
13 Storage unit
15 Control unit
15A Acquisition unit
15B Calculation unit
15C Separation unit
15D Correction unit
15E Foreground generation unit
15F Background generation unit
15G Rendering unit
30 Client terminal

Claims

A background model generation device comprising:
an acquisition unit that acquires a camera image captured by a camera from a predetermined imaging position;
a calculation unit that calculates a depth image corresponding to the imaging position;
a separation unit that separates a plurality of pixels included in the camera image into a foreground and a background;
a correction unit that corrects the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and
a background generation unit that generates a background model of the background by using a new depth image based on the depth values of the pixels of the depth image corrected by the correction unit.

The background model generation device according to claim 1, wherein the correction unit sets an invalid value as the depth value of each pixel separated into the foreground among the plurality of pixels included in the depth image.

The background model generation device according to claim 1, wherein, for each of the plurality of pixels included in the depth image, the correction unit convolves, based on filter coefficients set in a predetermined filter, those depth values that belong to pixels separated into the background among the depth value of the pixel and the depth values of its peripheral pixels, thereby correcting the depth value of each pixel separated into the background.

The background model generation device according to claim 1, wherein, for each of the plurality of pixels included in the depth image, the correction unit convolves, based on filter coefficients set in a predetermined filter, those depth values that belong to pixels separated into the background among the depth value in the target frame to be corrected and the depth values in past frames, thereby correcting the depth value of each pixel separated into the background.

The background model generation device according to any one of the above claims, wherein the separation unit separates the plurality of pixels included in the camera image into the foreground and the background based on the depth image, and
the correction unit corrects the depth value of each pixel of the depth image by using the depth values of the pixels separated into the background based on the depth image.

The background model generation device according to claim 1, wherein the separation unit separates each pixel separated into the background, among the plurality of pixels included in the camera image, into a group of background subcategories into which the background category is further divided, and
the correction unit corrects the depth value of a first pixel included in the depth image by using the depth value of a second pixel separated into the same background subcategory as the first pixel.

A background model generation method in which a computer executes processing comprising:
acquiring a camera image captured by a camera from a predetermined imaging position;
calculating a depth image corresponding to the imaging position;
separating a plurality of pixels included in the camera image into a foreground and a background;
correcting the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and
generating a background model of the background by using a new depth image based on the corrected depth values of the pixels of the depth image.
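
The claims do not state how the background model is built from the new depth image. One common approach, shown here purely as an assumption, is to back-project every valid depth pixel into a 3D point with a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy`, the use of `None` for invalidated foreground pixels, and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

DepthImage = List[List[Optional[float]]]
Point3D = Tuple[float, float, float]


def depth_to_points(depth: DepthImage,
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Point3D]:
    """Back-project valid depth pixels to camera-space 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Pixels whose depth is None (invalidated foreground) are skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


corrected_depth = [[2.0, None],
                   [None, 4.0]]
background_points = depth_to_points(corrected_depth,
                                    fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Point sets obtained this way from several camera positions could then be merged into a single background model, though that step lies outside this sketch.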

A background model generation program that causes a computer to execute processing comprising:
acquiring a camera image captured by a camera from a predetermined imaging position;
calculating a depth image corresponding to the imaging position;
separating a plurality of pixels included in the camera image into a foreground and a background;
correcting the depth value of each pixel of the depth image by using the depth values of the pixels of the depth image that correspond to the pixels separated into the background in the camera image; and
generating a background model of the background by using a new depth image based on the corrected depth values of the pixels of the depth image.