JP2022099127A

JP2022099127A - Subject silhouette extract apparatus, method, and program

Info

Publication number: JP2022099127A
Application number: JP2020212906A
Authority: JP
Inventors: 良亮渡邊; Ryosuke Watanabe
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-07-04
Anticipated expiration: 2040-12-22

Abstract

To provide a subject silhouette extraction device, method, and program capable of extracting, from a video, a subject silhouette suitable for 3D model generation while taking into account the estimation error of the camera parameters. SOLUTION: In a subject silhouette extraction device 1, a video acquisition unit 10 acquires video from at least one camera cam. A camera parameter error estimation unit 20 estimates, for each camera, the error of the camera parameters describing that camera's state, based on a still image (for example, a frame image) extracted from the video. A silhouette extraction parameter calculation unit 30 calculates the parameters used to extract silhouettes from the video (silhouette extraction parameters) based on the estimated camera parameter error. A silhouette calculation unit 40 then extracts a silhouette image from each frame of the video by silhouette calculation using the silhouette extraction parameters. SELECTED DRAWING: Figure 1

Description

The present invention relates to a subject silhouette extraction device, method, and program for extracting, from a moving image captured by a camera, a silhouette in which only the subject is foreground.

Free-viewpoint video technology takes video from multiple cameras as input and allows the scene to be viewed from any viewpoint, including viewpoints where no camera exists. One known way to realize free-viewpoint video is 3D-model-based generation using the visual hull (volume intersection) method disclosed in Non-Patent Document 1.

In the visual hull method, as shown in Fig. 13, N silhouette images captured from different camera positions are projected into 3D world coordinates, and the intersection of the resulting visual cones is obtained as the visual hull VH(K) according to the following Equation (1).

$$VH(K) = \bigcap_{k \in K} V_k \tag{1}$$

Here, K is the set of silhouette images from the cameras, and V_k is the visual cone computed from the silhouette image of the k-th camera. Normally, the region common to all N cameras is modeled, but the number of cameras required for a region to be modeled may be changed; for example, a region may be modeled when it is common to N-1 cameras.
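The intersection in Equation (1), including the relaxed N-1 variant, can be sketched as voxel carving: a voxel is kept when its projection lands on foreground in enough silhouettes. This is a minimal illustration under stated assumptions, not the implementation of Non-Patent Document 1: cameras are given as hypothetical 3x4 projection matrices, silhouettes as 0/1 lists of rows, the pixel lookup uses nearest-pixel rounding, and points behind a camera are not handled. The names `project` and `visual_hull` are illustrative.

```python
from itertools import product


def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 matrix P; return pixel (u, v)."""
    x, y, z = X
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def visual_hull(voxels, cameras, silhouettes, min_views=None):
    """Keep voxels whose projection is foreground in at least `min_views` cameras.

    min_views=None requires all N cameras (the strict intersection of Equation (1));
    min_views=N-1 tolerates one camera whose silhouette disagrees.
    """
    need = len(cameras) if min_views is None else min_views
    hull = []
    for X in voxels:
        hits = 0
        for P, sil in zip(cameras, silhouettes):
            u, v = project(P, X)
            col, row = round(u), round(v)  # nearest pixel (assumption)
            if 0 <= row < len(sil) and 0 <= col < len(sil[0]) and sil[row][col]:
                hits += 1
        if hits >= need:
            hull.append(X)
    return hull


# Two hypothetical axis-aligned affine cameras looking at a 4x4x4 voxel grid.
P_top = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]   # (u, v) = (x, y)
P_side = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # (u, v) = (z, y)
square = [[1 if 1 <= r <= 2 and 1 <= c <= 2 else 0 for c in range(4)] for r in range(4)]
grid = list(product(range(4), repeat=3))
print(len(visual_hull(grid, [P_top, P_side], [square, square])))               # 8
print(len(visual_hull(grid, [P_top, P_side], [square, square], min_views=1)))  # 24
```

With both views required, only the 2x2x2 block survives; relaxing to N-1 = 1 view keeps every voxel seen as foreground by either camera, which illustrates why the relaxation trades robustness for a looser hull.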

The system may also have a function that converts the voxel model into a polygon model using a voxel-to-polygon conversion method such as marching cubes, and a function that outputs the 3D model as a polygon model.

The visual hull method is used as a fundamental technique for realizing the full-model free viewpoint disclosed in Non-Patent Document 2 (a free-viewpoint scheme that faithfully reproduces the shape of the 3D model).

As a silhouette extraction method for obtaining the sets intersected in the visual hull method, background-subtraction-based methods, represented by Non-Patent Document 3, are known. Background subtraction extracts the subject from the difference between the input image and a background model, which models the scene with no subject present. In recent years, deep-learning-based subject silhouette extraction methods such as that of Non-Patent Document 4 have also appeared, and methods that extract silhouettes with high accuracy have been proposed one after another.
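The basic background-subtraction idea can be sketched as a per-pixel threshold on the difference between the frame and the background model, plus a model update. This sketch deliberately swaps in the simplest possible model, a single running average per pixel with an update rate, rather than the Gaussian mixture of Non-Patent Document 3 or ViBe (Non-Patent Document 8). Grayscale frames as lists of rows and the function names are assumptions for illustration.

```python
def subtract_background(frame, background, threshold):
    """Binary silhouette: 1 where |frame - background| exceeds the threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def update_background(frame, background, rate):
    """Running-average background model: B <- (1 - rate) * B + rate * I."""
    return [[(1 - rate) * b + rate * p for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


background = [[10, 10, 10]]
frame = [[12, 60, 10]]
print(subtract_background(frame, background, threshold=20))  # [[0, 1, 0]]
print(update_background(frame, background, rate=0.5))        # [[11.0, 35.0, 10.0]]
```

A lower threshold labels more pixels as foreground, and a lower update rate keeps a stationary subject from being absorbed into the background; these are the two knobs adjusted in items (4) to (6) below.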

Patent Document 1: Japanese Patent Application No. 2020-012676

Non-Patent Document 1: A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, pp. 150-162, 1994.
Non-Patent Document 2: J. Kilner, J. Starck, A. Hilton, and O. Grau, "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events," Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), Montreal, QC, 2007, pp. 177-184.
Non-Patent Document 3: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252, 1999.
Non-Patent Document 4: L. A. Lim and H. Y. Keles, "Foreground Segmentation Using Convolutional Neural Networks for Multiscale Feature Encoding," Pattern Recognition Letters, 2018.
Non-Patent Document 5: J. Chen et al., "Sports Camera Calibration via Synthetic Data," CVPR Workshop, 2019.
Non-Patent Document 6: Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), pp. 1330-1334, 2000.
Non-Patent Document 7: Y. Tsurusaki, K. Nonaka, R. Watanabe, and S. Naito, "Study on Higher Accuracy of Camera Calibration Using Line Segment Detector," 2020-AVM-108, 7, pp. 1-6, 2020.
Non-Patent Document 8: O. Barnich and M. Van Droogenbroeck, "ViBe: A Universal Background Subtraction Algorithm for Video Sequences," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1709-1724, June 2011.

In silhouette extraction for the visual hull method, the 3D model generated by the visual hull method can contain defects even when the silhouettes obtained with the techniques of Non-Patent Documents 3 and 4 are perfect (here, "perfect" is defined as an F-measure of 100%, the metric used for evaluation in Non-Patent Documents 3 and 4). One cause is the estimation error of the cameras' intrinsic and extrinsic parameters.

To generate a 3D model with the visual hull method, the estimation error of the camera parameters must be kept small: the position and orientation of each camera must be estimated correctly in advance, and lens distortion in the images must be removed. Although this has been improved, for example in Non-Patent Document 5, it remains an open research topic, and accurately estimating camera position and orientation automatically is difficult. As a result, as shown in Fig. 6, defects occur in the synthesized 3D model and make it look unnatural to viewers.

An object of the present invention is to solve the above technical problem by providing a subject silhouette extraction device, method, and program that extract, from a moving image, a subject silhouette suitable for 3D model generation while taking into account the estimation error of the camera parameters.

To achieve the above object, the present invention provides a subject silhouette extraction device that extracts the silhouette of a subject from a moving image captured by a camera, characterized by the following configurations.

(1) The device comprises means for estimating a camera parameter error based on the moving image, means for calculating silhouette extraction parameters based on the estimated camera parameter error, and means for calculating the subject silhouette based on the silhouette extraction parameters.

(2) The larger the camera parameter error, the more the contour of the subject silhouette is expanded.

(3) Shrinkage (erosion) and expansion (dilation) of the subject silhouette contour are performed, in that order, at least once.

(4) The background subtraction threshold is calculated so that the larger the camera parameter error, the more easily each pixel is classified as foreground.

(5) The larger the camera parameter error, the lower the calculated update rate of the background model.

(6) The larger the camera parameter error, the more the background subtraction threshold favors classifying each pixel as foreground, and the lower the calculated update rate of the background model.
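Configurations (4) to (6) only require monotonic relationships: a larger camera parameter error should give a threshold that admits more foreground and a lower background update rate. The sketch below shows one such mapping. The linear-with-clamp threshold, the 1/(1 + k*E) rate decay, the function name, and every default constant are hypothetical choices for illustration; the document does not state these formulas.

```python
def background_params(error_px, base_threshold=30.0, min_threshold=5.0,
                      threshold_per_px=5.0, base_rate=0.05, rate_decay=0.5):
    """Map a camera parameter error (in pixels) to (threshold, update_rate).

    Hypothetical monotone mapping:
      threshold decreases linearly with the error, clamped at min_threshold,
      so pixels are classified as foreground more easily;
      update rate decays as base_rate / (1 + rate_decay * error),
      so the background model adapts more slowly.
    """
    threshold = max(min_threshold, base_threshold - threshold_per_px * error_px)
    rate = base_rate / (1.0 + rate_decay * error_px)
    return threshold, rate


for e in (0.0, 2.0, 10.0):
    print(e, background_params(e))
```

Clamping the threshold keeps it from collapsing to zero for badly calibrated cameras, which would otherwise mark sensor noise as foreground.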

(1) Because the subject silhouette is calculated with silhouette extraction parameters derived from the estimated camera parameter error, quality degradation such as missing parts, which camera parameter errors can cause when a 3D model is generated from the silhouette, can be suppressed.

(2) Because the contour of the subject silhouette is expanded more as the camera parameter error grows, a subject silhouette can be provided that suppresses quality degradation, such as missing parts, caused by camera parameter errors.

(3) Because shrinkage and expansion of the subject silhouette contour are performed, in that order, at least once, a subject silhouette can be provided that suppresses quality degradation, such as missing parts and noise, that camera parameter errors can cause.

(4) Because the background subtraction threshold is calculated so that pixels are more easily classified as foreground as the camera parameter error grows, the subject silhouette is enlarged. A subject silhouette can therefore be provided that suppresses quality degradation, such as missing parts, caused by camera parameter errors.

(5) Because the background model's update rate is set lower as the camera parameter error grows, each pixel is more easily classified as foreground. A subject silhouette can therefore be provided that suppresses quality degradation, such as missing parts, caused by camera parameter errors.

(6) Because, as the camera parameter error grows, the background subtraction threshold is set so that pixels are more easily classified as foreground and the background model's update rate is set lower, quality degradation such as missing parts caused by camera parameter errors can be suppressed.

Fig. 1 is a functional block diagram of a 3D model generation system including a subject silhouette extraction device to which the present invention is applied.
Fig. 2 is a functional block diagram of the subject silhouette extraction device according to the first embodiment of the present invention.
Fig. 3 schematically shows a camera parameter estimation method.
Fig. 4 shows an example in which a reprojection error occurs.
Fig. 5 shows an example in which the visual hull is computed too small because of camera parameter estimation error.
Fig. 6 shows an example in which a defect occurs in a 3D model because of camera parameter estimation error.
Fig. 7 shows an example in which expanding the silhouette contour yields a 3D model with fewer defects.
Fig. 8 is a functional block diagram of the subject silhouette extraction device according to the second embodiment of the present invention.
Fig. 9 is a functional block diagram of the subject silhouette extraction device according to the third embodiment of the present invention.
Fig. 10 schematically shows a method of extracting a silhouette by background subtraction using a background model.
Fig. 11 is a functional block diagram of the subject silhouette extraction device according to the fourth embodiment of the present invention.
Fig. 12 is a functional block diagram of the subject silhouette extraction device according to the fifth embodiment of the present invention.
Fig. 13 shows a method of generating a 3D model by the visual hull method.

Embodiments of the present invention are described in detail below with reference to the drawings. Fig. 1 is a functional block diagram showing the configuration of the main part of a 3D model generation system 100 including a subject silhouette extraction device 1 to which the present invention is applied.

The subject silhouette extraction device 1 mainly comprises a moving image acquisition unit 10, a camera parameter error estimation unit 20, a silhouette extraction parameter calculation unit 30, and a silhouette calculation unit 40. In addition to the subject silhouette extraction device 1, the 3D model generation system 100 includes a 3D model generation unit 2 that generates a 3D model from the silhouette images by the visual hull method.

The 3D model generation system 100, or its subject silhouette extraction device 1, can be configured by installing an application (program) that implements each function on at least one general-purpose computer or server equipped with a CPU, ROM, RAM, bus, interfaces, and the like. Alternatively, it can be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or software.

The moving image acquisition unit 10 acquires moving images from at least one camera cam. When moving images are acquired from multiple cameras, the cameras are arranged to capture the subject from different viewpoints. The moving images may be acquired directly from the cameras, or by reading moving image files stored in a moving image database (DB) 3 or the like.

The camera parameter error estimation unit 20 estimates, for each camera, the error in the camera parameters that describe that camera's state (the camera parameter error), based on a still image extracted from the moving image, for example a frame image.

The silhouette extraction parameter calculation unit 30 calculates, from the estimated camera parameter error, the parameters used to extract silhouettes from the moving image (silhouette extraction parameters). The silhouette calculation unit 40 extracts a silhouette image from each frame of the moving image by silhouette calculation using these silhouette extraction parameters.

As described with reference to Fig. 13, the 3D model generation unit 2 generates a 3D model of the subject by the visual hull method using the silhouette images.

Fig. 2 is a functional block diagram showing the configuration of the first embodiment of the subject silhouette extraction device 1; components not needed to describe the present invention are omitted. This embodiment is characterized in that the subject silhouette, obtained from the moving image by a known method, is expanded according to the silhouette extraction parameters.

The camera parameter error estimation unit 20 includes a camera parameter estimation unit 21 and a reprojection error estimation unit 22. Camera parameters, which comprise intrinsic and extrinsic parameters, transform a 3D point (X, Y, Z) in world coordinates into a 2D point (u, v) in the camera image. In the pinhole camera model, this transformation is expressed in matrix form by the following Equation (2).

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2}$$

r11 to r33 form the rotation matrix indicating the camera's orientation, and t1 to t3 form the translation vector indicating the camera's position; both are called the camera's extrinsic parameters. fx and fy are the focal lengths in pixels, which reflect the zoom setting, and cx and cy are the image's principal point; these are called the camera's intrinsic parameters. The intrinsic parameters may also include distortion coefficients that model radial and tangential distortion specific to the camera lens. s is a scale factor that normalizes the homogeneous image coordinates to the form [u, v, 1].
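Equation (2) can be evaluated directly: transform the world point into camera coordinates with [R | t], apply the intrinsic matrix, and divide by the scale factor s, which equals the point's depth along the optical axis. This minimal sketch ignores the lens distortion coefficients mentioned above; the function name and the example intrinsics are hypothetical.

```python
def project_pinhole(point, fx, fy, cx, cy, R, t):
    """Project world point (X, Y, Z) to pixel (u, v) with the pinhole model of Equation (2).

    Lens distortion is not modeled in this sketch.
    """
    # Camera coordinates: Xc = R @ X + t
    xc, yc, zc = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    s = zc  # third row of K @ [R | t] @ [X, Y, Z, 1]^T
    u = (fx * xc + cx * zc) / s
    v = (fy * yc + cy * zc) / s
    return u, v


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_pinhole((1, 2, 10), 100, 100, 320, 240, identity, [0, 0, 0]))  # (330.0, 260.0)
```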

The camera parameter estimation unit 21 estimates the camera parameters by a known method. In this embodiment, as disclosed in Non-Patent Document 6, many pairs of known 3D coordinates in the world coordinate system and the corresponding 2D image coordinates are collected, and the parameters that minimize their reprojection errors are estimated.

Fig. 3 schematically shows the camera parameter estimation method of Non-Patent Document 6. Suppose the scene includes a field or court in a stadium and, as shown in Fig. 3(a), the 3D world coordinates (X_i, Y_i, Z_i) of the white-line intersections are known from regulations or the like. Many pixel pairs are then collected between these intersections and the 2D positions (u_{c,i}, v_{c,i}) of the corresponding intersections in the 2D image of Fig. 3(b), and the parameters are provisionally estimated. Using these parameters, the 3D coordinates (X_i, Y_i, Z_i) are projected onto the 2D image at (u'_{c,i}, v'_{c,i}), and the camera parameters are estimated by optimizing each parameter so that the distance between this projected point and the observed position (u_{c,i}, v_{c,i}), i.e., the reprojection error R_{c,i}, becomes small.
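The reprojection error R_{c,i} of each pixel pair is the Euclidean distance between the projected point (u'_{c,i}, v'_{c,i}) and the observed point (u_{c,i}, v_{c,i}). The sketch below computes it for a camera given as a hypothetical 3x4 projection matrix; the parameter optimization itself (for example the method of Non-Patent Document 6) is not shown, and the function names are illustrative.

```python
import math


def project(P, X):
    """Project 3D point X with a 3x4 projection matrix P; return (u', v')."""
    h = [P[r][0] * X[0] + P[r][1] * X[1] + P[r][2] * X[2] + P[r][3] for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def reprojection_errors(P, world_points, image_points):
    """Return R_{c,i} = ||(u'_{c,i}, v'_{c,i}) - (u_{c,i}, v_{c,i})|| for each pair i."""
    errors = []
    for X, (u, v) in zip(world_points, image_points):
        u_proj, v_proj = project(P, X)
        errors.append(math.hypot(u_proj - u, v_proj - v))
    return errors


P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # hypothetical affine camera: (u, v) = (X, Y)
print(reprojection_errors(P, [(0, 0, 0), (10, 5, 0)], [(0, 0), (13, 9)]))  # [0.0, 5.0]
```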

i is the index of the pixel pair. In the example of Fig. 3, the 3D positions of the stadium's white-line intersections are known, so each pixel pair associating a point (X_i, Y_i, Z_i) in 3D world coordinates with a point (u_{c,i}, v_{c,i}) in the 2D image is obtained by selecting the corresponding point in the image [Fig. 3(b)] manually in a UI, or by estimating it automatically with the method disclosed in Non-Patent Document 7.

In this embodiment, the camera parameter estimation unit 21 estimates the camera parameters that minimize the mean reprojection error R_{c,i} over all pixel pairs i. However, if the acquired moving image is distorted by lens aberration, or if the estimated position of a 2D image point (u_{c,i}, v_{c,i}) is wrong, the reprojection error R_{c,i} does not reach 0.

Fig. 4 shows an example in which a reprojection error occurs. Here, the detected position (u_{c,11}, v_{c,11}) of a white-line intersection deviates significantly from the actual intersection. Consequently, the point (u'_{c,11}, v'_{c,11}) obtained by projecting the 3D coordinates (X_11, Y_11, Z_11) onto the 2D image with the estimated camera parameters also deviates significantly from the actual intersection, and the reprojection error R_{c,i} ends up large. Although this is an extreme example, it is generally difficult to reduce the reprojection error R_{c,i} to 0.

A large reprojection error R_{c,i} makes it more likely that the estimated camera parameters contain errors. If free-viewpoint content is produced under these conditions, the visual hull may be computed too small, as shown in Fig. 5, and as a result the 3D model may contain defects, as in the example of Fig. 6.

In this embodiment, the reprojection error estimation unit 22 calculates the mean of the reprojection errors R_{c,i} over all pixel pairs of camera c using the following Equation (3), and uses it as the camera parameter error E_c representing camera c.

$$E_c = \frac{1}{I} \sum_{i=1}^{I} R_{c,i} \tag{3}$$

Here, I is the total number of pixel pairs associating (X_i, Y_i, Z_i) with (u_{c,i}, v_{c,i}).
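Equation (3) is a plain average, applied separately to each camera. A minimal sketch, assuming the per-pair errors R_{c,i} have already been computed and are stored per camera in a dictionary (the camera names and values are made up for illustration):

```python
def camera_parameter_error(errors):
    """E_c = (1 / I) * sum_i R_{c,i}: mean reprojection error over the I pixel pairs of camera c."""
    if not errors:
        raise ValueError("camera has no pixel pairs (I = 0)")
    return sum(errors) / len(errors)


per_camera = {"cam1": [0.0, 5.0, 1.0], "cam2": [0.5, 0.5]}  # hypothetical R_{c,i} values
E = {cam: camera_parameter_error(r) for cam, r in per_camera.items()}
print(E)  # {'cam1': 2.0, 'cam2': 0.5}
```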

Equation (3) yields a single camera parameter error E_c per camera, but a different camera parameter error E_c(u, v) may instead be calculated for each pixel (u, v) of the camera. One way to do so is to take, instead of the average in Equation (3), the reprojection error R_{c,i} of the pixel pair nearest to each pixel (u, v) as E_c(u, v). Alternatively, for each pixel (u, v), the reprojection errors R_{c,i} of the surrounding pixel pairs may be combined in a sum weighted by the reciprocal of their distance, so that closer pairs contribute more.
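Both per-pixel options can be sketched as follows: nearest-pair lookup, and inverse-distance weighting. Two assumptions are made here: each pair is located at its observed image position (u_{c,i}, v_{c,i}), and the inverse-distance weights are normalized to sum to 1 so that E_c(u, v) stays on the same scale as R_{c,i} (a raw weighted sum would not). The function names are hypothetical.

```python
import math


def nearest_pair_error(u, v, pairs):
    """pairs: list of ((u_i, v_i), R_i). Return R_i of the pair nearest to pixel (u, v)."""
    _, err = min(pairs, key=lambda p: math.hypot(p[0][0] - u, p[0][1] - v))
    return err


def idw_error(u, v, pairs, eps=1e-9):
    """Inverse-distance-weighted average of R_i, with weights 1/d normalized to sum to 1."""
    num = den = 0.0
    for (pu, pv), err in pairs:
        d = math.hypot(pu - u, pv - v)
        if d < eps:
            return err  # the pixel coincides with a pair location
        num += err / d
        den += 1.0 / d
    return num / den


pairs = [((0, 0), 1.0), ((10, 0), 3.0)]
print(nearest_pair_error(2, 0, pairs))  # 1.0
print(idw_error(2, 0, pairs))           # about 1.4: the closer pair dominates
```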

This allows a reprojection error specific to each pixel (u, v) to be used, so the error can be estimated more accurately when, for example, strong distortion is present in only part of the image. Moreover, obtaining the error E_c(u, v) per pixel allows the downstream silhouette extraction parameter calculation unit 30 to set different silhouette extraction parameters for each pixel (u, v).

The method of calculating the camera parameter error E_c(u, v) is not limited to one based on reprojection errors. For example, because the radial distortion caused by a camera lens tends to grow toward the edges of the field of view, E_c(u, v) may be calculated so that it increases with distance from the image center.
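A distance-from-center error map needs only the image size. In this sketch the quadratic growth in normalized radius, the `base` and `gain` constants, and the function name are hypothetical; the text only requires that E_c(u, v) increase with distance from the image center.

```python
import math


def radial_error(u, v, width, height, base=0.5, gain=2.0):
    """Hypothetical E_c(u, v) = base + gain * (r / r_max)^2, where r is the distance
    from the image center and r_max the center-to-corner distance."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    r = math.hypot(u - cx, v - cy)
    r_max = math.hypot(cx, cy)
    return base + gain * (r / r_max) ** 2


print(radial_error(50, 50, 101, 101))  # 0.5 at the center
print(radial_error(0, 0, 101, 101))    # 2.5 at a corner
```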

The silhouette extraction parameter calculation unit 30 includes an expansion amount calculation unit 31, which calculates how far the silhouette extracted from the moving image is to be expanded, based on the camera parameter error estimated by the camera parameter error estimation unit 20.

In this embodiment, a silhouette is extracted from the moving image by a representative method such as those disclosed in Non-Patent Documents 3 and 4, and the contour of its foreground region (the white region of the silhouette) is expanded. As shown in Fig. 7, this allows a 3D model with fewer defects to be formed.

One concern is that expanding the silhouette contour also expands the contour of the 3D model generated from it. However, an inflated 3D model of the subject looks less unnatural to viewers than one with missing parts. Moreover, the present inventors have already proposed a method for reducing the unnaturalness caused by subject expansion, as disclosed in Patent Document 1.

The expansion amount calculation unit 31 calculates the expansion amount d from the camera parameter error E_c obtained by Equation (3). In this embodiment, the larger the camera parameter error E_c of the camera from which a silhouette was extracted, the larger the expansion amount d, so that silhouette is expanded more. This prevents camera parameter errors from causing defects in the subject's 3D model. If the parameter error is calculated per pixel, as E_c(u, v), rather than per camera, as E_c, the expansion amount d may also be determined per pixel (u, v).
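The text requires only that d be a non-negative integer that grows with E_c; the exact formula used in this embodiment (Equation (4)) is not reproduced in this section. The sketch below is therefore a hypothetical stand-in: d is the error in pixels rounded up, scaled by an assumed `pixels_per_error` factor and capped at an assumed `d_max`. The same function can be applied to a per-pixel error E_c(u, v).

```python
import math


def expansion_amount(error_px, pixels_per_error=1.0, d_max=10):
    """Hypothetical monotone map from camera parameter error (pixels) to integer d >= 0.

    d = 0 means no expansion; the cap keeps badly calibrated cameras from
    inflating the silhouette without bound.
    """
    if error_px <= 0:
        return 0
    return min(d_max, math.ceil(pixels_per_error * error_px))


print([expansion_amount(e) for e in (0.0, 1.2, 3.0, 50.0)])  # [0, 2, 3, 10]
```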

The silhouette calculation unit 40 includes a silhouette extraction unit 41 and a silhouette contour expansion processing unit 42. The silhouette extraction unit 41 extracts the subject's silhouette by applying any method, such as those disclosed in Non-Patent Documents 3 and 4, to the frame image. As post-processing of the extracted silhouette, the silhouette contour expansion processing unit 42 expands its contour according to the expansion amount d.

The silhouette contour expansion processing unit 42 performs expansion by extending each foreground pixel of the extracted silhouette to its surrounding (2d+1) × (2d+1) pixels. d is a parameter representing how far the silhouette contour is expanded and must be an integer of 0 or more (d = 0 means no expansion). Contour expansion is not limited to extension over the surrounding (2d+1) × (2d+1) pixels; other methods, such as repeatedly expanding to the four pixels above, below, left, and right, may also be used.
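The (2d+1) × (2d+1) expansion is a binary dilation with a square structuring element. A minimal pure-Python sketch, assuming a silhouette stored as a list of rows of 0/1 values (the function name is illustrative; an image library's dilation with a square kernel would serve the same purpose):

```python
def dilate(mask, d):
    """Mark every pixel within the (2d+1) x (2d+1) window of a foreground pixel as foreground."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # d = 0 returns an unchanged copy
    if d == 0:
        return out
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Clip the window at the image border.
                for yy in range(max(0, y - d), min(h, y + d + 1)):
                    for xx in range(max(0, x - d), min(w, x + d + 1)):
                        out[yy][xx] = 1
    return out


dot = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
print(sum(map(sum, dilate(dot, 1))), sum(map(sum, dilate(dot, 2))))  # 9 25
```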

When the silhouette calculation unit 40 further includes a silhouette contour shrinking processing unit 43, as in the second embodiment shown in FIG. 8, the silhouette extraction parameter calculation unit 30 may further include a shrinkage amount calculation unit 32.

In silhouette extraction, a known technique is to apply a small shrinking (erosion) operation followed by a larger expansion (dilation) operation. The shrinking step removes small noise, and the larger expansion then enlarges the foreground region of the silhouette.

Shrinking works as follows. If the (2e + 1) × (2e + 1) neighborhood of a foreground pixel contains even one non-foreground pixel, that pixel is treated as lying near the contour of the foreground region, and its silhouette state is changed from foreground to background. Here e is the shrinkage amount of the silhouette, an integer greater than or equal to 0. e = 0 means that no shrinking is performed.
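
The shrinking rule can be sketched the same way. The sketch assumes the same `mask[v][u]` layout, and the name `erode` is hypothetical. The text does not say how image borders are handled, so this sketch simply clips the window to the image and ignores out-of-image positions. That choice is an assumption.

```python
from typing import List

Mask = List[List[int]]  # 1 = foreground, 0 = background


def erode(mask: Mask, e: int) -> Mask:
    """Turn a foreground pixel into background if its (2e+1) x (2e+1)
    window contains at least one non-foreground pixel.

    The window is clipped at the image border (an assumption).
    """
    if e < 0:
        raise ValueError("shrinkage amount e must be an integer >= 0")
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]  # read from `mask`, write to `out`
    for v in range(h):
        for u in range(w):
            if not mask[v][u]:
                continue
            near_contour = any(
                mask[vv][uu] == 0
                for vv in range(max(0, v - e), min(h, v + e + 1))
                for uu in range(max(0, u - e), min(w, u + e + 1))
            )
            if near_contour:
                out[v][u] = 0
    return out


# Usage: a 3 x 3 blob shrinks to its centre pixel when e = 1.
m = [[0] * 5 for _ in range(5)]
for v in range(1, 4):
    for u in range(1, 4):
        m[v][u] = 1
print(sum(map(sum, erode(m, 1))))  # 1
```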

The shrinkage amount calculation unit 32 can calculate a shrinkage amount e with d > e in one of two ways: by multiplying the calculated expansion amount d by a predetermined coefficient, or by evaluating a predetermined function of d. For noise removal, e is preferably set so that d > e. However, if e is too large, regions that should be foreground may be removed as noise and turned into background. The present embodiment therefore fixes the shrinkage amount at e = 2. The expansion amount calculation unit 31 then calculates the expansion amount d by applying the camera parameter error E_c and the shrinkage amount e = 2 to the following equation (4).

Here, round is a function that rounds to the nearest integer, with fractions of 0.5 and above rounded up. E_const is a manually set constant that adjusts the expansion amount d. Under equation (4), the expansion amount d is set pixel by pixel according to the magnitude of the camera parameter error E_c(u, v) at each pixel of each camera.

In this embodiment the shrinkage amount e is a constant. Alternatively, the expansion amount d may be a constant and the shrinkage amount e a variable. In this case, depending on the setting of E_const, equation (4) may yield a negative expansion amount d. If d becomes negative, processing may be performed with d = 0.
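
Equation (4) itself is not reproduced in this text. The sketch below therefore covers only the two steps the prose does state: the round-half-up "round", and the clamping of a negative d to 0. The input `x` stands for the unrounded value inside equation (4), and the function names are hypothetical. Python's built-in `round()` uses round-half-to-even, so this sketch uses `math.floor(x + 0.5)` instead.

```python
import math
from typing import List


def to_expansion_amount(x: float) -> int:
    """Round half up (as the text's "round" does) and clamp negatives to 0.

    `x` is a placeholder for the unrounded value inside equation (4).
    """
    return max(0, math.floor(x + 0.5))


def expansion_map(raw: List[List[float]]) -> List[List[int]]:
    """Apply the rounding and clamping to a per-pixel map raw[v][u]."""
    return [[to_expansion_amount(x) for x in row] for row in raw]


# Built-in round() uses round-half-to-even: round(2.5) == 2.
print(round(2.5), to_expansion_amount(2.5))  # 2 3
print(expansion_map([[0.4, 1.5], [-1.2, 3.49]]))  # [[0, 2], [0, 3]]
```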

Further, the present embodiment calculates d and e so that d > e, for noise removal. The present invention is not limited to this, however, and d < e or d = e is also permissible. In this embodiment, the shrinking process and the expansion process are applied in that order (shrinking, then expansion) at least once.
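
The shrink-then-expand sequence can be combined into one self-contained sketch. All names here are hypothetical, and border windows are clipped as an assumption. The example uses e = 1 and d = 2 so that it fits a small image, rather than the embodiment's fixed e = 2. An isolated noise pixel disappears during shrinking, and the subject blob ends up larger than before.

```python
from typing import Iterator, List, Tuple

Mask = List[List[int]]  # 1 = foreground, 0 = background


def _window(h: int, w: int, v: int, u: int, r: int) -> Iterator[Tuple[int, int]]:
    """Yield in-image coordinates of the (2r+1) x (2r+1) window at (u, v)."""
    for vv in range(max(0, v - r), min(h, v + r + 1)):
        for uu in range(max(0, u - r), min(w, u + r + 1)):
            yield vv, uu


def erode(mask: Mask, e: int) -> Mask:
    """Keep a foreground pixel only if its whole window is foreground."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            if mask[v][u] and all(mask[vv][uu] for vv, uu in _window(h, w, v, u, e)):
                out[v][u] = 1
    return out


def dilate(mask: Mask, d: int) -> Mask:
    """Spread every foreground pixel over its whole window."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            if mask[v][u]:
                for vv, uu in _window(h, w, v, u, d):
                    out[vv][uu] = 1
    return out


def shrink_then_expand(mask: Mask, e: int, d: int, repeats: int = 1) -> Mask:
    """Apply shrinking (e) followed by expansion (d), `repeats` >= 1 times."""
    if repeats < 1:
        raise ValueError("the pair of operations is applied at least once")
    for _ in range(repeats):
        mask = dilate(erode(mask, e), d)
    return mask


# Usage: a 3 x 3 subject blob plus one isolated noise pixel, with e = 1, d = 2.
m = [[0] * 9 for _ in range(9)]
for v in range(3, 6):
    for u in range(3, 6):
        m[v][u] = 1
m[0][8] = 1  # single-pixel noise
result = shrink_then_expand(m, e=1, d=2)
print(result[0][8], sum(map(sum, result)))  # 0 25
```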

FIG. 9 is a functional block diagram of a third embodiment of the subject silhouette extraction device 1. In this embodiment, the silhouette calculation unit 40 calculates the silhouette by background subtraction using a background model. The silhouette extraction parameter calculation unit 30 calculates the parameters used in this background-subtraction silhouette calculation based on the estimated camera parameter error.

This embodiment adds a background subtraction threshold calculation unit 33 and a background model update rate calculation unit 34 to the silhouette extraction parameter calculation unit 30. Its distinguishing feature is that the silhouette extraction parameters are the threshold T(u, v), used for the foreground/background decision, and the background model update rate U(u, v).

The background subtraction threshold calculation unit 33 calculates the threshold T(u, v) based on, for example, the following equation (5). Here, T_min and T_max are the minimum and maximum values the threshold can take. E_max is a constant that controls how strongly the camera parameter error E_c(u, v) changes the parameter. These values are set manually according to the target scene and similar factors.

The background model update rate calculation unit 34 calculates the update rate U(u, v) based on, for example, the following equation (6). Here, U_min and U_max are the minimum and maximum of the range over which the update rate is varied. These values are set manually according to the target scene and similar factors.

Under equation (5), the background subtraction threshold T(u, v) is set smaller as the camera parameter error E_c(u, v) increases, which makes each pixel more likely to be judged as foreground. More pixels are therefore judged to be foreground, producing an effect similar to the contour expansion of the first and second embodiments.

Further, under equation (6), the background model update rate U(u, v) is set smaller as the camera parameter error E_c(u, v) increases, so the background model is updated less readily. This suppresses the tendency of repeated background model updates to erode the silhouette contour.
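
Equations (5) and (6) are not reproduced in this text. The sketch below assumes one simple form that matches the behavior described: a linear interpolation that gives T_max and U_max at zero error and falls to T_min and U_min once E_c(u, v) reaches E_max. Treating E_max as a saturation point is an assumption. The real equations may use a different curve.

```python
def _ratio(err: float, e_max: float) -> float:
    """Normalised error clipped to [0, 1] (assumed linear form)."""
    if e_max <= 0:
        raise ValueError("E_max must be positive")
    return min(max(err / e_max, 0.0), 1.0)


def threshold(err: float, t_min: float, t_max: float, e_max: float) -> float:
    """Assumed T(u, v): T_max at zero error, falling linearly to T_min at E_max."""
    return t_max - (t_max - t_min) * _ratio(err, e_max)


def update_rate(err: float, u_min: float, u_max: float, e_max: float) -> float:
    """Assumed U(u, v): U_max at zero error, falling linearly to U_min at E_max."""
    return u_max - (u_max - u_min) * _ratio(err, e_max)


# Larger camera parameter error -> lower threshold and lower update rate.
for err in (0.0, 2.0, 4.0, 10.0):
    print(err, threshold(err, 5.0, 25.0, 4.0), update_rate(err, 0.001, 0.01, 4.0))
```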

In the silhouette calculation unit 40, the foreground extraction processing unit 44 extracts the foreground from the moving image by background subtraction using the background model. Suppose the background is modeled by a single Gaussian distribution, as shown schematically in FIG. 10. Suppose also that, for a given pixel, the Gaussian background model built from frames up to the F-th frame has mean μ_F(u, v) and standard deviation σ_F(u, v). Background subtraction is then computed by the following expression (7).

In this embodiment, a pixel (u, v) that satisfies conditional expression (7) is judged to be background. I_F(u, v) is the luminance value of each pixel of the acquired moving image. z is a parameter that sets how many multiples of the standard deviation are still judged as background. The threshold T(u, v) is calculated by equation (5).

The image used for the background subtraction decision may be grayscale, or it may use a color space such as RGB or YUV. When there are multiple color channels, each channel is processed independently. A pixel is judged to be background only if it satisfies the background condition in every channel.
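
Expression (7) is not reproduced here, so the way T(u, v) enters it is unknown. This per-pixel sketch assumes that T acts as an additive margin, so that a channel counts as background when |I − μ| ≤ z·σ + T. Under that assumption, a smaller T makes foreground more likely, which matches the behavior described for equation (5). The all-channels rule follows the paragraph above. The function name is hypothetical.

```python
from typing import Sequence


def is_background(pixel: Sequence[float], mean: Sequence[float],
                  std: Sequence[float], z: float, t: float) -> bool:
    """Single-Gaussian background test applied to every channel.

    Assumed form of expression (7): |I - mu| <= z * sigma + T per channel.
    A pixel is background only if all channels pass.
    """
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std must have one entry per channel")
    return all(abs(i - m) <= z * s + t for i, m, s in zip(pixel, mean, std))


print(is_background([104.0], [100.0], [2.0], z=2.5, t=3.0))  # True (4 <= 8)
print(is_background([110.0], [100.0], [2.0], z=2.5, t=3.0))  # False (10 > 8)
# RGB: the third channel fails, so the pixel is foreground.
print(is_background([100, 120, 90], [100, 121, 60], [2, 2, 2], z=2.5, t=3.0))  # False
```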

Background model construction is not limited to a single Gaussian distribution. Two alternatives may also be adopted. One builds the background model with a Gaussian mixture, as disclosed in Non-Patent Document 3. The other builds it by continually retaining a fixed number of past pixel samples at each pixel position, as disclosed in Non-Patent Document 8.

The sample-retention method decides foreground/background from how many of the retained samples are similar to the input pixel. The mechanism of the present invention can therefore be realized by raising or lowering the threshold on this number of pixels. The same applies to the update rate: this method replaces retained samples with pixels of the current frame with a certain probability, so the mechanism can also be realized by raising or lowering that probability (= the update rate).
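
A single-pixel sketch of the sample-retention approach follows. All names are hypothetical: `min_matches` stands for the pixel-count threshold, `update_prob` for the update rate, and `radius` for the similarity tolerance. Two choices are assumptions not stated in the text. First, only pixels classified as background refresh their samples. Second, the direction of control: for a larger camera parameter error, one would presumably raise `min_matches`, making foreground easier, and lower `update_prob`.

```python
import random
from typing import List


def classify_and_update(samples: List[float], value: float, radius: float,
                        min_matches: int, update_prob: float,
                        rng: random.Random) -> bool:
    """Return True if `value` is background for this pixel's sample set."""
    matches = sum(1 for s in samples if abs(s - value) <= radius)
    is_bg = matches >= min_matches
    # Conservative policy (an assumption): only background pixels refresh the model.
    if is_bg and rng.random() < update_prob:
        samples[rng.randrange(len(samples))] = value
    return is_bg


rng = random.Random(0)
model = [10.0, 10.0, 10.0, 10.0]
print(classify_and_update(model, 11.0, 2.0, 1, 1.0, rng))  # True
print(model.count(11.0))  # 1
print(classify_and_update(model, 11.0, 2.0, 5, 0.0, rng))  # False
```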

The background model update unit 45 updates the Gaussian mean μ_F(u, v) and standard deviation σ_F(u, v) of the background model in every frame, using the following equations (8), (9), and (10).
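
Equations (8) to (10) are not reproduced in this text. The sketch below uses one common running-Gaussian update as a stand-in: exponential averaging of the mean and the variance with rate U, followed by a square root to recover σ. It is not necessarily the patent's exact form. The function name is hypothetical, and `rate` would be the U(u, v) computed from equation (6).

```python
import math
from typing import Tuple


def update_gaussian(mu: float, sigma: float, value: float, rate: float) -> Tuple[float, float]:
    """One running-average step for a single pixel's background Gaussian."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("update rate must lie in [0, 1]")
    new_mu = (1.0 - rate) * mu + rate * value
    new_var = (1.0 - rate) * sigma ** 2 + rate * (value - new_mu) ** 2
    return new_mu, math.sqrt(new_var)


mu, sigma = update_gaussian(100.0, 2.0, 110.0, 0.1)
print(round(mu, 6), round(sigma ** 2, 6))  # 101.0 11.7
print(update_gaussian(100.0, 2.0, 110.0, 0.0))  # (100.0, 2.0)
```

A smaller rate, as equation (6) assigns to pixels with larger camera parameter error, leaves μ and σ closer to their previous values.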

This embodiment updates the background model gradually, frame by frame. In scenes where the background color changes little by little, for example with changing sunlight, the background is therefore updated dynamically and silhouettes are extracted accurately.

This embodiment has been described with the camera parameter error E_c(u, v) estimated per pixel for each camera. The present invention is not limited to this, and the error may instead be estimated per camera. In that case, the decision threshold and the background model update rate are also calculated per camera.

In the third embodiment above, the silhouette extraction parameters calculated from the camera parameter estimation error are the background subtraction threshold T(u, v) and the background model update rate U(u, v). The present invention is not limited to this. Only the threshold T(u, v) may be used, as in the fourth embodiment shown in FIG. 11, or only the update rate U(u, v), as in the fifth embodiment shown in FIG. 12.

Furthermore, the first or second embodiment may be combined as appropriate with any of the third to fifth embodiments. In such a combination, the silhouette expansion amount d is calculated from the camera parameter estimation error. The background subtraction threshold T(u, v) and/or the background model update rate U(u, v), used when extracting the silhouette by background subtraction, are also calculated from that error.

1 … subject silhouette extraction device, 2 … 3D model generation unit, 3 … moving image DB, 10 … moving image acquisition unit, 20 … camera parameter error estimation unit, 21 … camera parameter estimation unit, 22 … reprojection error estimation unit, 30 … silhouette extraction parameter calculation unit, 31 … expansion amount calculation unit, 32 … shrinkage amount calculation unit, 33 … background subtraction threshold calculation unit, 34 … background model update rate calculation unit, 40 … silhouette calculation unit, 41 … silhouette extraction unit, 42 … silhouette contour expansion processing unit, 43 … silhouette contour shrinking processing unit, 44 … foreground extraction processing unit, 45 … background model update unit, 100 … 3D model generation system

Claims

A subject silhouette extraction device that extracts a silhouette of a subject from a moving image captured by a camera, the device comprising:
means for estimating a camera parameter error based on the moving image;
means for calculating a silhouette extraction parameter based on the estimation result of the camera parameter error; and
means for calculating a subject silhouette based on the silhouette extraction parameter.

The subject silhouette extraction device according to claim 1, wherein the means for estimating the camera parameter error comprises:
means for estimating camera parameters based on the coordinates of corresponding pixel pairs between the 3D space captured by the camera and its moving image; and
means for calculating a reprojection error of the camera parameters,
and wherein the reprojection error represents the camera parameter error.

The subject silhouette extraction device according to claim 2, further comprising means for optimizing the camera parameters so as to minimize the reprojection error,
wherein the camera parameter error is represented by the reprojection error of the camera parameters after the optimization.

The subject silhouette extraction device according to any one of claims 1 to 3, wherein the means for estimating the camera parameter error estimates the camera parameter error for each camera.

The subject silhouette extraction device according to claim 4, wherein the means for estimating the camera parameter error calculates the reprojection error of each pixel pair for each camera,
and the camera parameter error of each camera is represented by the average of the reprojection errors of its pixel pairs.

The subject silhouette extraction device according to any one of claims 1 to 3, wherein the means for estimating the camera parameter error estimates the camera parameter error for each pixel of each camera.

The subject silhouette extraction device according to claim 6, wherein the means for estimating the camera parameter error calculates the reprojection error of each pixel pair for each camera,
and the camera parameter error at each pixel of each camera is represented by the reprojection error of the nearest pixel pair.

The subject silhouette extraction device according to claim 6, wherein the means for estimating the camera parameter error calculates the reprojection error of each pixel pair for each camera,
and the camera parameter error at each pixel of each camera is represented by a function of the reprojection error of each pixel pair and the distance to that pixel pair.

The subject silhouette extraction device according to any one of claims 1 to 8, wherein the means for calculating the silhouette extraction parameter includes means for calculating an expansion amount of the subject silhouette based on the camera parameter error,
the means for calculating the subject silhouette includes contour expansion processing means for expanding the contour of the subject silhouette according to the expansion amount,
and the means for calculating the expansion amount calculates the expansion amount so that it increases as the camera parameter error increases.

The subject silhouette extraction device according to claim 9, wherein the means for calculating the subject silhouette includes contour shrinking processing means for shrinking the contour of the subject silhouette by a predetermined amount,
and the shrinking process and the expansion process are applied to the contour of the subject silhouette, in that order, at least once.

The subject silhouette extraction device according to any one of claims 1 to 10, wherein the means for calculating the subject silhouette calculates the subject silhouette by background subtraction,
the means for calculating the silhouette extraction parameter includes means for calculating a threshold for distinguishing between the foreground region and the background region,
and the threshold is calculated so that the larger the camera parameter error, the more easily each pixel is identified as belonging to the foreground region.

The subject silhouette extraction device according to any one of claims 1 to 10, wherein the means for calculating the subject silhouette calculates the subject silhouette by background subtraction,
the means for calculating the silhouette extraction parameter includes means for calculating an update rate of the background model,
and the update rate is calculated to be lower as the camera parameter error increases.

The subject silhouette extraction device according to any one of claims 1 to 10, wherein the means for calculating the subject silhouette calculates the subject silhouette by background subtraction,
the means for calculating the silhouette extraction parameter comprises:
means for calculating a threshold for distinguishing between the foreground region and the background region; and
means for calculating an update rate of the background model,
and wherein the larger the camera parameter error, the more easily the threshold lets each pixel be identified as belonging to the foreground region, and the lower the calculated update rate.

A subject silhouette extraction method in which a computer extracts a silhouette of a subject from a moving image captured by a camera, the method comprising:
estimating a camera parameter error based on the moving image;
calculating a silhouette extraction parameter based on the estimation result of the camera parameter error; and
calculating a subject silhouette based on the silhouette extraction parameter.

A subject silhouette extraction program for extracting a silhouette of a subject from a moving image captured by a camera, the program causing a computer to execute:
a procedure for estimating a camera parameter error based on the moving image;
a procedure for calculating a silhouette extraction parameter based on the estimation result of the camera parameter error; and
a procedure for calculating a subject silhouette based on the silhouette extraction parameter.