JP5295044B2

JP5295044B2 - Method and program for extracting mask image and method and program for constructing voxel data

Info

Publication number: JP5295044B2
Application number: JP2009196502A
Authority: JP
Inventors: 浩嗣三功; 整内藤; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-08-27
Filing date: 2009-08-27
Publication date: 2013-09-18
Anticipated expiration: 2029-08-27
Also published as: JP2011048627A

Description

The present invention relates to a method and program for extracting mask images, and to a method and program for constructing voxel data. More specifically, it relates to a matting method and program that extract mask images from images of a subject and images of the background alone, and to a modeling method and program that construct voxel data by applying the visual volume intersection method (visual hull) to the plurality of extracted mask images.

Conventionally, matting, which extracts a mask image representing the presence of a subject from a captured image of the subject, and modeling, which constructs three-dimensional voxel data by applying the visual volume intersection method to a plurality of mask images, have been performed separately. Constructing high-precision voxel data therefore first required extracting high-precision mask images, which in turn required a special environment such as a blue screen. Patent Document 1 and Non-Patent Document 1 disclose methods for improving mask image accuracy that refine the background difference and use the color information of the voxel data to fill defects in the mask images.

With these methods, however, sufficiently accurate mask images are still needed at the outset in order to construct accurate voxel data. Accurate mask images therefore have to be extracted through complicated computation, relying on manual work or on a special shooting environment such as a blue screen.

To address this, Non-Patent Document 2 proceeds as follows:
(1) Initial mask images are obtained by applying a simple background difference to a plurality of subject images, which capture the subject and the background, and a plurality of background images, which capture only the background. Initial voxel data is then constructed by applying the visual volume intersection method.
(2) Slice images are extracted from the voxel data, and a median filter is applied to each slice image to fill holes in the voxel data. Each voxel used for this hole filling is projected onto the mask images, and the corresponding pixels are set to white, which fills the holes in each mask image.
(3) The whole of the hole-filled voxel data is projected onto each shooting viewpoint. Only pixels that are white in both the projected image and the hole-filled mask image are kept as white pixels of a new mask image for that viewpoint, which removes the unnecessary portions of each hole-filled mask image.

By repeating (1) to (3), with the mask images obtained in (3) used as the initial mask images in (1), highly accurate mask images are extracted and highly accurate voxel data is constructed from them.

However, the method of Non-Patent Document 2 cannot remove unnecessary portions unless a filter with a large block size is applied in (2) and steps (1) to (3) are repeated many times. As a result, the edges of the subject silhouette become dull and a long processing time is required. Furthermore, when an unnecessary portion in (3) covers a large area, it cannot be removed no matter how many times (1) to (3) are repeated.

For this reason, Japanese Patent Application No. 2009-189877 proposed a technique that extracts highly accurate mask images, and from them constructs highly accurate voxel data, by means of an unnecessary-portion removal process based on closed region division. This process does not depend on the filtering in the above method and can successively remove unnecessary portions with a large area in a small number of iterations.

Patent Document 1: JP 2007-17364 A

Non-Patent Document 1: Masahiro Toyoura et al., "Silhouette Restoration Technique for the Visual Volume Intersection Method Using a Random Pattern Background," 2005 IEICE General Conference, D-12-133
Non-Patent Document 2: Hiroshi Sankoh (三功浩嗣) et al., "Background Separation Method Based on Feedback Processing of a 3D Subject Model to Each Shooting Viewpoint," 2009 IEICE General Conference, D-11-85

However, even when the method of Japanese Patent Application No. 2009-189877 is applied, unnecessary portions may still remain near the floor surface and on the silhouette contour, so that the accuracy of the mask images and the voxel data ends up insufficient.

Accordingly, an object of the present invention is to provide a method and program for extracting highly accurate mask images, equipped with an unnecessary-portion removal process that can successively remove unnecessary portions near the floor surface and on the silhouette contour, and a method and program for constructing voxel data from those mask images.

To achieve the above object, a method for extracting a plurality of mask images according to the present invention extracts a plurality of mask images representing the presence of a subject from a plurality of subject images, which capture the subject and a background, and a plurality of background images, which capture only the background. The method includes: a first extraction step of extracting a plurality of first mask images from the plurality of subject images and the plurality of background images by background difference; a first construction step of constructing first three-dimensional voxel data from the plurality of first mask images by the visual volume intersection method; a second construction step of constructing second three-dimensional voxel data by processing the first three-dimensional voxel data to fill defects and/or remove noise; and a second extraction step of extracting a plurality of second mask images by applying, on the basis of the second three-dimensional voxel data, processing that fills defects and/or removes noise in the plurality of first mask images and that is based on color information of the subject images and/or the background images.

Preferably, the second extraction step: projects the three-dimensional coordinates at which defects were filled in the second three-dimensional voxel data onto each shooting viewpoint and, where the corresponding pixel in the plurality of first mask images is defective, fills it, thereby extracting a plurality of first sub-mask images; projects the second three-dimensional voxel data onto each shooting viewpoint to extract a plurality of second sub-mask images; filters the plurality of second sub-mask images; extracts a plurality of third sub-mask images by treating only pixels that are filled in both the filtered second sub-mask images and the first sub-mask images as the region in which the subject is present, and all other pixels as a region in which the subject is absent; divides the plurality of third sub-mask images into closed regions and removes the closed regions that satisfy a predetermined condition, thereby extracting a plurality of fourth sub-mask images; and applies processing based on color information of the subject images and/or the background images to the plurality of fourth sub-mask images to extract the plurality of second mask images.

Preferably, the processing based on color information includes: a sub-step of determining first unnecessary-portion removal candidates in the fourth sub-mask image by projecting, onto the shooting viewpoint, the voxels of the second three-dimensional voxel data that lie at or below a certain height above the floor surface; a sub-step of, for each pixel included in the first unnecessary-portion removal candidates, mapping the pixel value into a feature space based on color information, calculating the difference between the subject image pixel and the background image pixel, and taking pixels whose difference is at or below a threshold as second unnecessary-portion removal candidates; a sub-step of searching along the ray through each pixel included in the second unnecessary-portion removal candidates and taking as unnecessary pixels those pixels whose intersection with the third three-dimensional voxel data lies at or below a certain height above the floor surface; and a sub-step of removing the unnecessary pixels from the fourth sub-mask image.
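The first sub-step above, which selects removal candidates near the floor, might be sketched as follows. This is only an illustration under stated assumptions: voxels are given as a set of integer `(x, y, z)` coordinates with `y` measured upward from the floor, `project` is a hypothetical stand-in for the calibrated camera model, and `floor_candidates` and `max_height` are illustrative names that do not appear in the source. The ray search of the third sub-step is not covered.

```python
def floor_candidates(occupied, project, mask, max_height):
    """Return the mask pixels (u, v) hit by voxels at or below max_height.

    occupied:   iterable of (x, y, z) voxel coordinates, y measured up from the floor
    project:    hypothetical camera model mapping (x, y, z) to an integer pixel (u, v)
    mask:       2D list of pixel values, 255 = subject, 0 = background
    max_height: assumed height limit above the floor, in voxel units
    """
    height, width = len(mask), len(mask[0])
    candidates = set()
    for x, y, z in occupied:
        if y > max_height:
            continue  # voxel is too far above the floor
        u, v = project(x, y, z)
        # Keep only projections that land on a white (subject) pixel.
        if 0 <= u < width and 0 <= v < height and mask[v][u] == 255:
            candidates.add((u, v))
    return candidates
```

With a toy orthographic projection such as `lambda x, y, z: (x, y)`, only voxels whose `y` is at or below `max_height` contribute candidate pixels.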

Preferably, the processing based on color information includes: a sub-step of extracting the contour from the fourth sub-mask image and determining first unnecessary-portion removal candidates; a sub-step of, for each pixel included in the first unnecessary-portion removal candidates, mapping the pixel value into a feature space based on color information, calculating the difference between the subject image pixel and the background image pixel, and taking pixels whose difference is at or below a threshold as second unnecessary-portion removal candidates; a sub-step of determining, for each pixel included in the second unnecessary-portion removal candidates, the corresponding pixels of the second unnecessary-portion removal candidates at the viewpoints on both sides of that pixel's viewpoint; a sub-step of, for the corresponding pixels of the second unnecessary-portion removal candidates, mapping the pixel values into the feature space based on color information, calculating the difference between the subject image pixel and the background image pixel, and taking pixels whose difference is at or below a threshold as unnecessary pixels; and a sub-step of removing the unnecessary pixels from the fourth sub-mask image.

Preferably, the method further includes a sub-step of dividing the plurality of subject images into regions in advance on the basis of color information, and removing, from the mask image of each shooting viewpoint, the region to which each unnecessary pixel belongs.

Preferably, the feature space based on color information is the RGB space or the HSV space.
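The color-difference test used in the sub-steps above can be illustrated with a short sketch. The source does not specify the distance measure or the threshold value, so the Euclidean distance, the circular treatment of hue, and the names `color_difference` and `low_difference_pixels` are all assumptions made for illustration.

```python
import colorsys
import math


def color_difference(p, q, space="rgb"):
    """Distance between two 8-bit RGB pixels in RGB or HSV feature space."""
    if space == "rgb":
        return math.dist(p, q)
    # Convert to HSV; colorsys expects and returns components in [0, 1].
    h1, s1, v1 = colorsys.rgb_to_hsv(*(c / 255.0 for c in p))
    h2, s2, v2 = colorsys.rgb_to_hsv(*(c / 255.0 for c in q))
    dh = abs(h1 - h2)
    dh = min(dh, 1.0 - dh)  # hue is circular, so take the shorter arc
    return math.sqrt(dh ** 2 + (s1 - s2) ** 2 + (v1 - v2) ** 2)


def low_difference_pixels(candidates, subject, background, threshold, space="rgb"):
    """Keep candidate pixels (u, v) whose subject/background difference is <= threshold."""
    return [
        (u, v)
        for (u, v) in candidates
        if color_difference(subject[v][u], background[v][u], space) <= threshold
    ]
```

A pixel that looks almost the same in the subject image and the background image (for example a faint shadow) yields a small difference and therefore stays in the candidate list, whereas a pixel covered by the subject is dropped from it.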

Preferably, in the sub-step of determining the unnecessary pixels, color correction is applied to the subject image and/or the background image for the corresponding pixels of the second unnecessary-portion removal candidates, the pixel values are mapped into the feature space based on color information, the difference between the subject image pixel and the background image pixel is calculated, and pixels whose difference is at or below a threshold are taken as unnecessary pixels, the threshold varying according to the color correction.

Preferably, the steps from the first construction step to the second extraction step are repeated a predetermined number of times, with the plurality of second mask images used as the plurality of first mask images in the first construction step.

Preferably, the second construction step includes: a sub-step of acquiring a plurality of first slice images of the first three-dimensional voxel data along the x-axis, y-axis, and z-axis directions; and a sub-step of filtering the plurality of first slice images and constructing the second three-dimensional voxel data on the basis of the filtering result.

Preferably, the sub-step of constructing the second three-dimensional voxel data filters the plurality of first slice images, finds the pixels that became white as a result of the filtering, and fills the three-dimensional coordinates of the first three-dimensional voxel data corresponding to those pixels, thereby constructing the second three-dimensional voxel data.

Preferably, the method further includes, after the second extraction step, a step of filtering the plurality of fifth mask images.

To achieve the above object, a program for extracting a plurality of mask images according to the present invention causes a computer, which extracts a plurality of mask images representing the presence of a subject from a plurality of subject images capturing the subject and a background and a plurality of background images capturing only the background, to function as: first extraction means for extracting a plurality of first mask images from the plurality of subject images and the plurality of background images by background difference; first construction means for constructing first three-dimensional voxel data from the plurality of first mask images by the visual volume intersection method; second construction means for constructing second three-dimensional voxel data by processing the first three-dimensional voxel data to fill defects and/or remove noise; and second extraction means for extracting a plurality of second mask images by applying, on the basis of the second three-dimensional voxel data, processing that fills defects and/or removes noise in the plurality of first mask images and that is based on color information of the subject images and/or the background images. The program thereby extracts the plurality of mask images.

To achieve the above object, a method for constructing three-dimensional voxel data according to the present invention constructs three-dimensional voxel data from a plurality of subject images, which capture a subject and a background, and a plurality of background images, which capture only the background. The method includes: a first extraction step of extracting a plurality of first mask images from the plurality of subject images and the plurality of background images by background difference; a first construction step of constructing first three-dimensional voxel data from the plurality of first mask images by the visual volume intersection method; a second construction step of constructing second three-dimensional voxel data by processing the first three-dimensional voxel data to fill defects and/or remove noise; a second extraction step of extracting a plurality of second mask images by applying, on the basis of the second three-dimensional voxel data, processing that fills defects and/or removes noise in the plurality of first mask images and that is based on color information of the subject images and/or the background images; and a third construction step of constructing third three-dimensional voxel data from the plurality of second mask images by the visual volume intersection method.

To achieve the above object, a program for constructing three-dimensional voxel data according to the present invention causes a computer, which constructs three-dimensional voxel data from a plurality of subject images capturing a subject and a background and a plurality of background images capturing only the background, to function as: first extraction means for extracting a plurality of first mask images from the plurality of subject images and the plurality of background images by background difference; first construction means for constructing first three-dimensional voxel data from the plurality of first mask images by the visual volume intersection method; second construction means for constructing second three-dimensional voxel data by processing the first three-dimensional voxel data to fill defects and/or remove noise; second extraction means for extracting a plurality of second mask images by applying, on the basis of the second three-dimensional voxel data, processing that fills defects and/or removes noise in the plurality of first mask images and that is based on color information of the subject images and/or the background images; and third construction means for constructing third three-dimensional voxel data from the plurality of second mask images by the visual volume intersection method. The program thereby constructs the three-dimensional voxel data.

The method and program for extracting mask images according to the present invention reflect the information of the voxel data in the mask images, filling the holes in each mask image and removing its unnecessary portions, so that the accuracy of the mask images and of the voxel data is improved in a mutually complementary manner. In particular, unnecessary portions are removed successively by repeating the unnecessary-portion removal process. The removal process based on closed region division removes unnecessary portions in a small number of iterations, which limits the dulling of the subject silhouette edges, shortens the processing time, and also removes unnecessary portions with a large area. Furthermore, the processing based on color information that is applied to the mask images removes shadows near the floor surface and unnecessary portions on the contour, which the removal process based on closed region division could not remove.

Thus, according to the present invention, highly accurate mask images can be extracted without requiring highly accurate mask images at the outset and without complicated computation, and highly accurate voxel data can be constructed from them. Moreover, the present invention does not depend on a special shooting environment and is applicable to ordinary video.

FIG. 1 is a flowchart showing the method for extracting mask images and constructing voxel data according to the present invention.
FIG. 2 is a diagram for explaining closed regions.
FIG. 3 is a flowchart showing the process for determining unnecessary portions near the floor surface.
FIG. 4 is a flowchart showing the process for determining unnecessary portions on the contour.
FIG. 5 shows a mask image from which unnecessary portions have been removed by applying the processing of steps 3 to 10 multiple times.
FIG. 6 shows an example of a slice image of the voxels in the y-axis direction in step 21.
FIG. 7 shows the mask image obtained by projecting the slice images at the same shooting viewpoint as FIG. 6 (all slice images in 0 ≤ y ≤ 50) onto each shooting viewpoint.
FIG. 8 shows the pixels extracted as unnecessary portions in step 22.
FIG. 9 shows the pixels extracted as unnecessary portions in step 23.
FIG. 10 shows the mask image obtained by applying the processing of steps 21 to 23.
FIG. 11 shows a mask image from which unnecessary portions have been removed by applying the processing of steps 3 to 10 multiple times.
FIG. 12 shows the contour extracted in step 31.
FIG. 13 shows the pixels extracted as unnecessary portions in step 32.
FIG. 14 shows the pixels extracted as unnecessary portions in step 33.
FIG. 15 shows, in white, the regions removed as unnecessary portions by region division based on RGB values.
FIG. 16 shows the mask image obtained by unnecessary-portion removal in which region division based on RGB values is added to the processing of steps 31 to 33.
FIG. 17 shows the result of applying, three times, the unnecessary-portion removal in which region division based on RGB values is added to the processing of steps 31 to 33.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. FIG. 1 is a flowchart showing the method for extracting mask images and constructing voxel data according to the present invention. The description below follows this flowchart.

Step 1: Acquire a plurality of subject images and background images from cameras arranged on a circle. A plurality of calibrated cameras are placed on a circle, and each camera captures a subject image containing the subject and the background and a background image containing only the background, giving subject images and background images taken from a plurality of different directions. For example, with 30 cameras, 30 subject images and 30 background images are acquired.

Step 2: Extract a plurality of mask images by taking the background difference between the subject images and the background images. Because these mask images are extracted by the simple background difference of the prior art, their accuracy is not high. One mask image is extracted per camera; for example, with 30 cameras, 30 mask images are extracted.
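A simple background difference of the kind used in step 2 might look like the sketch below. The source does not give the difference measure or the threshold, so the sum of absolute channel differences, the default threshold of 30, and the function name `background_difference` are assumptions for illustration only.

```python
def background_difference(subject, background, threshold=30):
    """Return a binary mask (255 = subject, 0 = background).

    subject, background: images of equal size as 2D lists of (r, g, b) tuples.
    threshold: assumed limit on the summed absolute channel difference.
    """
    mask = []
    for subject_row, background_row in zip(subject, background):
        mask_row = []
        for (r1, g1, b1), (r2, g2, b2) in zip(subject_row, background_row):
            diff = abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2)
            # A pixel that differs strongly from the background is marked white.
            mask_row.append(255 if diff > threshold else 0)
        mask.append(mask_row)
    return mask
```

Such a per-pixel test is sensitive to shadows and to subject colors close to the background, which is consistent with the remark above that the initial masks are not highly accurate.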

Step 3: Construct three-dimensional voxel data by applying the visual volume intersection method to the plurality of mask images. Because the accuracy of the voxel data depends on the accuracy of the mask images, voxel data constructed from the mask images extracted in step 2 is not highly accurate.
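The idea behind the visual volume intersection method in step 3 can be sketched as follows: a voxel is kept only if it projects onto a white pixel in every mask image. The `projectors` argument stands in for the calibrated camera models and is a hypothetical interface; `visual_hull` and the brute-force grid scan are illustrative choices, not the implementation described in the source.

```python
def _is_white(mask, uv):
    """True if pixel (u, v) lies inside the mask and is white (255)."""
    u, v = uv
    return 0 <= v < len(mask) and 0 <= u < len(mask[0]) and mask[v][u] == 255


def visual_hull(masks, projectors, size):
    """Return the set of voxels whose projection is white in every view.

    masks:      list of 2D lists (255 = subject, 0 = background), one per camera
    projectors: list of functions (x, y, z) -> (u, v), one per camera (assumed)
    size:       (nx, ny, nz) extent of the voxel grid
    """
    nx, ny, nz = size
    occupied = set()
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if all(_is_white(m, p(x, y, z)) for m, p in zip(masks, projectors)):
                    occupied.add((x, y, z))
    return occupied
```

Because a single missing pixel in any one mask carves away every voxel along that viewing ray, holes in the masks of step 2 propagate directly into the voxel data, which motivates the hole filling of steps 4 to 6.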

Step 4: Acquire the voxel data obtained above as slice images. Regarding the three-dimensional voxel data as a stack of slice images along a given direction, slice images are acquired along the x-axis, y-axis, and z-axis directions. For each axis, as many slice images are acquired as there are coordinates in the range. For example, if the y coordinate of the voxel data ranges from 0 to 255, 256 slice images are acquired. The y-axis is vertical, and the x-axis and z-axis are horizontal.

Step 5: Construct voxel data with the holes filled. Because the slice images of step 4 may have been acquired from voxel data of limited accuracy, they may contain defects, in which holes appear inside the subject region, and conversely noise, in which spurious pixels appear where they do not belong. The slice images acquired from each direction (x-axis, y-axis, z-axis) are therefore filtered. For example, a Gaussian filter is applied to fill defects and close the holes, and a median filter is applied to remove unwanted noise. This gives the filtered slice images. Next, each filtered slice image is compared with the slice image before filtering to find the pixels that have newly become white (that is, pixels whose holes were filled by the filtering), and the three-dimensional coordinates of the voxel data corresponding to those pixels are filled. For example, if a pixel that became white in the slice image acquired at x-axis coordinate x1 has y coordinate y1 and z coordinate z1, the three-dimensional coordinate (x1, y1, z1) of the voxel data is filled. This processing is applied to all slice images to obtain voxel data with the holes filled.
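Steps 4 and 5 can be illustrated together with a small sketch that slices a voxel grid along each axis, median-filters every slice, and fills the voxels whose slice pixel newly became white. Several details are assumptions rather than statements from the source: the volume is a nested list `vol[x][y][z]` of 0/1 values, only a median filter with a square window is used (the Gaussian filter is omitted), all three axes are filtered against the original volume and the results are merged, and the function names are illustrative.

```python
import copy


def median_filter_binary(img, radius=1):
    """Median filter on a 2D list of 0/1 values; the window is clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = sorted(
                img[a][b]
                for a in range(max(0, i - radius), min(h, i + radius + 1))
                for b in range(max(0, j - radius), min(w, j + radius + 1))
            )
            out[i][j] = window[len(window) // 2]
    return out


def get_slice(vol, axis, k):
    """Slice vol[x][y][z] at index k along the given axis (0 = x, 1 = y, 2 = z)."""
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    if axis == 0:
        return [[vol[k][y][z] for z in range(nz)] for y in range(ny)]
    if axis == 1:
        return [[vol[x][k][z] for z in range(nz)] for x in range(nx)]
    return [[vol[x][y][k] for y in range(ny)] for x in range(nx)]


def to_xyz(axis, k, i, j):
    """Map slice pixel (i, j) of slice k back to a voxel coordinate (x, y, z)."""
    if axis == 0:
        return (k, i, j)
    if axis == 1:
        return (i, k, j)
    return (i, j, k)


def fill_voxel_holes(vol, radius=1):
    """Return (filled volume, set of newly filled voxel coordinates)."""
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    filled = set()
    for axis in range(3):
        for k in range(dims[axis]):
            before = get_slice(vol, axis, k)
            after = median_filter_binary(before, radius)
            for i, row in enumerate(after):
                for j, value in enumerate(row):
                    if value == 1 and before[i][j] == 0:  # newly white pixel
                        filled.add(to_xyz(axis, k, i, j))
    out = copy.deepcopy(vol)
    for x, y, z in filled:
        out[x][y][z] = 1
    return out, filled
```

Returning the set of newly filled coordinates alongside the volume is a design choice made here because step 6 needs exactly those coordinates to fill the corresponding mask pixels.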

Step 6: Extract a plurality of mask images with the holes filled. The three-dimensional coordinates filled in step 5 are projected onto each shooting viewpoint, and the corresponding pixel in each mask image is set to white. In other words, the three-dimensional coordinates obtained from the slice images are projected onto each shooting viewpoint to form the image seen from the position at which the corresponding mask image was captured, and the pixels in that image that correspond to the three-dimensional coordinates of step 5 are set to white. This yields mask images in which the holes are filled.
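The feedback from voxel data to a mask image in step 6 might be sketched as below. The `project` callable is a hypothetical stand-in for the calibrated camera of one viewpoint, and `fill_mask_from_voxels` is an illustrative name.

```python
def fill_mask_from_voxels(mask, filled_coords, project):
    """Return a copy of mask with the projections of filled voxels set to white.

    mask:          2D list, 255 = subject, 0 = background
    filled_coords: iterable of (x, y, z) coordinates filled in the voxel data
    project:       hypothetical camera model (x, y, z) -> integer pixel (u, v)
    """
    out = [row[:] for row in mask]  # leave the input mask untouched
    for x, y, z in filled_coords:
        u, v = project(x, y, z)
        if 0 <= v < len(out) and 0 <= u < len(out[0]):
            out[v][u] = 255
    return out
```

The function would be called once per camera, each time with that camera's mask image and projection.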

Step 7: Project the voxel data obtained in step 5 onto each shooting viewpoint to obtain a plurality of mask images.

Step 8: Extract a plurality of mask images with unnecessary portions removed. The mask images obtained in step 7 are filtered. Each filtered mask image is compared with the mask image extracted in step 6: a pixel is set to white only if it is white in both images, and to black otherwise. Applying this to all mask images gives mask images from which unnecessary portions have been removed.
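Steps 7 and 8 amount to rendering the voxel data into each view and taking a pixel-wise AND with the hole-filled mask. The sketch below illustrates this under assumptions: `project` is a hypothetical camera model, the filtering of the projected mask mentioned in step 8 is omitted for brevity, and the function names are illustrative.

```python
def project_voxels(occupied, project, width, height):
    """Render occupied voxels into a binary mask for one viewpoint (step 7)."""
    mask = [[0] * width for _ in range(height)]
    for x, y, z in occupied:
        u, v = project(x, y, z)
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 255
    return mask


def intersect_masks(mask_a, mask_b):
    """Keep a pixel white only if it is white in both masks (step 8)."""
    return [
        [255 if a == 255 and b == 255 else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]
```

White pixels of the step 6 mask that are not supported by the voxel data seen from the same viewpoint are turned black by the intersection, which is how unnecessary portions are removed.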

Step 9: The mask images obtained above are divided into closed regions, and it is checked whether any closed region satisfies a predetermined condition. If one exists, processing proceeds to step 10; if none exists, no further accuracy improvement can be expected from closed-region division, and processing proceeds to step 11. As an example of the predetermined condition, an unnecessary portion is a region such as a shadow and is considered to be smaller than the region of the person, so a region with no more than a certain number of pixels can be treated as an unnecessary portion.

Here, a closed region is a white region whose pixels are connected vertically or horizontally (up, down, left, or right). For example, in FIG. 2, I and II are each closed regions. I and II touch diagonally but are not connected vertically or horizontally, so they are separate closed regions. To avoid deleting a necessary part of the mask image, the certain number of pixels is set smaller than the number of pixels in the necessary region. For example, for a mask image of a person at 1280 × 720 pixels, the certain number of pixels is about 3000, which is smaller than the number of pixels in the person's region.

Step 10: Remove the unnecessary closed regions from the plurality of mask images. The regions found above that satisfy the predetermined condition are removed as unnecessary portions; that is, white regions satisfying the predetermined condition are set to black.
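Steps 9 and 10 amount to labeling 4-connected white regions and blacking out those at or below a pixel-count threshold. The following is a minimal sketch under the assumption that a mask is a nested list of 0/1 values; the function name and the returned count are illustrative and do not come from the patent.

```python
from collections import deque


def remove_small_closed_regions(mask, max_pixels):
    """Set to black every 4-connected white region with at most max_pixels pixels.

    Returns the cleaned mask and the number of regions removed; a count of
    zero corresponds to the "no qualifying closed region" branch of step 9.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    removed = 0
    for r0 in range(h):
        for c0 in range(w):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first search over up/down/left/right neighbours only,
            # so diagonally touching regions stay separate.
            region = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(region) <= max_pixels:
                for r, c in region:
                    out[r][c] = 0
                removed += 1
    return out, removed
```

For a 1280 × 720 person mask, the text's example would correspond to calling this with `max_pixels` around 3000.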

Unnecessary portions such as shadows are often connected to the subject region in a mask image, so in the mask images of many shooting viewpoints they cannot be extracted as closed regions and cannot be removed. The visual volume intersection method, however, constructs voxel data from an AND operation over the mask images, so once an unnecessary portion is removed from at least one mask image, the corresponding region is removed from the voxel data. Taking the AND with the images obtained by projecting that voxel data onto each shooting viewpoint therefore removes the unnecessary portion from all mask images.
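The propagation described above can be illustrated with a toy visual hull. This sketch assumes two orthographic views (a front view along the z axis indexed `front[y][x]` and a side view along the x axis indexed `side[y][z]`), which is a simplification: real setups use calibrated perspective cameras. All function names are hypothetical.

```python
def visual_hull(front, side):
    """voxels[x][y][z] is occupied only if both views are white there (AND)."""
    ny, nx, nz = len(front), len(front[0]), len(side[0])
    return [[[1 if front[y][x] and side[y][z] else 0
              for z in range(nz)]
             for y in range(ny)]
            for x in range(nx)]


def project_front(voxels):
    """Orthographic projection along z: a pixel is white if any voxel lies behind it."""
    nx, ny = len(voxels), len(voxels[0])
    return [[1 if any(voxels[x][y]) else 0 for x in range(nx)]
            for y in range(ny)]


def and_masks(a, b):
    """Pixel-wise AND of two binary masks of equal size."""
    return [[1 if p and q else 0 for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Row y=0 is the floor. In the front view the shadow at (y=0, x=1) touches the
# subject in row y=1, so it is not an isolated closed region there.
front = [[0, 1],
         [1, 1]]
# In the side view the shadow is isolated and has already been removed.
side_clean = [[0, 0],
              [1, 0]]

hull = visual_hull(front, side_clean)
refined_front = and_masks(front, project_front(hull))
```

In this toy case the shadow pixel disappears from `refined_front` even though it was only removed in the side view, because the voxels it depended on no longer survive the AND.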

Furthermore, multiple unnecessary portions often overlap in a mask image. Removing one unnecessary portion can detach another region that had been connected to the subject region, so that it can then be extracted as a closed region. Applying the processing repeatedly thus gradually removes unnecessary portions from each mask image and from the voxel data.

In general, with the visual volume intersection method, shadows and unnecessary background portions in the mask images have little effect on the generated voxel data, whereas holes and defects inside the person mask have a large effect. The holes and defects inside the person mask therefore need to be filled.

Also, a pixel where the person is present in a slice image of the voxel data (that is, a white rather than black pixel) is always white at the corresponding pixel of every mask image, but a pixel where the target object is absent (a black pixel) is not necessarily black in every mask image.

The white pixels in the slice images are therefore what matters, and improving the accuracy of the slice images amounts to gradually filling defects and holes. Repeating steps 3 through 10 above improves the accuracy of the slice images and gradually fills the defects in the voxel model.

Step 11: Remove unnecessary portions from the plurality of mask images based on color information. Steps 8 and 10 remove unnecessary portions from the mask images, but may not remove them completely. For example, shadows near the floor and unnecessary portions on the contour overlap the subject image, so they often cannot be extracted as closed regions and are not removed. Focusing on the difference in color information between the subject image and the unnecessary portions, pixels are therefore projected into a feature space based on color information, pixels that differ between the mask image and the background image are extracted, and the unnecessary portions are deleted. The feature space based on color information may be, for example, RGB space or HSV space. An embodiment using RGB space is described in detail below.

FIG. 3 is a flowchart showing the processing for determining unnecessary portions near the floor surface. The description below follows this flowchart.

Step 21: Voxels near the floor surface of the three-dimensional voxel data are projected onto each shooting viewpoint to extract unnecessary-portion removal candidate 1 in the mask image. From the three-dimensional voxel data obtained in step 5, the voxels near the floor surface are projected onto each shooting viewpoint to obtain unnecessary-portion removal candidate 1 for each viewpoint. Here, "near the floor surface" means the range within a certain height of the floor. When the y coordinate starts from 0 and, for example, ranges from 0 to 255, y coordinates of 0 to 20 or 0 to 30 correspond to the vicinity of the floor surface.
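As a rough sketch of step 21, the code below projects only the voxels at or below a floor height onto one view. It assumes an orthographic view along the z axis in which image row y coincides with voxel height y; a real system would use each camera's calibrated projection instead. The function name and `max_floor_y` parameter are illustrative.

```python
def floor_candidates(voxels, max_floor_y):
    """Return candidate mask[y][x]: white where an occupied voxel with
    y <= max_floor_y projects onto the pixel (orthographic view along z)."""
    nx, ny = len(voxels), len(voxels[0])
    return [[1 if y <= max_floor_y and any(voxels[x][y]) else 0
             for x in range(nx)]
            for y in range(ny)]
```

With a y range of 0 to 255, the text's example would correspond to `max_floor_y` of about 20 or 30.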

Step 22: Extract unnecessary-portion removal candidate 2 in the mask image of each shooting viewpoint. For each pixel included in unnecessary-portion removal candidate 1 at each shooting viewpoint, the difference between the RGB vector values of the corresponding pixel of the subject image and the corresponding pixel of the background image is taken, and pixels for which the magnitude of the difference is at or below a certain threshold are extracted. If the RGB vector value of a pixel in the subject image is (r_1, g_1, b_1) and that of the corresponding pixel in the background image is (r_2, g_2, b_2), the magnitude of the difference is
$$\sqrt{(r_1 - r_2)^2 + (g_1 - g_2)^2 + (b_1 - b_2)^2}$$
When the y coordinate ranges from 0 to 255, the threshold is set to 50, for example.
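The RGB test of step 22 can be sketched as below. The sketch assumes images are nested lists of `(r, g, b)` tuples and that candidate 1 is a list of `(row, column)` pixel positions; the function names are hypothetical, and the default threshold of 50 is the example value from the text.

```python
import math


def rgb_distance(p, q):
    """Euclidean distance between two (r, g, b) vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_candidate2(candidate1, subject, background, threshold=50):
    """Keep the candidate-1 pixels whose subject color is close to the background color."""
    return [
        (r, c)
        for (r, c) in candidate1
        if rgb_distance(subject[r][c], background[r][c]) <= threshold
    ]
```

Pixels that pass this test look like the background (for example a faint shadow on the floor), which is why they are kept as removal candidates rather than discarded.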

Step 23: Extract unnecessary-portion removal candidate 3 in the mask image of each shooting viewpoint. The ray through each pixel included in the extracted removal candidate 2 is traced to find its intersection with the three-dimensional voxel data, and pixels for which the y coordinate of the intersection is at or below a certain threshold are extracted. When the y coordinate ranges from 0 to 255, the threshold is set to 10, for example. This processing excludes pixels that fall on the person and extracts pixels that fall on shadows on the floor.
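The ray search of step 23 can be sketched with simple fixed-step ray marching through the voxel grid. This is an illustrative simplification: the mapping from a pixel to its ray (origin and direction) depends on camera calibration and is simply assumed to be given as a dictionary, and a production implementation would more likely use an exact grid traversal. All names are hypothetical.

```python
import math


def first_hit(voxels, origin, direction, step=0.5, max_steps=1000):
    """March along the ray and return the index of the first occupied voxel, or None."""
    nx, ny, nz = len(voxels), len(voxels[0]), len(voxels[0][0])
    x, y, z = origin
    dx, dy, dz = direction
    for _ in range(max_steps):
        # math.floor (not int) so that negative coordinates fall outside the grid.
        ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz and voxels[ix][iy][iz]:
            return (ix, iy, iz)
        x, y, z = x + dx * step, y + dy * step, z + dz * step
    return None


def floor_shadow_pixels(voxels, rays, max_floor_y):
    """Keep pixels whose ray first meets the voxel data at height y <= max_floor_y.

    rays maps a pixel position to an (origin, direction) pair.
    """
    result = []
    for pixel, (origin, direction) in rays.items():
        hit = first_hit(voxels, origin, direction)
        if hit is not None and hit[1] <= max_floor_y:
            result.append(pixel)
    return result
```

A ray that first strikes the person's body returns a hit well above the floor and is dropped, while a ray that passes over the person and lands on a floor-level voxel is kept.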

Removing the pixels extracted by this processing from the mask image removes the shadows near the floor surface from the mask image.

FIG. 4 is a flowchart showing the processing for determining unnecessary portions on the contour. The description below follows this flowchart.

Step 31: Extract the contour from the mask image at each shooting viewpoint to obtain unnecessary-portion removal candidate 1. The contour is extracted from the mask image extracted in step 8, or from the mask image from which the pixels of unnecessary-portion removal candidate 2 extracted in step 23 have been removed.

Step 32: Extract unnecessary-portion removal candidate 2 in the mask image of each shooting viewpoint. For each shooting viewpoint, the difference in RGB vector values is taken between each pixel included in unnecessary-portion removal candidate 1 and the corresponding pixel of the background image from the same viewpoint, and pixels for which the magnitude is at or below a certain threshold are extracted. The magnitude of the RGB difference is computed as in step 22. When the y coordinate ranges from 0 to 255, the threshold is set to 15, for example.

Step 33: Extract unnecessary-portion removal candidate 3 in the mask image of each shooting viewpoint. For each pixel included in the extracted removal candidate 2, the corresponding points in the two adjacent cameras are detected. At each corresponding point, the difference in RGB vector values is taken between the pixel of the subject image and the pixel of the background image from the same shooting viewpoint, and pixels for which the magnitude is at or below a certain threshold are extracted.

Removing the pixels extracted by this processing from the mask image removes the unnecessary portions on the contour from the mask image.

In step 33, when taking the RGB difference at the corresponding points in the adjacent cameras, the accuracy of determining unnecessary portions can be improved by applying color correction to the pixels at the corresponding points, or by varying the threshold in accordance with the color correction.

Alternatively, the mask image subject to unnecessary-portion removal can be divided in advance into regions based on RGB values; if even one pixel within a region was extracted by steps 23 and 33 above, the unnecessary portion can be removed by deleting that whole region.
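The region-level removal described above can be sketched as follows. The sketch assumes the RGB-based segmentation has already been computed and is supplied as a label image `labels[r][c]` of region identifiers; how that segmentation is produced is not specified here, and the function name is hypothetical.

```python
def remove_flagged_regions(mask, labels, flagged_pixels):
    """Black out every region that contains at least one flagged pixel.

    mask: binary mask (1 = white); labels: region id per pixel;
    flagged_pixels: (row, column) positions extracted by steps 23 and 33.
    """
    bad_regions = {labels[r][c] for (r, c) in flagged_pixels}
    return [
        [0 if labels[r][c] in bad_regions else mask[r][c]
         for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]
```

Deleting at region granularity removes neighbouring pixels of similar color that the per-pixel thresholds missed, at the cost of relying on the quality of the segmentation.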

The processing of FIG. 3 for determining unnecessary portions near the floor surface and the processing of FIG. 4 for determining unnecessary portions on the contour need not be applied to the mask images of all viewpoints; they can also be applied to the mask images of only some viewpoints. In this case too, once an unnecessary portion is removed from at least one mask image, the corresponding region is removed from the voxel data. Taking the AND with the images obtained by projecting that voxel data onto each shooting viewpoint therefore removes the unnecessary portion from all mask images.

Applying the processing of FIG. 3 for determining unnecessary portions near the floor surface and the processing of FIG. 4 for determining unnecessary portions on the contour repeatedly also removes unnecessary portions gradually from each mask image and from the voxel data.

Step 12: Extract a plurality of hole-filled mask images. Holes are filled further by applying filtering to the mask images obtained in step 10 or step 11. By filling holes (step 6), deleting unnecessary portions (steps 8, 10, and 11), and filling holes again (step 12) in this way, more accurate mask images are extracted.

Step 13: If the accuracy of the mask images is sufficient, highly accurate voxel data is constructed from them by the visual volume intersection method. If it is not sufficient, the mask images obtained in step 12 are used as the input to step 3, and steps 3 to 12 are repeated so that the accuracy of the mask images and the voxel data is progressively improved.

Next, actual images are used to show that the processing of step 11 improves the accuracy of the mask image. First, it is shown that the processing of steps 21 to 23 removes unnecessary portions near the floor surface. FIG. 5 shows a mask image from which unnecessary portions have been removed by applying the processing of steps 3 to 10 multiple times. For example, an unnecessary portion near the floor surface indicated by arrow 1 and an unnecessary portion on the contour indicated by arrow 2 are present. FIG. 6 shows an example of a voxel slice image in the y-axis direction in step 21; this example is the slice image at y = 35. FIG. 7 shows the mask image obtained by projecting the slice images for the same shooting viewpoint as FIG. 6 (all slice images in the range 0 ≤ y ≤ 50) onto each shooting viewpoint. FIG. 8 shows the pixels extracted as unnecessary in step 22; a threshold of 50 is used in this example. FIG. 9 shows the pixels extracted as unnecessary in step 23; a threshold of 10 is used in this example. Here, the pixels that are actually removed are displayed in white and all others in black. FIG. 10 shows the mask image obtained by applying the processing of steps 21 to 23. Comparing FIG. 10 with FIG. 5 shows that the unnecessary portion near the floor surface indicated by arrow 1 has been removed and the accuracy of the mask image has improved.

Next, it is shown that the processing of steps 31 to 33 removes unnecessary portions on the contour. FIG. 11 shows a mask image from which unnecessary portions have been removed by applying the processing of steps 3 to 10 multiple times. For example, an unnecessary portion near the floor surface indicated by arrow 1 and unnecessary portions on the contour indicated by arrows 2 and 3 are present. FIG. 12 shows the contour extracted in step 31. FIG. 13 shows the pixels extracted as unnecessary in step 32; a threshold of 15 is used in this example. FIG. 14 shows the pixels extracted as unnecessary in step 33; a threshold of 15 is used in this example. In this example, the mask image being processed was additionally divided into regions based on RGB values, and any region containing even one of the unnecessary pixels extracted above was removed as an unnecessary portion. FIG. 15 shows in white the regions removed as unnecessary by the RGB-based region division. FIG. 16 shows the mask image obtained by the unnecessary-portion removal that combines steps 31 to 33 with the RGB-based region division. Compared with FIG. 11, the unnecessary portion on the contour indicated by arrow 2 has been reduced.

Actual images are also used to show that repeating the processing of step 11 multiple times gradually improves the accuracy of the mask image. In this example, the processing of steps 31 to 33 is repeated multiple times. FIG. 17 shows the result of applying, three times, the unnecessary-portion removal that combines steps 31 to 33 with the RGB-based region division. The unnecessary portion on the contour indicated by arrow 3 is smaller in FIG. 16 than in FIG. 11, and smaller still in FIG. 17.

All of the embodiments described above illustrate the present invention by way of example and do not limit it; the present invention can be carried out in various other modified and altered forms. The scope of the present invention is therefore defined only by the claims and their equivalents.

Claims

A method of extracting a plurality of mask images representing the presence of a subject from a plurality of subject images obtained by photographing a subject and a background and a plurality of background images obtained by photographing only a background,
A first extraction step in which a processor extracts a plurality of first mask images by background difference from the plurality of subject images and the plurality of background images;
A first construction step in which a processor constructs first three-dimensional voxel data from the plurality of first mask images by a view volume intersection method;
A second construction step in which processing is performed on the first three-dimensional voxel data to fill defects and / or remove noise, and a second three-dimensional voxel data is constructed by a processor ;
Based on the second three-dimensional voxel data, processing based on the color information of the subject image and / or the background image that fills in the defects of the plurality of first mask images and / or removes noise And a second extraction step in which the processor extracts a plurality of second mask images;
including the above steps,
In the second extraction step, in the second three-dimensional voxel data, a three-dimensional coordinate filled with a defect is projected onto each photographing viewpoint, and a corresponding pixel in the plurality of first mask images has a defect. A plurality of first submask images are extracted, the second three-dimensional voxel data is projected onto each photographing viewpoint, a plurality of second submask images are extracted, and the plurality of second submask images are extracted. Filtering is performed on the sub-mask image, and only pixels filled in both the filtered second sub-mask image and the plurality of first sub-mask images are set as regions representing the presence of the subject, and other pixels. Is a region where no subject exists, a plurality of third submask images are extracted, the plurality of third submask images are divided into closed regions, and a closed region that satisfies a predetermined condition is excluded. Thus, a plurality of fourth submask images are extracted, the plurality of fourth submask images are processed based on the color information of the subject image and / or the background image, and a plurality of second masks are obtained. Extracting images,
Processing based on the color information
A sub-step of determining a first unnecessary part removal candidate in the fourth sub-mask image by projecting a voxel having a certain height or less from a floor surface of the second three-dimensional voxel data to a photographing viewpoint;
A sub-step of, for each pixel included in the first unnecessary portion removal candidate, projecting the pixel value onto a feature space based on color information, calculating the difference between the pixel of the subject image and the pixel of the background image, and setting pixels whose difference is equal to or less than a threshold value as a second unnecessary portion removal candidate;
A sub-step in which a ray is searched for each pixel included in the second unnecessary portion removal candidate, and a pixel whose intersection with the third three-dimensional voxel data is equal to or less than a certain height from the floor is set as an unnecessary pixel; ,
Removing the unnecessary pixels from the fourth submask image;
A method for extracting a plurality of mask images characterized by comprising:

Processing based on the color information
A sub-step of extracting a contour from the fourth sub-mask image and determining a first unnecessary portion removal candidate;
A sub-step of, for each pixel included in the first unnecessary portion removal candidate, projecting the pixel value onto a feature space based on color information, calculating the difference between the pixel of the subject image and the pixel of the background image, and setting pixels whose difference is equal to or less than a threshold value as a second unnecessary portion removal candidate;
A sub-step of determining, for each pixel included in the second unnecessary portion removal candidate, a corresponding pixel of the second unnecessary portion removal candidate at a viewpoint adjacent to the pixel;
For the corresponding pixel of the second unnecessary part removal candidate, the pixel value is projected onto the feature space based on the color information, the difference between the subject image pixel and the background image pixel is calculated, and the difference is equal to or less than the threshold value. A sub-step in which the pixel is an unnecessary pixel;
Removing the unnecessary pixels from the fourth submask image;
The method of extracting a plurality of mask images according to claim 1 , wherein:

The method for extracting a plurality of mask images according to claim 1 or 2, further comprising a sub-step of dividing the plurality of subject images into regions in advance based on color information, and removing, in the mask image of each shooting viewpoint, the region to which an unnecessary pixel belongs.

The method for extracting a plurality of mask images according to any one of claims 1 to 3 , wherein the feature space based on the color information is an RGB space or an HSV space.

The method of extracting a plurality of mask images according to claim 2, wherein the sub-step of determining the unnecessary pixels performs, for the corresponding pixel of the second unnecessary portion removal candidate, color correction of the subject image and/or the background image, projects the pixel value onto the feature space based on the color information, calculates the difference between the pixel of the subject image and the pixel of the background image, and sets a pixel whose difference is equal to or smaller than a threshold value as an unnecessary pixel, the threshold value being changed based on the color correction.

The method for extracting a plurality of mask images according to any one of claims 1 to 5, wherein the plurality of second mask images are used as the plurality of first mask images in the first construction step, so that the steps from the first construction step to the second extraction step are repeated a predetermined number of times.

The second construction step includes
Obtaining a plurality of first slice images of the first three-dimensional voxel data from the x-axis, y-axis, and z-axis directions;
Applying a filtering process to the plurality of first slice images, and constructing a second three-dimensional voxel data based on a result of the filtering process;
The method for extracting a plurality of mask images according to any one of claims 1 to 6 , characterized by comprising:

The method of extracting a plurality of mask images according to claim 7, wherein the sub-step of constructing the second three-dimensional voxel data applies a filter process to the plurality of first slice images, obtains the pixels that have become white through the filter process, and fills the three-dimensional coordinates of the first three-dimensional voxel data corresponding to those pixels, thereby constructing the second three-dimensional voxel data.

The method for extracting a plurality of mask images according to any one of claims 1 to 8, further comprising, after the second extraction step, a step of applying filtering processing to the plurality of fifth mask images.

A computer for extracting a plurality of mask images representing the presence of a subject from a plurality of subject images obtained by photographing the subject and the background and a plurality of background images obtained by photographing only the background,
First extraction means for extracting a plurality of first mask images by background difference from the plurality of subject images and the plurality of background images;
First construction means for constructing first three-dimensional voxel data from the plurality of first mask images by a visual volume intersection method;
Second construction means for constructing second three-dimensional voxel data by performing processing for filling defects and / or removing noise on the first three-dimensional voxel data;
Based on the second three-dimensional voxel data, processing based on the color information of the subject image and / or the background image that fills in the defects of the plurality of first mask images and / or removes noise Second extraction means for extracting a plurality of second mask images;
is caused to function as the above means,
In the second three-dimensional voxel data, the second extraction unit projects three-dimensional coordinates filled with defects onto each photographing viewpoint, and corresponding pixels in the plurality of first mask images are defective. A plurality of first submask images are extracted, the second three-dimensional voxel data is projected onto each photographing viewpoint, a plurality of second submask images are extracted, and the plurality of second submask images are extracted. Filtering is performed on the sub-mask image, and only pixels filled in both the filtered second sub-mask image and the plurality of first sub-mask images are set as regions representing the presence of the subject, and other pixels. Is a region where no subject exists, a plurality of third sub-mask images are extracted, the plurality of third sub-mask images are divided into closed regions, and a closed region satisfying a predetermined condition is removed. Thus, a plurality of fourth sub-mask images are extracted, and the plurality of fourth sub-mask images are processed based on the color information of the subject image and / or the background image, and the plurality of second mask images Is a means of extracting
Processing based on the color information
Means for determining a first unnecessary part removal candidate in the fourth sub-mask image by projecting a voxel having a certain height or less from the floor surface of the second three-dimensional voxel data to a photographing viewpoint;
Means for, for each pixel included in the first unnecessary portion removal candidate, projecting the pixel value onto a feature space based on color information, calculating the difference between the pixel of the subject image and the pixel of the background image, and setting pixels whose difference is equal to or less than a threshold value as a second unnecessary portion removal candidate;
Means for searching for a ray for each pixel included in the second unnecessary portion removal candidate, and making a pixel whose intersection with the third three-dimensional voxel data is a certain height or less from the floor surface as an unnecessary pixel;
Means for removing the unwanted pixels from the fourth submask image;
A program for extracting a plurality of mask images, characterized by including the above means.

A method for constructing three-dimensional voxel data from a plurality of subject images obtained by photographing a subject and a background and a plurality of background images obtained by photographing only a background,
A first extraction step in which a processor extracts a plurality of first mask images by background difference from the plurality of subject images and the plurality of background images;
A first construction step in which a processor constructs first three-dimensional voxel data from the plurality of first mask images by a visual volume intersection method;
A second construction step in which a processor constructs second three-dimensional voxel data by applying, to the first three-dimensional voxel data, processing that fills defects and/or removes noise;
A second extraction step in which a processor extracts a plurality of second mask images by applying, based on the second three-dimensional voxel data, processing that fills defects in and/or removes noise from the plurality of first mask images and that is based on the color information of the subject image and/or the background image; and
A third construction step in which a processor constructs third three-dimensional voxel data from the plurality of second mask images by a visual volume intersection method,
wherein
in the second extraction step,
three-dimensional coordinates whose defects have been filled in the second three-dimensional voxel data are projected onto each photographing viewpoint, and a plurality of first sub-mask images are extracted in which the defects of the corresponding pixels in the plurality of first mask images are filled;
the second three-dimensional voxel data is projected onto each photographing viewpoint to extract a plurality of second sub-mask images;
filtering is applied to the plurality of second sub-mask images;
a plurality of third sub-mask images are extracted in which only pixels filled in both the filtered second sub-mask images and the plurality of first sub-mask images are regions representing the presence of the subject, and all other pixels are regions where the subject does not exist;
the plurality of third sub-mask images are divided into closed regions, and closed regions satisfying a predetermined condition are removed to extract a plurality of fourth sub-mask images; and
the plurality of fourth sub-mask images are processed based on the color information of the subject image and/or the background image to extract the plurality of second mask images,
and the processing based on the color information includes:
A sub-step of determining a first unnecessary portion removal candidate in the fourth sub-mask image by projecting, onto the photographing viewpoint, voxels of the second three-dimensional voxel data that lie at or below a certain height above the floor surface;
A sub-step of projecting, for each pixel included in the first unnecessary portion removal candidate, the pixel value onto a feature space based on color information, calculating the difference between the pixel of the subject image and the pixel of the background image, and setting each pixel whose difference is equal to or less than a threshold value as a second unnecessary portion removal candidate;
A sub-step of tracing a ray for each pixel included in the second unnecessary portion removal candidate, and setting as an unnecessary pixel each pixel whose ray intersects the third three-dimensional voxel data at or below a certain height above the floor surface; and
A sub-step of removing the unnecessary pixels from the fourth sub-mask image;
A method for constructing three-dimensional voxel data, characterized by including the above.
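
The visual volume intersection used in the first and third construction steps can be sketched as follows: a voxel is kept only when its projection falls on a filled mask pixel in every view. This is a minimal illustration under stated assumptions, not the claimed implementation. The function name `visual_hull` and the `projectors` argument (one callable per view that maps a voxel index to a (row, column) pixel) are hypothetical; a real system would derive the projections from calibrated camera parameters.

```python
def visual_hull(masks, projectors, grid_size):
    """Return the set of voxels whose projection is filled in every mask.

    masks      -- list of 2D lists (truthy = subject present), one per view
    projectors -- list of callables, voxel (x, y, z) -> pixel (row, col);
                  hypothetical stand-ins for calibrated camera projections
    grid_size  -- (nx, ny, nz) extent of the voxel grid
    """
    nx, ny, nz = grid_size
    hull = set()
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                inside = True
                for mask, project in zip(masks, projectors):
                    row, col = project((x, y, z))
                    in_image = 0 <= row < len(mask) and 0 <= col < len(mask[0])
                    # A voxel outside the image or on an empty pixel is carved away.
                    if not in_image or not mask[row][col]:
                        inside = False
                        break
                if inside:
                    hull.add((x, y, z))
    return hull
```

Because a voxel survives only if all views agree, a single defective mask carves a hole in the voxel data, which is why the method refines the masks and then rebuilds the voxel data in the third construction step.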

A program that causes a computer, which constructs three-dimensional voxel data from a plurality of subject images obtained by photographing a subject and a background and a plurality of background images obtained by photographing only the background, to function as:
First extraction means for extracting a plurality of first mask images by background difference from the plurality of subject images and the plurality of background images;
First construction means for constructing first three-dimensional voxel data from the plurality of first mask images by a visual volume intersection method;
Second construction means for constructing second three-dimensional voxel data by applying, to the first three-dimensional voxel data, processing that fills defects and/or removes noise;
Second extraction means for extracting a plurality of second mask images by applying, based on the second three-dimensional voxel data, processing that fills defects in and/or removes noise from the plurality of first mask images and that is based on the color information of the subject image and/or the background image; and
Third construction means for constructing third three-dimensional voxel data from the plurality of second mask images by a visual volume intersection method,
wherein
the second extraction means is a means that
projects three-dimensional coordinates whose defects have been filled in the second three-dimensional voxel data onto each photographing viewpoint, and extracts a plurality of first sub-mask images in which the defects of the corresponding pixels in the plurality of first mask images are filled,
projects the second three-dimensional voxel data onto each photographing viewpoint to extract a plurality of second sub-mask images,
applies filtering to the plurality of second sub-mask images,
extracts a plurality of third sub-mask images in which only pixels filled in both the filtered second sub-mask images and the plurality of first sub-mask images are regions representing the presence of the subject, and all other pixels are regions where the subject does not exist,
divides the plurality of third sub-mask images into closed regions and removes closed regions satisfying a predetermined condition to extract a plurality of fourth sub-mask images, and
processes the plurality of fourth sub-mask images based on the color information of the subject image and/or the background image to extract the plurality of second mask images,
and the processing based on the color information includes:
Means for determining a first unnecessary portion removal candidate in the fourth sub-mask image by projecting, onto the photographing viewpoint, voxels of the second three-dimensional voxel data that lie at or below a certain height above the floor surface;
Means for projecting, for each pixel included in the first unnecessary portion removal candidate, the pixel value onto a feature space based on color information, calculating the difference between the pixel of the subject image and the pixel of the background image, and setting each pixel whose difference is equal to or less than a threshold value as a second unnecessary portion removal candidate;
Means for tracing a ray for each pixel included in the second unnecessary portion removal candidate, and setting as an unnecessary pixel each pixel whose ray intersects the third three-dimensional voxel data at or below a certain height above the floor surface; and
Means for removing the unnecessary pixels from the fourth sub-mask image;
A program for constructing three-dimensional voxel data, characterized by including the above.
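
The extraction of the third and fourth sub-mask images can be sketched as a pixel-wise AND followed by removal of closed (connected) regions. The claim leaves the "predetermined condition" open; the sketch below assumes, for illustration only, that a region is removed when its area is below `min_area`, and it assumes 4-connectivity. The names `combine_masks` and `remove_small_regions` are hypothetical.

```python
from collections import deque

def combine_masks(filtered_second, first_sub):
    """Third sub-mask: a pixel is filled only if it is filled in both inputs."""
    return [[1 if a and b else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(filtered_second, first_sub)]

def remove_small_regions(mask, min_area):
    """Fourth sub-mask: clear every 4-connected region smaller than min_area.

    The area test is an assumed stand-in for the claim's
    "predetermined condition".
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one closed region.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for ry, rx in region:
                    out[ry][rx] = 0
    return out
```

Calling `remove_small_regions(combine_masks(a, b), min_area)` chains the two stages; other conditions (for example, distance of a region from the main silhouette) could replace the area test without changing the structure.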