JP2010237941A

JP2010237941A - Mask image generation device, three-dimensional model information generation device, and program

Info

Publication number: JP2010237941A
Application number: JP2009084871A
Authority: JP
Inventors: Tehrani Mehrdad Panahpour; パナヒプルテヘラニメヒルダド; Akio Ishikawa; 彰夫石川
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-21

Abstract

PROBLEM TO BE SOLVED: To provide a mask image generation device that generates a mask image that is as accurate as possible.

SOLUTION: The mask image generation device is provided with: a pre-processing means for generating, based on a plurality of images and for each light beam of a light beam space, the pixel value of the pixel produced by the light beam, attribute information showing whether the pixel is foreground or background, and, when the attribute is foreground, information showing the position of the pixel in actual space; an unfixed region determination means for determining an unfixed region including the boundary between the light beams whose attribute is foreground and those whose attribute is background in the light beam space, and for updating the attribute information of the light beams included in the determined unfixed region to "unfixed"; and a matting means. The matting means is provided with a mask image generation means for generating a plurality of mask images based on the attribute information of the light beams of the light beam space, and a determination means for matting the pixels of the generated mask images whose attribute is unfixed.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for generating mask images and three-dimensional object model information.

Patent Document 1 describes a free viewpoint image generation technique that reproduces the image seen from an arbitrary position from moving images captured by a plurality of cameras. According to Patent Document 1, a plurality of cameras are arranged so as to surround a subject, the single region surrounded by the cameras is divided into a plurality of local regions, and a plurality of local ray spaces are constructed from the images captured by the cameras in order to generate free viewpoint images. Here, constructing a local ray space means expressing the rays that pass through the boundary surface of a local region using suitable coordinate axes, and associating each ray, expressed as a coordinate in that coordinate space, with the pixel value of the image pixel produced by that ray.

Generating a free viewpoint image requires three-dimensional object model information, which is information indicating the surface position of a subject in three-dimensional space. For example, Non-Patent Document 1 describes a technique for generating three-dimensional object model information by the visual volume intersection method, and Non-Patent Document 2 describes a technique for generating it using a sensor. The visual volume intersection method determines whether each pixel of the images captured by a plurality of cameras corresponds to the subject, generates a plurality of mask images, which are binary images indicating whether each pixel is the subject, and obtains three-dimensional object information from the subject pixels of each mask image and the camera parameters.

A free viewpoint video system, which reproduces video seen from an arbitrary position, needs the shape information of the subject, that is, the three-dimensional object model information, to be generated as accurately as possible. However, because a free viewpoint video system also targets subjects that are constantly moving, such as those in a sports match, the three-dimensional object model information must be generated without relying on a sensor such as the one described in Non-Patent Document 2. Since three-dimensional object model information can be generated from mask images, accurate three-dimensional object model information can be generated if accurate mask images can be generated.

Patent Document 1: JP 2008-15756 A

Non-Patent Document 1: T. Matsuyama, et al., "Real-Time Dynamic 3D Object Shape Reconstruction and High-Fidelity Texture Mapping for 3D Video", IEEE Trans. on Circuits and Systems for Video Technology, Vol. CSVT-14, No. 3, pp. 357-369, 2004
Non-Patent Document 2: A. Banno, et al., "Flying Laser Range Sensor for Large-Scale Site Modeling and Its Applications in Bayon Digital Archival Project", Int. Journal of Computer Vision, Vol. 78, No. 2-3, pp. 207-222, July 2008

Accordingly, an object of the present invention is to provide an apparatus that generates mask images or three-dimensional object model information that are as accurate as possible, and a program that causes a computer to function as that apparatus.

A mask image generation apparatus according to the present invention comprises: preprocessing means for generating, based on a plurality of images and for each ray in a ray space, the pixel value of the pixel produced by the ray, attribute information indicating whether the pixel is foreground or background, and, when the attribute is foreground, information indicating the position of the pixel in real space; indeterminate region determination means for determining, at the boundary between foreground rays and background rays in the ray space, an indeterminate region that includes the boundary, and for updating the attribute information of the rays contained in the determined indeterminate region to "indeterminate"; and matting means. The matting means comprises mask image generation means for generating a plurality of mask images based on the attribute information of the rays in the ray space, and judgment means for performing matting on the pixels of the generated mask images whose attribute is indeterminate.

A mask image generation apparatus according to another embodiment of the present invention comprises: preprocessing means for generating, based on a plurality of images and for each ray in a ray space, the pixel value of the pixel produced by the ray, attribute information indicating whether the pixel is foreground or background, and, when the attribute is foreground, information indicating the position of the pixel in real space; indeterminate region determination means for determining, at the boundary between foreground rays and background rays in the ray space, an indeterminate region that includes the boundary, and for updating the attribute information of the rays contained in the determined indeterminate region to "indeterminate"; and matting means. The matting means comprises judgment means for performing matting on the pixels corresponding to the rays whose attribute in the ray space is indeterminate, and mask image generation means for generating a plurality of mask images based on the attribute information of the rays in the ray space after the matting.

According to another embodiment of the mask image generation apparatus of the present invention, the indeterminate region determination means preferably comprises search range setting means, block matching means, and determination means for determining the indeterminate region. The search range setting means selects a plurality of third rays, each being a foreground ray located at the boundary between foreground and background, and, for each selected third ray, obtains a first pixel and a second pixel based on a first position, which is the real-space position of the third pixel corresponding to the third ray, and on the distribution in the ray space of the rays emitted from a single point in real space. The first pixel is a pixel in the image seen from a predetermined viewpoint and corresponds to a first ray emitted from the first position; the second pixel is a pixel in the image seen from a viewpoint different from the predetermined viewpoint and corresponds to a second ray emitted from the first position. For each selected third ray, the block matching means applies block matching with pixel blocks of a predetermined size within a region of predetermined extent centered on the first pixel and a region of predetermined extent centered on the second pixel, and obtains a fourth pixel and a fifth pixel, which are the pixels at a predetermined position in the two pixel blocks of the most highly correlated combination. For each selected third ray, the determination means connects, in the ray space, the third ray, a fourth ray corresponding to the fourth pixel, and a fifth ray corresponding to the fifth pixel with a straight line or a curve, and takes the region enclosed by the resulting plurality of straight lines or curves in the ray space as the indeterminate region.

Furthermore, according to another embodiment of the mask image generation apparatus of the present invention, the judgment means preferably calculates a first value for a pixel whose attribute is indeterminate, based on the pixel value of that pixel and on the mean and variance of the pixel values of the foreground pixels, and the mean and variance of the pixel values of the background pixels, contained in a predetermined region centered on that pixel, and determines the boundary between foreground and background within the indeterminate region such that the sum of the first values of the pixels on the boundary is minimized.

A three-dimensional object model information generation apparatus according to the present invention generates three-dimensional object model information based on the mask images generated by the mask image generation apparatus described above.

A program according to the present invention causes a computer to function as the apparatus described above.

The real-space position of each pixel determined by the preprocessing means is likely to contain errors, so the boundary between foreground and background is not accurate. An indeterminate region is therefore determined at the boundary between foreground and background, and foreground and background are judged accurately by applying any of various matting methods to this indeterminate region. In the present invention, the indeterminate region is determined based on the distribution in the ray space of the rays emitted from a single point in real space. This narrows down the region to be matted efficiently, so that foreground and background can be judged accurately with little processing.

Matting can be carried out efficiently by, for example, defining an energy value for a pixel of the indeterminate region using the pixel values of the foreground and background pixels contained in a predetermined range centered on that pixel, and applying a dynamic programming method to these energy values. Matting may be performed either on the images or in the ray space.

FIG. 1 is a functional block diagram of the mask image generation apparatus according to the present invention.
FIG. 2 is a diagram showing the local regions.
FIG. 3 is a diagram showing the cylindrical coordinate system.
FIG. 4 is a diagram showing the attribute of each pixel in the ray space.
FIG. 5 is a block diagram of the indeterminate region determination unit.
FIG. 6 is an explanatory diagram of the processing of the search range setting unit.
FIG. 7 is an explanatory diagram of the processing of the search range setting unit.
FIG. 8 is an explanatory diagram of the processing of the block matching unit.
FIG. 9 is an explanatory diagram of the processing of the block matching unit.
FIG. 10 is an explanatory diagram of the processing of the determination unit.
FIG. 11 is an explanatory diagram of the processing of the determination unit.
FIG. 12 is a block diagram of the matting unit.
FIG. 13 is an explanatory diagram of matting.
FIG. 14 is a diagram showing the output of the indeterminate region determination unit.
FIG. 15 is a diagram showing a mask image generated from the ray space data of FIG. 14.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

FIG. 1 is a functional block diagram of a mask image generation apparatus according to the present invention. As shown in FIG. 1, the mask image generation apparatus comprises a storage unit 1, a preprocessing unit 2, an indeterminate region determination unit 3, and a matting unit 4. FIG. 1 shows only the parts needed to explain the present invention.

The mask image generation apparatus first takes in the captured images that make up the moving images captured by a plurality of cameras 100, together with the camera parameters of these cameras 100, and stores them in the storage unit 1. Based on the captured images and camera parameters stored in the storage unit 1, the preprocessing unit 2 identifies the subject in the captured images and generates three-dimensional object model information, that is, information indicating the real-space position of the subject surface. Any of various known methods may be used for identifying the subject and generating the three-dimensional object model information in the preprocessing unit 2.

The preprocessing unit 2 also divides the space enclosed by the dotted line in FIG. 2, for example, into a plurality of local regions indicated by the solid circles, and generates ray space data for each local region using, for example, the method described in Patent Document 1. FIG. 2 is a plan view; in this embodiment the actual local regions are cylindrical. Local ray space data is data that indicates the rays passing through the boundary surface of a local region, the pixel values of the image pixels produced by those rays, and the attribute information of each ray. Here, the attribute information indicates whether or not the pixel produced by a ray corresponds to the subject. Hereinafter, a pixel or ray that corresponds to the subject is called foreground (denoted F in the drawings), and a pixel or ray that does not correspond to the subject is called background (denoted B in the drawings).

For example, when the local region is cylindrical as shown in FIG. 2, the cylindrical coordinate system shown in FIG. 3 can be used. In the coordinate system of FIG. 3, a ray that passes horizontally through the local region is represented by the coordinates $(p_0, \theta_0)$, where $\theta_0$ is the value of the angle $\theta$ from the Z axis and $p_0$ is the position on the P axis, which passes through the origin and is perpendicular to the direction of the ray. With such a coordinate system, the rays emitted in each direction from a single point on the subject surface are distributed on the P-θ plane as a sinusoid whose amplitude and phase depend on the spatial position of that point. Consequently, when the attributes of the rays in a given horizontal plane of this local region are plotted on the P-θ plane, the result looks, for example, like FIG. 4. FIG. 4 is only an example; the actual shape is determined by the shape and position of the subject.
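The sinusoidal distribution described above can be illustrated with a minimal sketch. It assumes a particular sign convention that the text does not fix: a ray travelling at angle θ from the Z axis has direction (sin θ, cos θ) in the horizontal (x, z) plane, and the P axis is the perpendicular unit vector (cos θ, -sin θ). The function names `ray_p` and `point_trace` are hypothetical.

```python
import numpy as np

def ray_p(x, z, theta):
    """Signed position p on the P axis of the horizontal ray that passes
    through the point (x, z) while travelling at angle theta from the Z axis.
    Assumed convention: direction (sin t, cos t), P axis (cos t, -sin t)."""
    return x * np.cos(theta) - z * np.sin(theta)

def point_trace(x, z, thetas):
    """p as a function of theta for all rays emitted from one point (x, z)."""
    return ray_p(x, z, np.asarray(thetas, dtype=float))

# Rays leaving the point (0.3, 0.4) in every horizontal direction
thetas = np.linspace(-np.pi, np.pi, 361)
trace = point_trace(0.3, 0.4, thetas)

# The trace is a sinusoid: amplitude = distance of the point from the axis,
# phase = angular position of the point.
amplitude = np.hypot(0.3, 0.4)
phase = np.arctan2(0.4, 0.3)
```

Under this convention the trace equals `amplitude * cos(theta + phase)`, which is why a boundary point of the subject appears as one sinusoidal curve in the P-θ plane.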

The three-dimensional object model information generated by the preprocessing unit 2 is not exact and contains errors. In other words, the boundary between background and foreground in FIG. 4 is not accurate and must be determined again. For this purpose, the indeterminate region determination unit 3 adds "indeterminate" (denoted U in the drawings) to background and foreground as a possible value of the attribute information in the ray space data, and determines, in the ray space of each local region, which rays are to have their attribute changed to indeterminate. FIG. 5 is a block diagram of the indeterminate region determination unit 3. As shown in FIG. 5, the indeterminate region determination unit 3 comprises a search range setting unit 31, a block matching unit 32, and a determination unit 33.

The determination of the indeterminate region for the ray space of a given local region 9 is described below. First, as shown in FIG. 6, three viewpoints facing the local region 9 are set. In FIG. 6, viewpoints 91, 92, and 93, each facing the center of the local region, are set on the boundary surface of the local region. Let the images seen from viewpoints 91, 92, and 93 be images 61, 62, and 63, respectively; the pixel values of each image are then the values of the rays at the positions indicated by the dotted lines in the ray space of FIG. 7. FIG. 7 shows the ray space viewed from the direction perpendicular to the P-θ plane, so the dotted lines labeled 61, 62, and 63 are in fact curved surfaces with height in the direction perpendicular to the P-θ plane. FIG. 7 also shows only part of the P-θ plane rather than all of it, and therefore shows only part of images 61, 62, and 63. In the following description, the direction perpendicular to the P-θ plane is called the height direction.

From image 63, which lies between image 61 and image 62, the search range setting unit 31 selects a foreground pixel located on the current boundary between background and foreground obtained by the preprocessing unit 2, that is, a pixel 73 on the object surface. As described above, the rays emitted from a single point in real space are distributed in the ray space along a curve or straight line determined by the coordinate system. The solid line in FIG. 7 connects the rays emitted from the same position as pixel 73, that is, it shows one such distribution. Using the three-dimensional object model information of pixel 73 and this distribution, the search range setting unit 31 obtains the pixels in image 61 and image 62 that correspond to rays emitted from the same spatial position as pixel 73. In the example of FIG. 7, the search range setting unit 31 thus obtains pixel 81 and pixel 82, which are the intersections of the solid line with image 61 and image 62.

However, if the three-dimensional object model information obtained by the preprocessing unit 2 is not accurate, pixel 81 and pixel 82 are not the pixels that actually correspond to pixel 73, that is, they are not pixels produced by rays emitted from the same position. The search range setting unit 31 therefore sets a search range centered on pixel 81 in image 61, as indicated by the dotted line in FIG. 9, and likewise sets a search range centered on pixel 82 in image 62. The search ranges in the two images may be the same size, or their sizes may be varied according to, for example, the distance between viewpoint 93 and viewpoint 91 and the distance between viewpoint 93 and viewpoint 92. That is, the search range may be made larger as the distance from viewpoint 93 increases.

The block matching unit 32 applies block matching to the search ranges set by the search range setting unit 31 and obtains the pixels in image 61 and image 62 that correspond to pixel 73. Specifically, for each combination of a block within the search range of image 61 and a block within the search range of image 62, it computes the sum of the absolute differences, or the sum of the squared differences, between the pixel values of corresponding pixels. The pixels at the same predetermined position, preferably the center pixels, of the two blocks in the combination with the smallest sum are taken as the corresponding pixels. In FIGS. 8 and 9, pixel 71 of image 61 and pixel 72 of image 62 are obtained as the pixels corresponding to pixel 73.
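The exhaustive pairing of blocks described above can be sketched as follows. This is a minimal illustration using the sum of absolute differences on single-channel images, not the implementation of the block matching unit 32; the function name `best_block_pair` and the parameters `search` (half-width of the search range) and `half` (half-width of the block) are assumptions introduced for the example.

```python
import numpy as np

def best_block_pair(img_a, img_b, center_a, center_b, search=2, half=1):
    """Compare every block in the search range of img_a with every block in
    the search range of img_b and return (center in img_a, center in img_b,
    SAD) for the pair with the smallest sum of absolute differences."""

    def candidates(img, center):
        rows, cols = img.shape
        r0, c0 = center
        for r in range(r0 - search, r0 + search + 1):
            for c in range(c0 - search, c0 + search + 1):
                # keep only blocks that lie fully inside the image
                if half <= r < rows - half and half <= c < cols - half:
                    block = img[r - half:r + half + 1, c - half:c + half + 1]
                    yield (r, c), block.astype(float)

    blocks_b = list(candidates(img_b, center_b))
    best = None
    for pos_a, blk_a in candidates(img_a, center_a):
        for pos_b, blk_b in blocks_b:
            sad = float(np.abs(blk_a - blk_b).sum())
            if best is None or sad < best[2]:
                best = (pos_a, pos_b, sad)
    return best
```

The two returned centers play the role of pixels 71 and 72. Replacing `np.abs(...)` with a squared difference gives the sum-of-squares variant that the text also allows.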

The search range setting unit 31 and the block matching unit 32 perform the same processing for object surface pixels in image 63 at heights and positions different from those of pixel 73, and obtain the corresponding pixels in images 61 and 62 for each. They further repeat the same processing while moving the position of viewpoint 93 between viewpoint 91 and viewpoint 92, obtaining the pixels of images 61 and 62 that correspond to each object surface pixel of image 63.

The determination unit 33 connects each object surface pixel of viewpoint 93 with the corresponding pixels of viewpoints 91 and 92 obtained by the block matching unit 32, using a curve or a straight line. In FIG. 8, pixels 71, 73, and 72 are connected by a curve. By connecting in this way, for each object surface pixel obtained while viewpoint 93 is moved, the corresponding pixels of viewpoints 91 and 92, the determination unit 33 obtains a plurality of curves or straight lines spread over a certain region, as shown in FIG. 10. As shown in FIG. 11, it takes the region enclosed by the outermost of these curves or straight lines as the indeterminate region. The number of positions of viewpoint 93 set between viewpoints 91 and 92, and the processing density in the height direction, are design choices.

The processing described above covers only the section of the P-θ plane from image 61 to image 62 shown in FIGS. 7 and 8, but the indeterminate region determination unit 3 applies it to the entire ray space. That is, viewpoints are set not only at the positions of viewpoints 91 and 92 in FIG. 6 but around the whole circumference that forms the boundary surface of the local region 9, and the processing is carried out in turn. In this way an indeterminate region that includes the boundary is determined at the boundary between foreground and background in the ray space data of the local region 9, and the attribute information of the rays contained in the determined indeterminate region is changed to indeterminate. FIG. 14 shows an actual processing result. FIG. 14 is data for one horizontal plane: the white area is the indeterminate region, the inside of the indeterminate region is foreground, and the outside is background.
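As a rough illustration of how the region enclosed by the outermost curves can be turned into attribute updates, the sketch below works on one θ section of a single P-θ plane. It assumes that each matched curve has already been sampled as one p index per θ index, and that the attributes are stored as integer codes; the codes `F`, `B`, `U` and the function name `mark_indeterminate` are hypothetical.

```python
import numpy as np

F, B, U = 1, 0, 2  # hypothetical codes: foreground, background, indeterminate

def mark_indeterminate(attr, curves):
    """attr:   (n_theta, n_p) array of F/B codes for one section of a P-theta plane.
    curves: (n_curves, n_theta) array; row k gives the p index of the k-th
            matched curve at every theta index of the section.
    Returns a copy of attr in which every ray lying between the outermost
    curves (inclusive) is set to U."""
    attr = np.asarray(attr)
    curves = np.asarray(curves, dtype=float)
    n_theta, n_p = attr.shape
    # envelope of the curve bundle at each theta, clipped to the valid p range
    lo = np.clip(np.floor(curves.min(axis=0)).astype(int), 0, n_p - 1)
    hi = np.clip(np.ceil(curves.max(axis=0)).astype(int), 0, n_p - 1)
    out = attr.copy()
    for t in range(n_theta):
        out[t, lo[t]:hi[t] + 1] = U
    return out
```

Taking the per-θ minimum and maximum is one simple way to approximate "the region enclosed by the outermost curves"; it treats the bundle as a band along the p direction, which is an assumption about its shape.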

The matting unit 4 performs matting on the indeterminate region to decide between foreground and background. FIG. 12 is a block diagram of the matting unit 4. As shown in FIG. 12, the matting unit 4 comprises a mask image generation unit 41 and a judgment unit 42. The mask image generation unit 41 generates mask images from the ray attribute information in the ray space data for each local region output by the indeterminate region determination unit 3. FIG. 15 shows a mask image obtained from the ray space data of FIG. 14. As in FIG. 14, the white area is the indeterminate region; the person-shaped part enclosed by the indeterminate region is foreground, and everything else is background.

FIG. 13 illustrates how the judgment unit 42 assigns pixels whose attribute is indeterminate to foreground or background. In FIG. 13, reference numeral 90 denotes a pixel in the indeterminate region, and the dotted line is a region of predetermined size centered on pixel 90. The judgment unit 42 obtains the means $u(F)$ and $u(B)$ and the variances $\sigma^2(F)$ and $\sigma^2(B)$ of the pixel values of the foreground pixels and the background pixels contained in the region of predetermined size centered on pixel 90, and computes

$$u_t = \frac{u(F) + u(B)}{2}$$

$$\sigma_t^2 = \frac{\sigma^2(F) + \sigma^2(B)}{4}$$

Next, it obtains a two-dimensional Gaussian distribution centered on the position of pixel 90 with mean $v \times u_t$ and variance $\sigma_t^2$, and takes the value obtained by substituting the position of pixel 90 into this distribution as the energy value $E$ at the position of pixel 90. Here $v$ is the pixel value of pixel 90.
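The local statistics that feed the energy value can be sketched as below. The example covers only the window mean and variance; the Gaussian evaluation that produces the energy value itself is left out because the text does not fix its exact form. The integer attribute codes, the window half-width `half`, the use of the population variance, and the function name `local_stats` are assumptions.

```python
import numpy as np

F, B, U = 1, 0, 2  # hypothetical codes: foreground, background, indeterminate

def local_stats(values, attr, row, col, half=2):
    """Return (u_t, var_t) for the window of half-width `half` centered on
    the indeterminate pixel at (row, col).
    u_t   = (mean of F pixels + mean of B pixels) / 2
    var_t = (variance of F pixels + variance of B pixels) / 4"""
    r0, r1 = max(row - half, 0), min(row + half + 1, values.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, values.shape[1])
    win_v = values[r0:r1, c0:c1].astype(float)
    win_a = attr[r0:r1, c0:c1]
    fg = win_v[win_a == F]
    bg = win_v[win_a == B]
    if fg.size == 0 or bg.size == 0:
        raise ValueError("window must contain both foreground and background pixels")
    u_t = (fg.mean() + bg.mean()) / 2.0
    var_t = (fg.var() + bg.var()) / 4.0  # population variance (ddof=0), an assumption
    return u_t, var_t
```

A window that contains no foreground or no background pixels has no defined statistics here; in that case the window would need to be enlarged, which is a design choice the text leaves open.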

The determination unit 42 determines the boundary line from the energy values E by dynamic programming. Specifically, it selects pixels in the indeterminate region so as to minimize the sum of the energy value E of each selected pixel and a value λ(d_1 − d_2)^2 determined by that pixel's position, that is,
ΣE + λΣ(d_1 − d_2)^2    (1)
and treats the selected pixels as points on the object surface, or as background-side points on the boundary between the background and the foreground. Here, λ is a predetermined coefficient, and the second term of equation (1) makes the center of the indeterminate region more likely to be selected as the boundary line. λ may be 0.
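A minimization of the form of equation (1) can be sketched with a standard dynamic programming pass. Several details here are assumptions for illustration only, since this passage does not fix them: the indeterminate region is modeled as a band stored as `energy[row][k]`, one boundary pixel is chosen per row, d_1 and d_2 are taken to be the distances from the chosen pixel to the two edges of the band (so the penalty vanishes at the center), and the boundary may shift by at most `max_step` positions between adjacent rows. The name `dp_boundary` is hypothetical.

```python
def dp_boundary(energy, lam=0.0, max_step=1):
    """Choose one position per row minimising sum(E) + lam * sum((d1 - d2)**2).

    Returns (path, total_cost), where path[r] is the chosen position in row r.
    """
    n_rows, width = len(energy), len(energy[0])

    def cost(r, k):
        # Assumed definition: distances to the left and right edge of the band.
        d1, d2 = k, width - 1 - k
        return energy[r][k] + lam * (d1 - d2) ** 2

    total = [cost(0, k) for k in range(width)]
    back = []  # back[r - 1][k]: best predecessor of position k in row r
    for r in range(1, n_rows):
        new_total, choice = [], []
        for k in range(width):
            lo, hi = max(0, k - max_step), min(width, k + max_step + 1)
            prev = min(range(lo, hi), key=total.__getitem__)
            new_total.append(total[prev] + cost(r, k))
            choice.append(prev)
        total = new_total
        back.append(choice)

    # Trace the cheapest path back from the last row to the first.
    k = min(range(width), key=total.__getitem__)
    best = total[k]
    path = [k]
    for choice in reversed(back):
        k = choice[k]
        path.append(k)
    path.reverse()
    return path, best


energy = [[5, 1, 5],
          [5, 1, 5],
          [5, 5, 1]]
path, best = dp_boundary(energy, lam=0.0)
```

With λ = 0 the path follows the low-energy pixels, giving positions 1, 1, 2 at a total cost of 3. With a positive λ and flat energy, the center of the band is chosen in every row, which matches the stated purpose of the second term.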

Besides the dynamic programming method described above, the determination unit 42 can use various known matting methods, such as "Bayes matting", "Knockout", and "Poisson matting". The matting unit 4 can also apply the above method directly to the ray space data shown in FIG. 14, without first converting it into mask images, and convert the result into mask images afterwards. Specifically, the processing described above is performed on each P-θ plane at predetermined height positions to determine the boundary. In this case, the boundary at heights between the processed P-θ planes is determined from the boundaries of the processed planes, after which the data are converted into mask images. In other words, the order of the determination unit 42 and the mask image generation unit 41 is reversed.
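The text does not say how the boundary at an intermediate height is derived from the processed P-θ planes. One simple possibility, shown purely as an assumption, is linear interpolation between the boundaries of the two nearest processed planes. The function name `interpolate_boundary` and the representation of a boundary as a list of positions (one per θ sample) are hypothetical.

```python
def interpolate_boundary(h0, b0, h1, b1, h):
    """Linearly interpolate a boundary at height h between two processed planes.

    b0 and b1 are boundary positions (one per theta sample) found at
    heights h0 and h1, with h0 < h1 and h0 <= h <= h1.
    """
    if not h0 < h1:
        raise ValueError("h0 must be below h1")
    if not h0 <= h <= h1:
        raise ValueError("h must lie between the two processed planes")
    if len(b0) != len(b1):
        raise ValueError("boundaries must have the same number of samples")
    t = (h - h0) / (h1 - h0)
    return [(1 - t) * x0 + t * x1 for x0, x1 in zip(b0, b1)]


# Boundaries found at heights 0 and 4; estimate the boundary at height 1.
mid = interpolate_boundary(0, [10.0, 12.0], 4, [14.0, 20.0], 1)
```

Here the intermediate boundary lies a quarter of the way from the lower plane to the upper one, giving positions 11.0 and 14.0.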

Furthermore, the three-dimensional object model information generation apparatus according to the present invention generates three-dimensional object model information by applying, for example, the visual hull (volume intersection) method to the mask images output by the mask image generation apparatus described above.
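The idea behind the volume intersection method can be sketched as voxel carving: a voxel is kept only if it projects into the foreground of every mask image. The sketch below is a toy under stated assumptions, not the apparatus's method: it uses a small cubic voxel grid and caller-supplied projection functions (here simple orthographic views), whereas a real system would use calibrated camera projections. The name `visual_hull` and the `(mask, project)` pairing are hypothetical.

```python
from itertools import product


def inside(mask, project, voxel):
    """True if the voxel projects onto a foreground (non-zero) mask pixel."""
    row, col = project(voxel)
    return bool(mask[row][col])


def visual_hull(views, size):
    """Return the voxels of a size^3 grid that lie inside every silhouette.

    views is a list of (mask, project) pairs, where project maps a voxel
    (x, y, z) to the (row, col) of the mask pixel it falls on.
    """
    return [voxel for voxel in product(range(size), repeat=3)
            if all(inside(mask, project, voxel) for mask, project in views)]


# Two orthographic toy views of a 2 x 2 x 2 grid.
front = [[1, 0], [1, 1]]  # indexed [y][x], looking along the z axis
side = [[1, 1], [0, 1]]   # indexed [y][z], looking along the x axis
views = [(front, lambda v: (v[1], v[0])),
         (side, lambda v: (v[1], v[2]))]
hull = visual_hull(views, size=2)
```

Of the eight voxels, four survive both silhouettes: (0, 0, 0), (0, 0, 1), (0, 1, 1), and (1, 1, 1). Adding more views can only remove voxels, which is why more accurate mask images yield a tighter model.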

As described above, in the mask image generation apparatus according to the present invention, the preprocessing unit 2 first obtains three-dimensional object model information from a plurality of images by an arbitrary method. This information is, however, likely to contain errors, so the boundary between the foreground and the background determined by the preprocessing unit 2 is not exact. The indeterminate region determination unit 3 therefore sets an indeterminate region in the vicinity of that boundary. In the present invention, the indeterminate region determination unit 3 determines this region using block matching and the distribution in the ray space of the rays emitted from a single point in real space. This narrows the region over which matting is performed and reduces the matting processing load. Matting may be performed on the mask images after they are generated from the ray space data, or it may be performed in the ray space first, with the mask images generated afterwards.

Various known methods can be used for matting. However, by defining the energy value of each pixel in the indeterminate region from the foreground and background pixel values within a predetermined range centered on that pixel, and then applying dynamic programming, the boundary between the foreground and the background can be determined efficiently.

The mask image generation apparatus and the three-dimensional object model information generation apparatus according to the present invention can be realized by a program that causes a computer to function as the functional blocks described above. Such computer programs can be stored on a computer-readable storage medium or distributed via a network. The present invention can also be realized by a combination of hardware and software.

DESCRIPTION OF SYMBOLS
1 Storage unit
2 Preprocessing unit
3 Indeterminate region determination unit
31 Search range setting unit
32 Block matching unit
33 Determining unit
4 Matting unit
41 Mask image generation unit
42 Determination unit
61, 62, 63 Image
71, 72, 73, 81, 82 Pixel
9 Local region
91, 92, 93 Viewpoint
100 Camera

Claims

A mask image generation apparatus comprising:
preprocessing means for generating, based on a plurality of images and for each ray in a ray space, the pixel value of the pixel produced by the ray, attribute information indicating whether the pixel is foreground or background, and, when the attribute is foreground, information indicating the position of the pixel in real space;
indeterminate region determination means for determining an indeterminate region that includes the boundary between rays in the ray space whose attribute is foreground and rays whose attribute is background, and for updating the attribute information of the rays included in the determined indeterminate region to indeterminate; and
matting means,
wherein the matting means comprises:
mask image generation means for generating a plurality of mask images based on the attribute information of the rays in the ray space; and
determination means for performing matting on the pixels of the generated mask images whose attribute is indeterminate.

A mask image generation apparatus comprising:
preprocessing means for generating, based on a plurality of images and for each ray in a ray space, the pixel value of the pixel produced by the ray, attribute information indicating whether the pixel is foreground or background, and, when the attribute is foreground, information indicating the position of the pixel in real space;
indeterminate region determination means for determining an indeterminate region that includes the boundary between rays in the ray space whose attribute is foreground and rays whose attribute is background, and for updating the attribute information of the rays included in the determined indeterminate region to indeterminate; and
matting means,
wherein the matting means comprises:
determination means for performing matting on the pixels corresponding to rays in the ray space whose attribute is indeterminate; and
mask image generation means for generating a plurality of mask images based on the attribute information of the rays in the ray space after the matting.

The mask image generation apparatus according to claim 1, wherein the indeterminate region determination means comprises:
search range setting means for selecting a plurality of third rays whose attribute lies at the boundary between foreground and background and, for each selected third ray, obtaining a first pixel and a second pixel based on a first position, which is the real-space position of a third pixel corresponding to the third ray, and on the distribution in the ray space of the rays emitted from a single point in real space, the first pixel being a pixel in an image viewed from a predetermined viewpoint that corresponds to a first ray emitted from the first position, and the second pixel being a pixel in an image viewed from a viewpoint different from the predetermined viewpoint that corresponds to a second ray emitted from the first position;
block matching means for applying, for each selected third ray, block matching with pixel blocks of a predetermined size within a region of predetermined range centered on the first pixel and a region of predetermined range centered on the second pixel, and obtaining a fourth pixel and a fifth pixel, which are the pixels at a predetermined position in the two pixel blocks of the most highly correlated combination; and
determining means for determining the indeterminate region,
wherein the determining means, for each selected third ray, connects the third ray, a fourth ray corresponding to the fourth pixel, and a fifth ray corresponding to the fifth pixel in the ray space with straight lines or curves, and sets the region enclosed by the resulting plurality of straight lines or curves in the ray space as the indeterminate region.

The mask image generation apparatus according to any one of claims 1 to 3, wherein the determination means:
calculates a first value for a pixel whose attribute is indeterminate, based on the pixel value of that pixel, the mean and variance of the pixel values of the foreground pixels included in a predetermined region centered on that pixel, and the mean and variance of the pixel values of the background pixels included in that region; and
obtains the boundary between the foreground and the background in the indeterminate region such that the sum of the first values of the pixels on the boundary is minimized.

A three-dimensional object model information generation apparatus that generates three-dimensional object model information based on mask images generated by the mask image generation apparatus according to any one of claims 1 to 4.

A program that causes a computer to function as the apparatus according to claim 1.