JP2013223008A

JP2013223008A - Image processing device and method

Info

Publication number: JP2013223008A
Application number: JP2012091820A
Authority: JP
Inventors: Masateru Kitago; 正輝北郷
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-04-13
Filing date: 2012-04-13
Publication date: 2013-10-28
Anticipated expiration: 2032-04-13
Also published as: JP6128748B2

Abstract

PROBLEM TO BE SOLVED: In free viewpoint image synthesis, foreground and background colors are mixed near object boundaries, so the distance information contains large estimation errors and the foreground and background are difficult to separate accurately, which degrades picture quality near the boundaries. SOLUTION: An image is divided into three layers: a main layer, a boundary foreground layer, and a boundary background layer. Three-dimensional model generation and rendering based on distance information are performed on the main layer, the boundary foreground layer, and the boundary background layer, in that order, to generate a free viewpoint image. The boundary foreground layer and the boundary background layer are calculated from the object boundary, and the three-dimensional model generation and rendering are performed in sub-pixel units to improve the picture quality of free viewpoint image synthesis.

Description

The present invention relates to a free viewpoint image synthesis technique that uses image data captured from a plurality of viewpoints together with distance information, and in particular to free viewpoint image synthesis for multi-viewpoint image data captured by a multi-lens imaging apparatus.

In recent years, 3D content has been actively used, mainly in the film industry. Multi-viewpoint imaging and multi-viewpoint display technologies are being developed in pursuit of a greater sense of presence.

For two-viewpoint display, glasses-type 3D displays are the mainstream. Image data for the right eye and for the left eye are generated, and the glasses are controlled to switch the image visible to each eye, so that the observer sees a stereoscopic image. For multi-viewpoint display, glasses-free 3D displays using lenticular lenses or parallax barriers have been developed and are used mainly for digital signage.

Multi-lens imaging apparatuses have also been developed: stereo cameras for two-viewpoint imaging, and plenoptic cameras and camera array systems for multi-viewpoint imaging of three or more viewpoints. Research is also active in the field called computational photography, in which modifications to the imaging apparatus allow multi-viewpoint images to be captured with relatively little change to the existing camera configuration.

When a multi-viewpoint image captured by a multi-lens imaging apparatus is displayed on a multi-viewpoint display device, the difference in the number of viewpoints between the imaging apparatus and the display device must be reconciled. For example, when a three-viewpoint image captured by a three-lens camera is displayed on a nine-viewpoint glasses-free 3D display, images for the six viewpoints that were not captured must be generated by interpolation. Likewise, when an image captured by a stereo camera is displayed on a glasses-type 3D display, both have two viewpoints, but because the parallax best suited to viewing differs from display to display, the image may be reconstructed and output from viewpoints different from the captured ones.

To realize these use cases, free viewpoint image synthesis has been developed as a technique for generating image data for viewpoints other than those captured.

As a related technology, standardization of MPEG-3DV (3D Video Coding) is in progress. MPEG-3DV is a scheme that encodes depth information together with multi-viewpoint image data. It assumes that multi-viewpoint image data will be output to display devices with various numbers of viewpoints, such as existing 2D displays, glasses-type 3D displays, and glasses-free 3D displays, and it uses free viewpoint image synthesis to control the number of viewpoints. Free viewpoint image synthesis has also been developed as a technique for interactive viewing of multi-viewpoint video (Patent Document 1).

Patent Document 1: JP 2006-012161 A

One problem in free viewpoint image synthesis is image quality degradation near object boundaries. The cause is that foreground and background colors are mixed near an object boundary, so the estimation error of the distance information becomes large and it is difficult to separate the foreground from the background accurately. How images are synthesized near object boundaries is the key to improving image quality in free viewpoint image synthesis.

An image processing apparatus according to the present invention comprises: specifying means for specifying a boundary region of an object in an image of multi-viewpoint image data obtained by imaging from a plurality of viewpoints; and synthesizing means for synthesizing the multi-viewpoint image data to generate composite image data, wherein the rendering unit used by the synthesizing means to generate pixel values of the boundary region is smaller than the rendering unit used to generate pixel values of regions other than the boundary region.

According to the present invention, free viewpoint image synthesis using multi-viewpoint image data can be performed with high image quality.

A diagram showing an example of a multi-lens imaging apparatus having a plurality of imaging units.
A block diagram showing the internal configuration of the imaging apparatus.
A diagram showing the internal configuration of an imaging unit.
A functional block diagram showing the internal configuration of the image processing unit.
A flowchart showing the flow of the distance information estimation process.
FIG. 6 illustrates the progress of the distance information estimation process: (a) and (c) show examples of viewpoint images, (b) shows a viewpoint image that has been filtered and divided into small regions, (d) shows a small region from the viewpoint image of another imaging unit overlaid on a viewpoint image, and (e) shows the state in which the misalignment in (d) has been eliminated.
A diagram showing examples of histograms: (a) a histogram with a high peak and (b) a histogram with a low peak.
A diagram explaining the adjustment of the initial parallax amount.
A flowchart showing the flow of the image separation process.
A diagram explaining how each pixel in a viewpoint image is classified as a foreground pixel, a background pixel, or a normal pixel.
A flowchart showing the flow of the free viewpoint image generation process according to the first embodiment.
A diagram explaining how a viewpoint image is separated into three layers.
A diagram explaining the rendering of the main layer.
A diagram showing an example of the rendering result of the main layer.
A diagram explaining the rendering of the boundary foreground layer.
A diagram explaining the rendering of the boundary background layer.
A flowchart showing the flow of the image separation process according to the second embodiment.
A diagram explaining the alpha estimation process according to the second embodiment.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a diagram showing an example of a multi-lens imaging apparatus having a plurality of imaging units according to the present embodiment.

The housing of the imaging apparatus 100 is provided with nine imaging units 101 to 109, which acquire color image data, and an imaging button 110. The nine imaging units 101 to 109 all have the same focal length and are arranged evenly on a square lattice.

When the user presses the imaging button 110, the imaging units 101 to 109 receive light information of the subject with their sensors (image sensors), the received signals are A/D converted, and a plurality of color images (digital data) are acquired simultaneously.

With such a multi-lens imaging apparatus, a group of color images of the same subject captured from a plurality of viewpoint positions (multi-viewpoint image data) can be obtained.

Although the number of imaging units is nine here, it is not limited to nine. The present invention is applicable as long as the imaging apparatus has a plurality of imaging units. Also, although an example in which the nine imaging units are arranged evenly on a square lattice has been described, the arrangement of the imaging units is arbitrary. For example, they may be arranged radially or in a line, or completely at random.

FIG. 2 is a block diagram showing the internal configuration of the imaging apparatus 100.

A central processing unit (CPU) 201 performs overall control of the units described below.

The RAM 202 functions as the main memory, work area, and the like of the CPU 201.

The ROM 203 stores control programs and the like executed by the CPU 201.

The bus 204 is a transfer path for various data. For example, digital data acquired by the imaging units 101 to 109 is sent to a predetermined processing unit via the bus 204.

The operation unit 205 comprises buttons, a mode dial, and the like, through which user instructions are input.

The display unit 206 displays captured images and text. A liquid crystal display is commonly used as the display unit 206. The display unit 206 may also have a touch screen function, in which case user instructions given via the touch screen can be handled as input to the operation unit 205.

The display control unit 207 controls the display of images and text on the display unit 206.

The imaging unit control unit 208 controls the imaging system based on instructions from the CPU 201, such as focusing, opening and closing the shutter, and adjusting the aperture.

The digital signal processing unit 209 performs various kinds of processing, such as white balance processing, gamma processing, and noise reduction processing, on the digital data received via the bus 204.

The encoder unit 210 converts digital data into a predetermined file format. The external memory control unit 211 is an interface for connecting to a PC or other media (for example, a hard disk, memory card, CF card, SD card, or USB memory).

The image processing unit 212 calculates distance information from the multi-viewpoint image data acquired by the imaging units 101 to 109, or from the multi-viewpoint image data output from the digital signal processing unit 209, and generates free viewpoint composite image data. Details of the image processing unit 212 are described later.

The imaging apparatus has other components besides those described above, but they are not the focus of the present invention and their description is omitted.

FIG. 3 is a diagram showing the internal configuration of the imaging units 101 to 109.

Each of the imaging units 101 to 109 comprises lenses 301 to 303, a diaphragm 304, a shutter 305, an optical low-pass filter 306, an IR cut filter 307, a color filter 308, a sensor 309, and an A/D conversion unit 310. The lenses 301 to 303 are a zoom lens 301, a focus lens 302, and a shake correction lens 303, respectively. The sensor 309 is, for example, a CMOS or CCD sensor.

When the sensor 309 detects the amount of light from the subject, the detected amount is converted into a digital value by the A/D conversion unit 310 and output to the bus 204 as digital data.

In this embodiment, the configuration and processing of each unit are described on the assumption that all images captured by the imaging units 101 to 109 are color images; however, some or all of the images captured by the imaging units 101 to 109 may be monochrome images. In that case, the color filter 308 is omitted.

FIG. 4 is a functional block diagram showing the internal configuration of the image processing unit 212.

The image processing unit 212 includes a distance information estimation unit 401, a separation information generation unit 402, and a free viewpoint image generation unit 403. In this embodiment the image processing unit 212 is described as a component of the imaging apparatus, but its functions may be implemented by an external device such as a PC. That is, the image processing unit 212 of this embodiment can be realized either as a function of the imaging apparatus or as an independent image processing apparatus.

Each component of the image processing unit 212 is described below.

When the color multi-viewpoint image data acquired by the imaging units 101 to 109, or the color multi-viewpoint image data output from the digital signal processing unit 209 (nine viewpoints in either case in this embodiment), is input to the image processing unit 212, it is first sent to the distance information estimation unit 401.

For each viewpoint image in the input multi-viewpoint image data, the distance information estimation unit 401 estimates information representing the distance from the imaging unit to the subject (hereinafter, "distance information"). Details of the distance information estimation are described later. Instead of providing the distance information estimation unit 401, equivalent distance information may be input from outside.

The separation information generation unit 402 generates the information (separation information) that serves as the basis for separating each viewpoint image of the multi-viewpoint image data into three layers: a boundary foreground layer, which is the foreground portion of a subject boundary; a boundary background layer, which is the background portion of the boundary; and a main layer, which is not at a subject boundary. Specifically, each pixel in each viewpoint image is classified into one of three types: a foreground pixel adjacent to a subject boundary (hereinafter, "object boundary"), a background pixel adjacent to an object boundary, or a normal pixel that is neither. Information that identifies which type each pixel belongs to is then generated. Details of the separation information generation are described later.

The free viewpoint image generation unit 403 renders the three-dimensional models of the main layer, the boundary foreground layer, and the boundary background layer to generate image data at an arbitrary viewpoint position (free viewpoint image data). Details of the free viewpoint image generation are described later.

(Distance information estimation process)
The method by which the distance information estimation unit 401 estimates distance information is described next. FIG. 5 is a flowchart showing the flow of the distance information estimation process according to this embodiment. In the following description, the input multi-viewpoint image data is assumed to be nine-viewpoint image data captured by the imaging apparatus 100 having the nine imaging units 101 to 109 shown in FIG. 1.

In step 501, the distance information estimation unit 401 applies an edge-preserving smoothing filter to one viewpoint image (the target viewpoint image) of the input nine-viewpoint image data.

In step 502, the distance information estimation unit 401 divides the target viewpoint image into regions of a predetermined size (hereinafter, "small regions"). Specifically, adjacent pixels (pixel groups) whose color difference is at or below a threshold are merged successively until the target viewpoint image is divided into small regions of a predetermined number of pixels (for example, regions of 100 to 1600 pixels). The threshold is set to a value suitable for judging that the compared colors are about the same, for example "6" when each of R, G, and B is quantized to 8 bits (256 levels). First, adjacent pixels are compared, and if their color difference is at or below the threshold, the two pixels are merged. Next, the average color of each merged pixel group is computed and compared with the average color of each adjacent pixel group, and pixel groups whose color difference is at or below the threshold are merged. This processing is repeated until the size (number of pixels) of the pixel groups reaches that of a small region consisting of the predetermined number of pixels mentioned above.
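As an illustration only, the merging in step 502 might be sketched in Python as follows. The sketch is not taken from the specification: the function name `segment`, the union-find bookkeeping, the choice of color difference (largest per-channel difference between mean colors), and the use of `max_size` as an upper bound on region size are all assumptions made for this example.

```python
def segment(image, threshold=6, max_size=1600):
    """Merge adjacent pixel groups whose mean colors differ by <= threshold.

    image: list of rows, each row a list of (r, g, b) tuples (8 bits each).
    Returns a grid of region labels with the same shape as the image.
    """
    h, w = len(image), len(image[0])
    parent = list(range(h * w))                # union-find forest
    size = [1] * (h * w)                       # pixel count per group
    total = [list(image[i // w][i % w]) for i in range(h * w)]  # color sums

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    def color_diff(a, b):
        # Assumed metric: largest per-channel difference of the mean colors.
        return max(abs(total[a][c] / size[a] - total[b][c] / size[b])
                   for c in range(3))

    merged = True
    while merged:                              # repeat until nothing merges
        merged = False
        for y in range(h):
            for x in range(w):
                for ny, nx in ((y, x + 1), (y + 1, x)):
                    if ny >= h or nx >= w:
                        continue
                    a, b = find(y * w + x), find(ny * w + nx)
                    if a == b or size[a] + size[b] > max_size:
                        continue
                    if color_diff(a, b) <= threshold:
                        parent[b] = a          # merge group b into group a
                        size[a] += size[b]
                        for c in range(3):
                            total[a][c] += total[b][c]
                        merged = True
    return [[find(y * w + x) for x in range(w)] for y in range(h)]
```

Because every merge reduces the number of groups, the loop always terminates. A production implementation would more likely keep an explicit region adjacency structure instead of rescanning all pixel pairs on each pass.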

In step 503, the distance information estimation unit 401 determines whether division into small regions has been completed for all nine viewpoint images in the nine-viewpoint image data. If it has, the process proceeds to step 504. If it has not, the process returns to step 501, and the smoothing filter and the division into small regions are applied with the next viewpoint image as the target viewpoint image.

In step 504, the distance information estimation unit 401 calculates, for every viewpoint image, an initial parallax amount for each small region by referring to the surrounding viewpoint images (here, the viewpoint images located above, below, left, and right). For example, when calculating the initial parallax amount for the viewpoint image of the central imaging unit 105, the viewpoint images of the imaging units 102, 104, 106, and 108 are referenced. For a viewpoint image of an imaging unit at the edge, for example, the viewpoint image of the imaging unit 107 refers to the viewpoint images of the imaging units 104 and 108, and the viewpoint image of the imaging unit 108 refers to those of the imaging units 105, 107, and 109. The initial parallax amount is calculated as follows.
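The choice of reference viewpoint images in step 504 can be illustrated with a short Python sketch. It assumes that the imaging units 101 to 109 are numbered row by row on the 3 × 3 lattice, which is consistent with the examples given for the imaging units 105, 107, and 108; the names `GRID` and `reference_units` are hypothetical.

```python
# Assumed layout: units numbered row by row on the 3 x 3 square lattice.
GRID = [
    [101, 102, 103],
    [104, 105, 106],
    [107, 108, 109],
]


def reference_units(unit):
    """Return the units directly above, below, left, and right of `unit`."""
    for r, row in enumerate(GRID):
        for c, candidate in enumerate(row):
            if candidate != unit:
                continue
            found = []
            for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(GRID) and 0 <= cc < len(GRID[rr]):
                    found.append(GRID[rr][cc])
            return sorted(found)
    raise ValueError(f"unknown imaging unit: {unit}")


print(reference_units(105))  # [102, 104, 106, 108]
print(reference_units(107))  # [104, 108]
print(reference_units(108))  # [105, 107, 109]
```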

First, each small region of the viewpoint image for which the initial parallax amount is sought is compared with the corresponding small region in the viewpoint image being referenced (the reference viewpoint image). Here, the corresponding small region is the small region in the reference viewpoint image shifted by the parallax amount from the position of the small region in the viewpoint image for which the initial parallax amount is sought.

Next, the color difference between each pixel of the viewpoint image for which the initial parallax amount is sought and the corresponding pixel in the reference viewpoint image, shifted by the parallax amount, is calculated for all pixels in the small region, and a histogram is created.

The parallax amount is then varied, and a histogram is created for each parallax amount.

Among the histograms thus obtained, the parallax amount whose histogram has a high peak is taken as the initial parallax amount. The corresponding region in the reference viewpoint image is set by adjusting the parallax amount separately for the vertical and horizontal directions, because a parallax of one pixel in the vertical direction and a parallax of one pixel in the horizontal direction do not represent the same distance.

The processing up to this point is explained using a specific example.

FIG. 6(a) shows an example of a viewpoint image of the imaging unit 105, in which an object 601 appears. FIG. 6(b) shows the viewpoint image of the imaging unit 105 after an edge-preserving filter has been applied and the image has been divided into small regions. Here, one of the small regions is denoted small region 602, and its center coordinates are denoted 603. FIG. 6(c) shows an example of a viewpoint image of the imaging unit 104. Because the imaging unit 104 captures the same object from the right side of the imaging unit 105, the object 604 in the viewpoint image of the imaging unit 104 appears further to the left than the object 601 in the viewpoint image of the imaging unit 105.

Now, with the viewpoint image of the imaging unit 105 as the target viewpoint image and the viewpoint image of the imaging unit 104 as the reference viewpoint image, the corresponding regions are compared for the small region 602. FIG. 6(d) shows the small region 602 of the viewpoint image of the imaging unit 105 overlaid on the viewpoint image of the imaging unit 104; the corresponding regions are misaligned. The pixel values of the small region 602 in the viewpoint image of the imaging unit 105 (after edge-preserving filtering) are compared with the pixel values in the viewpoint image of the imaging unit 104 (after edge-preserving filtering), and a histogram is created. Specifically, the color difference of each pixel in the corresponding small regions is obtained, with the color difference on the horizontal axis and the number of matching pixels on the vertical axis. The parallax amount is then varied (for example, by moving the small region one pixel at a time), and a histogram is created for each parallax amount in turn. FIG. 7 shows examples of histograms. A histogram distribution with a high peak, as in FIG. 7(a), is judged to indicate a parallax amount of high reliability, and a distribution with a low peak, as in FIG. 7(b), a parallax amount of low reliability. Here, the parallax amount of the histogram with the high peak is set as the initial parallax amount. FIG. 6(e) shows the state in which the misalignment in FIG. 6(d) has been eliminated: the small region 602 of the viewpoint image of the imaging unit 105 overlaps the corresponding region in the viewpoint image of the imaging unit 104 without misalignment. The parallax amount indicated by the arrow 605 in FIG. 6(e) corresponds to the initial parallax amount sought. Although the histograms here are generated by moving the small region one pixel at a time, the step may be set arbitrarily, for example 0.5 pixel.
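The histogram comparison described above can be sketched in Python under simplifying assumptions: grayscale pixel values instead of color, horizontal shifts only, integer parallax candidates, and one histogram bin per difference value. The function name `initial_parallax` and the sample data are hypothetical and are not taken from the specification.

```python
from collections import Counter


def initial_parallax(target, reference, region, candidates):
    """Pick the shift whose difference histogram has the highest peak.

    target, reference: 2-D lists of grayscale values (filtered images).
    region: list of (y, x) pixels forming one small region of the target.
    candidates: iterable of horizontal shifts (in pixels) to try.
    Returns (best_shift, peak_height).
    """
    best = None
    for d in candidates:
        hist = Counter()                       # difference value -> count
        for y, x in region:
            xr = x + d                         # shifted position in reference
            if 0 <= xr < len(reference[y]):
                hist[abs(target[y][x] - reference[y][xr])] += 1
        peak = max(hist.values(), default=0)
        if best is None or peak > best[1]:
            best = (d, peak)
    return best


# The reference view sees the same scene shifted two pixels to the left.
scene = [0, 10, 40, 90, 20, 70, 30, 60, 5, 55]
target = [scene[:], scene[:]]
reference = [scene[2:] + [0, 0], scene[2:] + [0, 0]]
region = [(y, x) for y in range(2) for x in range(3, 7)]

print(initial_parallax(target, reference, region, range(-3, 4)))  # (-2, 8)
```

In this sample the shift of -2 pixels aligns all eight pixels of the region, so every difference falls into a single bin and the peak reaches the region size. The other shifts spread the differences over several bins and produce lower peaks, which corresponds to the low-reliability case of FIG. 7(b).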

The description now returns to the flowchart of FIG. 5.

In step 505, the distance information estimation unit 401 iteratively adjusts the initial parallax amounts using the color differences between small regions, the differences between their initial parallax amounts, and the like. Specifically, the adjustment is based on the idea that neighboring small regions with similar colors are likely to have similar parallax amounts, and that neighboring small regions with similar initial parallax amounts are likely to have similar parallax amounts.

FIG. 8 illustrates the adjustment of the initial parallax amount. FIG. 8(a) shows the result of calculating the initial parallax amount for each small region of FIG. 6(b) (the state before adjustment), and FIG. 8(b) shows the state after adjustment. In FIG. 8(a), the parallax amounts of three small regions in the object region 800 (the region inside the thick line) are represented by hatching 801, 802, and 803. Hatching 801 and 803 runs from upper left to lower right, whereas hatching 802 runs from upper right to lower left, indicating that the two parallax amounts differ. Suppose that hatching from upper right to lower left is the correct parallax amount for the background region (the region outside the thick line) and hatching from upper left to lower right is the correct parallax amount for the object region. In FIG. 8(a), the parallax amounts 801 and 803 are correct for the object region, but the parallax amount 802 is that of the background region, so the correct parallax amount has not been obtained. The adjustment corrects such errors, which arise when the parallax amount is estimated per small region, by using the relationship with the surrounding small regions. For example, in FIG. 8(a), the parallax amount 802, which had taken the value of the background region, is adjusted using the parallax amounts 801 and 803 of the adjacent small regions, and as shown in FIG. 8(b) becomes the correct parallax amount 804, hatched from upper left to lower right.

In step 506, the distance information estimation unit 401 converts the parallax amounts obtained by the adjustment into distances to obtain distance information. The distance is calculated as (camera interval × focal length) / (parallax amount × length of one pixel). Because the length of one pixel differs between the vertical and horizontal directions, the necessary conversion is applied so that vertical and horizontal parallax amounts indicate the same distance.
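The conversion in step 506 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name, the parameter names, and all numeric values are hypothetical, and all lengths are assumed to share one unit.

```python
def disparity_to_distance(disparity_px, baseline, focal_length, pixel_pitch):
    """Distance = (baseline * focal_length) / (disparity_px * pixel_pitch).

    baseline, focal_length and pixel_pitch must share one length unit;
    disparity_px is the parallax amount in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("parallax must be positive to give a finite distance")
    return (baseline * focal_length) / (disparity_px * pixel_pitch)


# Hypothetical values: 10 mm camera interval, 5 mm focal length.
# The pixel length differs by direction, so each direction uses its own pitch.
horizontal = disparity_to_distance(20.0, 10.0, 5.0, pixel_pitch=0.002)
vertical = disparity_to_distance(10.0, 10.0, 5.0, pixel_pitch=0.004)
```

With these assumed values both directions yield the same distance, which is the property the conversion is meant to guarantee.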

Further, the converted distance information is quantized to, for example, 8 bits (256 gradations) and stored as 8-bit grayscale image data. In the grayscale image of the distance information, an object is rendered closer to white (value: 255) the nearer it is to the camera and closer to black (value: 0) the farther it is from the camera. For example, the object region 800 in FIG. 8 is rendered in white and the background region in black. Of course, the distance information may be quantized with another bit depth such as 10 bits or 12 bits, or may be stored as a binary file without quantization.
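A sketch of this quantization is given below. The text does not specify the mapping curve or the distance range, so a linear mapping between hypothetical `near` and `far` bounds is assumed here; only the nearer-is-whiter convention and the bit depths come from the description.

```python
def quantize_distance(distance, near, far, bits=8):
    """Map a distance in [near, far] to an integer gray level.

    Assumption: linear mapping. The nearest distance maps to the maximum
    level (white) and the farthest to 0 (black).
    """
    levels = (1 << bits) - 1                      # 255 for 8 bits
    clamped = min(max(distance, near), far)       # keep the value inside the range
    nearness = (far - clamped) / (far - near)     # 1.0 at near, 0.0 at far
    return round(nearness * levels)
```

Passing `bits=10` or `bits=12` gives the other bit depths mentioned above.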

In this way, distance information corresponding to each pixel of each viewpoint image is calculated. In this embodiment, the distance is calculated by dividing the image into small regions each consisting of a predetermined number of pixels, but other estimation methods may be used as long as the distance is obtained from the parallax between the multi-viewpoint images.

The distance information corresponding to each viewpoint image and the multi-viewpoint image data obtained by the above processing are sent to the subsequent separation information generation unit 402 and free viewpoint image generation unit 403. Alternatively, they may be sent only to the separation information generation unit 402, which then passes them on to the free viewpoint image generation unit 403.

(Separation information generation processing)
Next, the processing in the separation information generation unit 402 that separates each viewpoint image into three layers will be described. FIG. 9 is a flowchart illustrating the flow of the image separation processing according to this embodiment.

In step 901, the separation information generation unit 402 acquires the multi-viewpoint image data and the distance information obtained by the distance information estimation processing.

In step 902, the separation information generation unit 402 extracts object boundaries in a viewpoint image. In this embodiment, locations where the difference between the distance information of a target pixel and that of a neighboring pixel (hereinafter referred to as the "distance information difference") is equal to or greater than a threshold are identified as object boundaries. Specifically, this is done as follows.

First, the image is scanned in the vertical direction, the distance information difference is compared with the threshold, and the pixels at which it is equal to or greater than the threshold are identified. Next, the image is scanned in the horizontal direction, and the pixels at which the distance information difference is equal to or greater than the threshold are identified in the same way. The union of the pixels identified in the vertical and horizontal scans is then taken as the object boundary. When the distance information is quantized to 8 bits (0 to 255), the threshold is set to a value such as 10.
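The two scans and the union can be sketched as follows. The function name is hypothetical, and marking both pixels of a pair whose difference reaches the threshold is an assumption; the description only states that pixels at or above the threshold are identified.

```python
def extract_boundary(depth, threshold=10):
    """Return the set of (row, col) pixels lying on an object boundary.

    depth is a list of rows of quantized distance values (0 to 255).
    """
    rows, cols = len(depth), len(depth[0])
    vertical, horizontal = set(), set()
    # Vertical scan: compare each pixel with the pixel directly below it.
    for c in range(cols):
        for r in range(rows - 1):
            if abs(depth[r][c] - depth[r + 1][c]) >= threshold:
                vertical.update({(r, c), (r + 1, c)})
    # Horizontal scan: compare each pixel with the pixel to its right.
    for r in range(rows):
        for c in range(cols - 1):
            if abs(depth[r][c] - depth[r][c + 1]) >= threshold:
                horizontal.update({(r, c), (r, c + 1)})
    # The object boundary is the union of both scan results.
    return vertical | horizontal
```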

Here, the object boundary is obtained from the distance information, but other methods may be used, such as segmenting the image into regions and using the region borders as object boundaries. However, the object boundary obtained by image segmentation and the one obtained from the distance information should match as closely as possible. When the object boundary is obtained by image segmentation, the distance information should be corrected to match the obtained boundary.

In step 903, the separation information generation unit 402 classifies each pixel in the viewpoint image into one of three types: foreground pixel, background pixel, or normal pixel. Specifically, referring to the distance information acquired in step 901, for the pixels adjacent to the object boundary identified in step 902, the pixel on the nearer side is determined to be a foreground pixel and the pixel on the farther side a background pixel. FIG. 10 is a diagram illustrating how each pixel in the viewpoint image is classified into these three types. Of the adjacent pixels straddling the object boundary 1001, the nearer pixel is classified as foreground pixel 1002, the farther pixel as background pixel 1003, and the remaining pixels as normal pixels 1004.
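The classification in step 903 might look like the sketch below. It assumes the 8-bit depth map convention in which a larger value means nearer to the camera, and it recomputes the threshold test inline so that it is self-contained. The names are hypothetical, and a pixel that is nearer than one neighbor but farther than another simply keeps the label written last, which is a simplification.

```python
FOREGROUND, BACKGROUND, NORMAL = "foreground", "background", "normal"


def classify_pixels(depth, threshold=10):
    """Label every pixel as foreground, background or normal."""
    rows, cols = len(depth), len(depth[0])
    labels = [[NORMAL] * cols for _ in range(rows)]

    def mark(a, b):
        (ra, ca), (rb, cb) = a, b
        if abs(depth[ra][ca] - depth[rb][cb]) < threshold:
            return                      # no object boundary between a and b
        # Larger gray value = nearer to the camera.
        near, far = (a, b) if depth[ra][ca] > depth[rb][cb] else (b, a)
        labels[near[0]][near[1]] = FOREGROUND
        labels[far[0]][far[1]] = BACKGROUND

    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                mark((r, c), (r + 1, c))    # vertical neighbor
            if c + 1 < cols:
                mark((r, c), (r, c + 1))    # horizontal neighbor
    return labels
```

Pixels that are never marked stay normal, which matches the observation that identifying the foreground and background pixels is sufficient.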

In step 904, the separation information generation unit 402 determines whether pixel classification has been completed for all viewpoint images included in the input multi-viewpoint image data. If an unprocessed viewpoint image remains, the process returns to step 902, and steps 902 and 903 are performed on the next viewpoint image. If pixel classification has been completed for all viewpoint images, the process proceeds to step 905.

In step 905, the separation information generation unit 402 sends separation information that can identify the foreground pixels, background pixels, and normal pixels to the free viewpoint image generation unit 403. Once the foreground and background pixels are known, the remaining pixels are known to be normal pixels, so the separation information only needs to identify the foreground and background pixels. For example, a flag may be attached separately, such as "0" for a pixel determined to be a foreground pixel and "1" for a pixel determined to be a background pixel. In the free viewpoint image generation processing described later, this separation information is used to separate a given viewpoint image into three layers: a boundary foreground layer composed of foreground pixels, a boundary background layer composed of background pixels, and a main layer composed of normal pixels.

(Free viewpoint image generation processing)
Next, the free viewpoint image generation processing in the free viewpoint image generation unit 403 will be described. FIG. 11 is a flowchart illustrating the flow of the free viewpoint image generation processing according to this embodiment.

In step 1101, the free viewpoint image generation unit 403 acquires position information of an arbitrary viewpoint (hereinafter referred to as the "free viewpoint") for the free viewpoint image to be output. The position information of the free viewpoint is given, for example, by coordinates as follows. In this embodiment, coordinate information indicating the position of the free viewpoint is given with the position of the imaging unit 105 as the reference coordinate position (0.0, 0.0). In this case, the imaging unit 101 is at (1.0, 1.0), the imaging unit 102 at (0.0, 1.0), the imaging unit 103 at (-1.0, 1.0), and the imaging unit 104 at (1.0, 0.0). Similarly, the imaging unit 106 is at (-1.0, 0.0), the imaging unit 107 at (1.0, -1.0), the imaging unit 108 at (0.0, -1.0), and the imaging unit 109 at (-1.0, -1.0). For example, to synthesize an image whose free viewpoint is the midpoint of the four imaging units 101, 102, 104, and 105, the user enters the coordinates (0.5, 0.5). Naturally, the coordinate definition is not limited to the above, and the position of an imaging unit other than the imaging unit 105 may be used as the reference coordinate position. Likewise, the method of inputting the position information of the free viewpoint is not limited to entering coordinates directly. For example, a UI screen (not shown) showing the arrangement of the imaging units may be displayed on the display unit 206, and the desired free viewpoint may be designated by a touch operation or the like.

Although not described as something acquired in this step, the distance information corresponding to each viewpoint image and the multi-viewpoint image data are also acquired from the distance information estimation unit 401 or the separation information generation unit 402, as described above.

In step 1102, the free viewpoint image generation unit 403 sets a plurality of viewpoint images (hereinafter referred to as the "reference image group") to be referred to when generating the free viewpoint image data at the designated free viewpoint position. In this embodiment, the viewpoint images captured by the four imaging units closest to the designated free viewpoint position are set as the reference image group. When the coordinates (0.5, 0.5) are designated as the free viewpoint position as above, the reference image group consists of the four viewpoint images captured by the imaging units 101, 102, 104, and 105. Of course, the number of viewpoint images in the reference image group is not limited to four and may be three around the designated free viewpoint. Furthermore, it is sufficient that the imaging units enclose the designated free viewpoint position. For example, viewpoint images captured by four imaging units that are not the closest to the designated free viewpoint position (for example, the imaging units 101, 103, 107, and 109) may be set as the reference image group.
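Selecting the reference image group for a given free viewpoint can be sketched as below. The coordinate table reproduces the positions given for the imaging units 101 to 109 with the imaging unit 105 as the origin. The function name and the use of Euclidean distance as the measure of closeness are assumptions.

```python
import math

# Positions of imaging units 101-109, with imaging unit 105 as the origin.
CAMERA_POSITIONS = {
    101: (1.0, 1.0), 102: (0.0, 1.0), 103: (-1.0, 1.0),
    104: (1.0, 0.0), 105: (0.0, 0.0), 106: (-1.0, 0.0),
    107: (1.0, -1.0), 108: (0.0, -1.0), 109: (-1.0, -1.0),
}


def select_reference_units(free_viewpoint, count=4):
    """Return the ids of the `count` imaging units closest to the free viewpoint."""
    def distance(unit_id):
        x, y = CAMERA_POSITIONS[unit_id]
        return math.hypot(x - free_viewpoint[0], y - free_viewpoint[1])

    ranked = sorted(CAMERA_POSITIONS, key=distance)
    return sorted(ranked[:count])


reference_units = select_reference_units((0.5, 0.5))
```

For the free viewpoint (0.5, 0.5), this selects the imaging units 101, 102, 104, and 105, in line with the example in the text. Ties between equally distant units are resolved arbitrarily in this sketch.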

In step 1103, the free viewpoint image generation unit 403 separates each viewpoint image included in the set reference image group into three layers (a main layer, a boundary foreground layer, and a boundary background layer) and generates three-dimensional models. Of the three layers, the region formed by the boundary foreground layer and the boundary background layer can also be described as the boundary region of an object in the image. As described later, the pixel values of this boundary region are generated with a rendering unit smaller than the one used for the region other than the boundary region (the main layer). FIG. 12 is a diagram illustrating how a viewpoint image is separated into the three layers. Details are given below.

First, generation of the three-dimensional model of the main layer will be described.

For the main layer, a three-dimensional model is generated by connecting four pixels that do not straddle an object boundary to one another to construct a quadrilateral mesh. In FIG. 12, for example, a quadrilateral mesh 1204 is constructed by connecting four pixels, none of which straddles the object boundary 1001 (two normal pixels 1004 and 1201 and two foreground pixels 1202 and 1203). This processing is repeated until all the quadrilateral meshes forming the three-dimensional model of the main layer have been constructed. The minimum size of a quadrilateral mesh here is 1 pixel × 1 pixel. In this embodiment, the main layer is constructed entirely from quadrilateral meshes of 1 pixel × 1 pixel, but larger quadrilateral meshes may be used. Alternatively, meshes of a shape other than a quadrilateral, such as triangles, may be constructed.

For a quadrilateral mesh in units of one pixel constructed as described above, the X and Y coordinates are the global coordinates calculated from the camera parameters of the imaging apparatus 100, and the Z coordinate is the distance to the subject at each pixel obtained from the distance information. The color information of each pixel is then texture-mapped onto the quadrilateral mesh to generate the three-dimensional model of the main layer.
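The construction of the main layer mesh can be sketched as follows. Several simplifications are assumed here that the description does not state: a 2 × 2 pixel block is treated as straddling the object boundary when it contains both a foreground and a background pixel, pixel indices stand in for the global X and Y coordinates derived from the camera parameters, and texture mapping is omitted. All names are hypothetical.

```python
def build_main_layer_quads(labels, distance):
    """Return 1 pixel x 1 pixel quads as 4-tuples of (x, y, z) vertices.

    labels[y][x] is "foreground", "background" or "normal";
    distance[y][x] is the distance to the subject at that pixel.
    """
    quads = []
    for r in range(len(labels) - 1):
        for c in range(len(labels[0]) - 1):
            # The four pixels to connect, as (row, col), in winding order.
            block = [(r, c), (r, c + 1), (r + 1, c + 1), (r + 1, c)]
            kinds = {labels[y][x] for y, x in block}
            if {"foreground", "background"} <= kinds:
                continue    # straddles the boundary: left to the boundary layers
            quads.append(tuple((x, y, distance[y][x]) for y, x in block))
    return quads
```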

Next, generation of the three-dimensional models of the boundary foreground layer and the boundary background layer will be described.

For the boundary foreground layer and the boundary background layer, which adjoin the object boundary, a more detailed three-dimensional model is generated than for the main layer. Specifically, four pixels that straddle the object boundary are connected to one another to construct a quadrilateral mesh, and the constructed quadrilateral mesh is subdivided into sub-pixels to generate the three-dimensional model. In FIG. 12, for example, the quadrilateral mesh constructed from four pixels that all straddle the object boundary (background pixels 1003, 1205, and 1206 and foreground pixel 1002) is subdivided into sub-pixels, yielding quadrilateral meshes of 0.5 pixel × 0.5 pixel. This processing is repeated until all the quadrilateral meshes forming the three-dimensional models of the boundary foreground layer and the boundary background layer have been constructed. Each sub-pixel quadrilateral mesh is then assigned to the boundary foreground layer or the boundary background layer according to the type of its corresponding pixel. For example, the sub-pixel quadrilateral mesh 1207 corresponding to the foreground pixel 1002, which is the foreground part of the object boundary, belongs to the boundary foreground layer. The sub-pixel quadrilateral meshes 1208, 1209, and 1210 corresponding to the background pixels 1003, 1205, and 1206, which are the background part of the object boundary, belong to the boundary background layer.

For a sub-pixel quadrilateral mesh constructed as described above, the X and Y coordinates are global coordinates interpolated from the X and Y coordinates of the neighboring pixels, and the Z coordinate is the distance to the subject at the corresponding pixel obtained from the distance information. The color information of the corresponding pixel is then texture-mapped onto each quadrilateral mesh to generate the three-dimensional models of the boundary foreground layer and the boundary background layer. For example, the sub-pixel quadrilateral mesh 1207 uses the color of the foreground pixel 1002, and likewise 1208 uses the color of the background pixel 1003, 1209 the color of the background pixel 1205, and 1210 the color of the background pixel 1206, to generate the three-dimensional model of the boundary foreground layer or the boundary background layer.
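The sub-pixel subdivision can be sketched as below. The sketch assumes that the 2 × 2 block contains only foreground and background pixels, that pixel indices stand in for global coordinates, and that each 0.5 × 0.5 sub-quad takes the distance and color of the pixel at its nearest corner. The dictionary layout and all names are hypothetical.

```python
def subdivide_boundary_block(r, c, labels, distance, colors):
    """Split the quad joining pixels (r, c)..(r+1, c+1) into four 0.5 x 0.5 quads."""
    sub_quads = []
    for dr in (0, 1):
        for dc in (0, 1):
            pr, pc = r + dr, c + dc                 # pixel this sub-quad corresponds to
            x0, y0 = c + 0.5 * dc, r + 0.5 * dr     # top-left corner, interpolated
            corners = [(x0, y0), (x0 + 0.5, y0),
                       (x0 + 0.5, y0 + 0.5), (x0, y0 + 0.5)]
            sub_quads.append({
                # "boundary_foreground" or "boundary_background"
                "layer": "boundary_" + labels[pr][pc],
                # Z is the distance of the corresponding pixel, not an interpolation.
                "vertices": [(x, y, distance[pr][pc]) for x, y in corners],
                "color": colors[pr][pc],
            })
    return sub_quads
```

Keeping Z and color constant within each sub-quad means the foreground and background surfaces stay separate instead of being joined by a slanted surface across the boundary.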

The description now returns to the flowchart of FIG. 11.

In step 1104, the free viewpoint image generation unit 403 renders the main layer. FIG. 13 is a diagram illustrating the rendering of the main layer. The horizontal axis represents the X coordinate and the vertical axis the Z coordinate. In FIG. 13, line segments 1301 and 1302 represent the quadrilateral meshes of the main layer when the three-dimensional model is generated from the reference viewpoint indicated by the white inverted triangle 1303. Here, an object boundary (not shown) is assumed to exist between the foreground pixel 1304 and the background pixel 1305. As the main layer, a quadrilateral mesh 1301 connecting the normal pixel 1306 and the foreground pixel 1304 and a quadrilateral mesh 1302 connecting the normal pixel 1307 and the background pixel 1305 have been generated as the three-dimensional model. The image obtained by rendering these quadrilateral meshes 1301 and 1302 at the free viewpoint position designated in step 1101 (the black inverted triangle 1308 in FIG. 13) is the rendered image of the main layer. In the rendering processing, pixel portions for which no color exists remain as holes. This rendering processing is performed for every image in the reference image group to obtain the group of rendered images of the main layer. In FIG. 13, arrows 1309 and 1310 indicate where the quadrilateral mesh 1302 is seen from the viewpoint 1303 and the viewpoint 1308, respectively. From the viewpoint 1308, which lies to the left of the viewpoint 1303, the quadrilateral mesh 1302 appears further to the right than it does from the viewpoint 1303. Similarly, arrows 1311 and 1312 indicate where the quadrilateral mesh 1301 is seen from the viewpoint 1303 and the viewpoint 1308, respectively.
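The direction of the shift described for FIG. 13 can be checked with a one-dimensional pinhole projection. This is only a sketch under assumptions not stated in the description: both viewpoints look along the +Z axis with parallel optical axes, image X increases to the right, and the focal length and positions are hypothetical values.

```python
def project_x(point_x, point_z, camera_x, focal_length=1.0):
    """Image-plane X of a point (point_x, point_z) seen from a camera at camera_x."""
    return focal_length * (point_x - camera_x) / point_z


reference_x, free_x = 0.0, -0.5                  # the free viewpoint lies to the left
near_point, far_point = (0.0, 2.0), (0.0, 8.0)   # (X, Z) of a near and a far vertex

# Shift in image position when moving from the reference to the free viewpoint.
near_shift = project_x(*near_point, free_x) - project_x(*near_point, reference_x)
far_shift = project_x(*far_point, free_x) - project_x(*far_point, reference_x)
```

Both shifts are positive, so the meshes appear further to the right from the left-hand viewpoint. The nearer vertex shifts more than the farther one, which is why uncovered regions without color (holes) open up next to foreground objects.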

FIG. 14 is a diagram showing an example of the rendering result of the main layer. For simplicity, the two viewpoint images shown in FIG. 6 (the viewpoint image of the imaging unit 105 and that of the imaging unit 104) are used as the reference image group, and the rendering result is shown for the case where the arbitrary viewpoint position is the intermediate viewpoint between the imaging unit 105 and the imaging unit 104. In this case, an image at the viewpoint midway between the viewpoint image of the imaging unit 105 and that of the imaging unit 104 is generated. FIG. 14(a) shows the rendering result of the main layer for the viewpoint image of the imaging unit 105, and FIG. 14(b) shows the rendering result of the main layer for the viewpoint image of the imaging unit 104. Here, object 1401 is the object as seen from the intermediate viewpoint; it is located to the left of object 601 of the imaging unit 105 (see FIG. 6(a)) and to the right of object 604 of the imaging unit 104 (see FIG. 6(c)). In FIG. 14(a), an occlusion region 1402 remains as a hole on the right side of the object 1401, and in FIG. 14(b), an occlusion region 1403 remains as a hole on the left side of the object 1401.

In step 1105, the free viewpoint image generation unit 403 integrates the obtained group of rendered images of the main layer to obtain the integrated image data of the main layer. In this embodiment, the four rendered images of the main layer generated from the four viewpoint images serving as reference images are integrated. The integration processing is performed pixel by pixel, and the integrated color is a weighted average of the rendered images, specifically a weighted average based on the distance between the designated free viewpoint position and each reference image. For example, when the designated free viewpoint position is equidistant from the four imaging units corresponding to the reference images, the weights are equal at 0.25 each. When the designated free viewpoint position is closer to one of the imaging units, the weight is larger the shorter the distance. Of course, the method of obtaining the average color is not limited to this. Locations that are holes in a rendered image are not used in the color calculation of the integration. That is, the integrated color is calculated as a weighted average over the rendered images that have no hole at that location. Locations that are holes in all the rendered images remain holes. When the rendered images of the main layer shown in FIGS. 14(a) and 14(b) are integrated, the hole in FIG. 14(a) (occlusion region 1402) is filled from FIG. 14(b), and the hole in FIG. 14(b) (occlusion region 1403) is filled from FIG. 14(a). Since an intermediate viewpoint image is being generated from two reference images, each weight in the color calculation is 0.5, and where neither image has a hole, the color of each pixel of the integrated image is the average of the colors in FIGS. 14(a) and 14(b). Where one image has a hole and the other does not, the color of the pixel without a hole is adopted. The integration of two rendered images has been described for simplicity, but the integration of four rendered images follows the same idea. Portions whose holes are not filled by this integration processing are filled by the integration of the rendering results of the boundary foreground layer and the boundary background layer described later.
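The per-pixel integration can be sketched as follows. The description only says that the weight grows as the distance to the imaging unit shrinks and that equidistant units get equal weights, so the normalized inverse-distance weighting used here is an assumption. A hole is represented by `None`, colors are single gray values for brevity, and all names are hypothetical.

```python
import math

HOLE = None  # marks a pixel for which a rendered image has no color


def viewpoint_weights(free_viewpoint, camera_positions):
    """Inverse-distance weights over the reference cameras, normalized to sum to 1."""
    inverse = []
    for x, y in camera_positions:
        d = math.hypot(x - free_viewpoint[0], y - free_viewpoint[1])
        inverse.append(1.0 / max(d, 1e-9))      # guard against division by zero
    total = sum(inverse)
    return [w / total for w in inverse]


def integrate_pixel(colors, weights):
    """Weighted average over the rendered images that have a color at this pixel."""
    valid = [(c, w) for c, w in zip(colors, weights) if c is not HOLE]
    if not valid:
        return HOLE                              # a hole in every image stays a hole
    total = sum(w for _, w in valid)             # renormalize over non-hole images
    return sum(c * w for c, w in valid) / total
```

For two reference images with weights of 0.5 each, `integrate_pixel([100, 200], [0.5, 0.5])` averages the two colors, while `integrate_pixel([100, HOLE], [0.5, 0.5])` adopts the color of the image without a hole.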

In this way, the integrated image data of the main layer is generated.

In step 1106, the free viewpoint image generation unit 403 renders the boundary foreground layer. FIG. 15 is a diagram illustrating the rendering of the boundary foreground layer. As in FIG. 13, the horizontal axis represents the X coordinate and the vertical axis the Z coordinate, and an object boundary (not shown) is assumed to exist between the foreground pixel 1304 and the background pixel 1305. In FIG. 15, line segment 1501 represents the quadrilateral mesh of the boundary foreground layer when the three-dimensional model is generated from the reference viewpoint 1303 indicated by the white inverted triangle. This boundary foreground layer 1501 is a sub-pixel quadrilateral mesh having the distance information and color information of the foreground pixel 1304. The image obtained by rendering this sub-pixel quadrilateral mesh 1501 at the free viewpoint position designated in step 1101 (the black inverted triangle 1308 in FIG. 15) is the rendered image of the boundary foreground layer. In the rendering of the boundary foreground layer as well, pixel portions for which no color exists remain as holes. This rendering processing is performed for every image in the reference image group to obtain the group of rendered images of the boundary foreground layer. In FIG. 15, arrows 1502 and 1503 indicate where the quadrilateral mesh 1501 is seen from the viewpoint 1303 and the viewpoint 1308, respectively. From the viewpoint 1308, which lies to the left of the viewpoint 1303, the quadrilateral mesh 1501 appears further to the right than it does from the viewpoint 1303.

In step 1107, the free viewpoint image generation unit 403 integrates the group of rendered images of the boundary foreground layer to obtain the integrated image data of the boundary foreground layer. The four rendered images of the boundary foreground layer generated from the four viewpoint images serving as reference images are integrated by the same integration processing as in step 1105.

In step 1108, the free viewpoint image generation unit 403 integrates the integrated image data of the main layer with the integrated image data of the boundary foreground layer to obtain two-layer (main layer + boundary foreground layer) integrated image data. This integration processing is also performed pixel by pixel. Because the main layer yields a more stably accurate image than the boundary foreground layer, the integrated image of the main layer is used preferentially. Therefore, the color of the boundary foreground layer is used for filling only where the integrated image of the main layer has a hole and the integrated image of the boundary foreground layer does not. Where both the integrated image of the main layer and that of the boundary foreground layer have holes, the hole remains. The two-layer integrated image data is obtained by the above processing.

ステップ１１０９において、自由視点画像生成部４０３は、境界背景層のレンダリングを行う。図１６は、境界背景層のレンダリングの様子を説明する図である。図１３と同様、横軸にＸ座標、縦軸にＺ座標を取っており、前景画素１３０４と背景画素１３０５との間にオブジェクト境界（不図示）が存在するものとする。図１６において、破線の線分１６０１は、白塗りの逆三角で示される参照視点１３０３から３次元モデル生成した場合における境界背景層の四辺形メッシュを示している。この境界背景層１６０１は、背景画素１３０５の距離情報と色情報を持つサブ画素単位の四辺形メッシュである。このようなサブ画素単位の四辺形メッシュ１６０１を、ステップ１１０１で指定された自由視点の位置（図１６中の黒塗りの逆三角１３０８）でレンダリングした画像が境界背景層のレンダリング画像となる。なお、境界背景層のレンダリングの場合も、色が存在しない画素部分は、穴として残ることになる。そして、上記のようなレンダリング処理を、参照画像群の全てについて行い、境界背景層のレンダリング画像群を得る。図１６において、矢印１６０２／１６０３は、四辺形メッシュ１６０１が、視点１３０３／視点１３０８でどの位置から見えるかを示している。視点１３０３より左側にある視点１３０８では、四辺形メッシュ１６０１は、視点１３０３より右側に位置する。 In step 1109, the free viewpoint image generation unit 403 renders the boundary background layer. FIG. 16 is a diagram illustrating the rendering of the boundary background layer. As in FIG. 13, the horizontal axis represents the X coordinate and the vertical axis represents the Z coordinate, and an object boundary (not shown) is assumed to exist between the foreground pixel 1304 and the background pixel 1305. In FIG. 16, the broken line segment 1601 indicates the quadrilateral mesh of the boundary background layer when a three-dimensional model is generated from the reference viewpoint 1303, which is indicated by a white inverted triangle. This boundary background layer 1601 is a quadrilateral mesh in units of sub-pixels that has the distance information and color information of the background pixel 1305. The image obtained by rendering this sub-pixel quadrilateral mesh 1601 at the position of the free viewpoint specified in step 1101 (the black inverted triangle 1308 in FIG. 16) is the rendered image of the boundary background layer. When rendering the boundary background layer as well, pixel portions where no color exists remain as holes. This rendering process is performed for every image in the reference image group to obtain a rendered image group of the boundary background layer. In FIG. 16, arrows 1602 and 1603 indicate where the quadrilateral mesh 1601 is seen from the viewpoint 1303 and the viewpoint 1308, respectively. From the viewpoint 1308, which is to the left of the viewpoint 1303, the quadrilateral mesh 1601 appears further to the right than it does from the viewpoint 1303.
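
The direction of this shift can be checked with a short worked equation. It is only a sketch under assumptions that the source does not state: an idealized pinhole camera with focal length $f$, parallel optical axes, a reference viewpoint 1303 at horizontal position $C$, and a free viewpoint 1308 displaced to the left by $b > 0$. For a mesh vertex at $(X, Z)$:

```latex
x_{1303} = \frac{f\,(X - C)}{Z},
\qquad
x_{1308} = \frac{f\,\bigl(X - (C - b)\bigr)}{Z} = x_{1303} + \frac{f\,b}{Z}
```

Under these assumptions the vertex lands $f b / Z$ further to the right in the free viewpoint image. Because the shift is inversely proportional to $Z$, nearer geometry moves more than farther geometry, which is consistent with holes appearing next to object boundaries.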

ステップ１１１０において、自由視点画像生成部４０３は、境界背景層のレンダリング画像群を統合して、境界背景層の統合画像データを得る。ステップ１１０５と同様の統合処理によって、参照画像としての４つの視点画像から生成された境界背景層のレンダリング画像（４つ）が統合される。 In step 1110, the free viewpoint image generation unit 403 integrates the boundary background layer rendering images to obtain integrated image data of the boundary background layer. By the same integration process as in step 1105, the rendering images (four) of the boundary background layer generated from the four viewpoint images as the reference images are integrated.

ステップ１１１１において、自由視点画像生成部４０３は、ステップ１１０８で得た２層統合画像データと、境界背景層の統合画像データとを統合して、３層（主層＋境界前景層＋境界背景層）統合画像データを得る。ここでの統合処理も画素毎に行われる、この際、２層（主層＋境界前景層）統合画像と境界背景層の統合画像とでは２層統合画像の方が安定的に精度の高い画像が得られるため、２層統合画像を優先して利用する。よって、２層（主層＋境界前景層）統合画像に穴が空いていて、境界背景層の統合画像に穴が空いていないという場合にのみ、境界背景層の色で補完がなされる。２層統合画像と境界背景層の統合画像との双方に穴が空いているときは、穴として残ることになる。以上の処理により、３層統合画像データを得る。 In step 1111, the free viewpoint image generation unit 403 integrates the two-layer integrated image data obtained in step 1108 with the integrated image data of the boundary background layer to obtain three-layer (main layer + boundary foreground layer + boundary background layer) integrated image data. This integration is also performed pixel by pixel. Because the two-layer (main layer + boundary foreground layer) integrated image is more stably accurate than the integrated image of the boundary background layer, the two-layer integrated image is used preferentially. Accordingly, a pixel is complemented with the color of the boundary background layer only when the two-layer integrated image has a hole there and the integrated image of the boundary background layer does not. When both the two-layer integrated image and the integrated image of the boundary background layer have a hole, the pixel remains a hole. Through the above processing, three-layer integrated image data is obtained.

なお、本実施例において、主層、境界前景層、境界背景層の順にレンダリング処理を行うのは、オブジェクト境界付近の画質劣化を抑えるためである。 In the present embodiment, the rendering process is performed in the order of the main layer, the boundary foreground layer, and the boundary background layer in order to suppress image quality deterioration near the object boundary.

ステップ１１１２において、自由視点画像生成部４０３は、穴埋め処理を行う。具体的には、ステップ１１１１で得た３層統合画像データにおいて穴として残っている部分を周囲の色を用いて補完する。本実施例では、穴埋め対象画素の周囲画素から距離情報が奥にある画素を選択して穴埋め処理を行っている。もちろん穴埋めの方法は他の方法を用いても構わない。 In step 1112, the free viewpoint image generation unit 403 performs hole filling processing. Specifically, the portion remaining as a hole in the three-layer integrated image data obtained in step 1111 is complemented using surrounding colors. In the present embodiment, the filling process is performed by selecting a pixel having distance information in the back from surrounding pixels of the filling target pixel. Of course, other methods may be used for filling holes.
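
The hole filling of step 1112 can be sketched as follows. This is one illustrative reading of "selecting a pixel whose distance information is in the back", not a definitive implementation. Assumptions: holes are marked with `None`, a larger depth value means farther from the camera, the 8-neighborhood is used as "surrounding pixels", and a single pass is made, so a hole with no colored neighbor stays a hole. The function name is hypothetical.

```python
def fill_holes_with_farthest_neighbor(color, depth):
    """Fill each hole with the color of its farthest (largest-depth) neighbor.

    color: 2D list of colors, None where the pixel is a hole.
    depth: 2D list of depths, same shape (values at holes are ignored).
    Returns a new color image; the inputs are not modified.
    """
    height, width = len(color), len(color[0])
    filled = [row[:] for row in color]
    for y in range(height):
        for x in range(width):
            if color[y][x] is not None:
                continue  # not a hole
            best = None  # coordinates of the farthest colored neighbor so far
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy == 0 and dx == 0) or not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if color[ny][nx] is None:
                        continue  # neighbor is itself a hole
                    if best is None or depth[ny][nx] > depth[best[0]][best[1]]:
                        best = (ny, nx)
            if best is not None:
                filled[y][x] = color[best[0]][best[1]]
    return filled


colors = [["near", None, "far"]]
depths = [[1.0, None, 5.0]]
result = fill_holes_with_farthest_neighbor(colors, depths)
```

Preferring the farther neighbor reflects the observation that holes next to an object boundary are usually disoccluded background, so copying the foreground color would smear the object.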

ステップ１１１３において、自由視点画像生成部４０３は、穴埋め処理の終わった自由視点画像データを、エンコーダ部２１０に出力する。エンコーダ部２１０では、任意の符号化方式（例えばＪＰＥＧ方式）で符号化して画像出力される。 In step 1113, the free viewpoint image generation unit 403 outputs the free viewpoint image data that has undergone the hole filling process to the encoder unit 210. In the encoder unit 210, an image is output after being encoded by an arbitrary encoding method (for example, JPEG method).

本実施例によれば、多視点画像データにおける各視点間の撮像画像を高精度に合成することが可能となり、撮像した画像とは視点数の異なるディスプレイにおける違和感のない表示、リフォーカス処理など画像処理の高画質化、などを実現できる。 According to the present embodiment, captured images between the viewpoints of multi-viewpoint image data can be synthesized with high accuracy. This makes it possible to realize, for example, natural-looking display on a display whose number of viewpoints differs from that of the captured images, and higher image quality in image processing such as refocus processing.

実施例１では、オブジェクト境界付近の三次元モデル生成を詳細に行い、３層に分離してレンダリングすることで自由視点画像合成の画質向上を図った。次に、オブジェクト境界付近の３次元モデル生成にアルファ値（透明度）を導入することで、より高い画質を実現する態様について、実施例２として説明する。なお、実施例１と共通する部分（距離情報推定部４０１における処理）については説明を簡略化ないしは省略し、ここでは差異点である分離情報生成部４０２及び自由視点画像生成部４０３における処理を中心に説明することとする。 In the first embodiment, the three-dimensional model near the object boundary is generated in detail and the image is separated into three layers for rendering, thereby improving the image quality of free viewpoint image synthesis. Next, a mode in which higher image quality is realized by introducing an alpha value (transparency) into the generation of the three-dimensional model near the object boundary will be described as a second embodiment. Description of the portions common to the first embodiment (the processing in the distance information estimation unit 401) is simplified or omitted, and the following description focuses on the differences, namely the processing in the separation information generation unit 402 and the free viewpoint image generation unit 403.

図１７は、本実施例に係る画像分離処理の流れを示すフローチャートである。 FIG. 17 is a flowchart illustrating the flow of image separation processing according to the present embodiment.

ステップ１７０１において、分離情報生成部４０２は、多視点画像データ、及び、距離情報推定処理によって得られた距離情報を取得する。 In step 1701, the separation information generation unit 402 acquires multi-viewpoint image data and distance information obtained by distance information estimation processing.

ステップ１７０２において、分離情報生成部４０２は、視点画像内のオブジェクト境界を抽出する。 In step 1702, the separation information generating unit 402 extracts an object boundary in the viewpoint image.

ステップ１７０３において、分離情報生成部４０２は、視点画像内の各画素を、前景画素、背景画素、通常画素の３種類に分類する。 In step 1703, the separation information generation unit 402 classifies each pixel in the viewpoint image into three types of foreground pixels, background pixels, and normal pixels.

ここまでは、実施例１の図９のフローチャートのステップ９０１〜ステップ９０３と同じである。 The steps so far are the same as steps 901 to 903 in the flowchart of FIG. 9 of the first embodiment.

ステップ１７０４において、分離情報生成部４０２は、ステップ１７０３で特定された前景画素及び背景画素のアルファ値の推定を行う。 In step 1704, the separation information generation unit 402 estimates the alpha values of the foreground pixels and background pixels specified in step 1703.

図１８は、本実施例に係るアルファ推定処理の様子を説明する図である。 FIG. 18 is a diagram for explaining the state of alpha estimation processing according to the present embodiment.

まず、前景画素（１８０４、１８０５、１８１１、１８１７）と背景画素（１８０３、１８０９、１８１０、１８１６）とを、アルファ推定の未定領域とする。そして、オブジェクト境界１８０１を挟んで前景画素がある側の通常画素（１８０６、１８０７、１８１２、１８１３、１８１８、１８１９）を、アルファ推定の前景領域とする。同様に、オブジェクト境界１８０１を挟んで背景画素がある側の通常画素（１８０２、１８０８、１８１４、１８１５）を、アルファ推定の背景領域とする。ここでは、オブジェクト境界１８０１からそれぞれ２画素幅の領域を前景領域／背景領域に設定しているが、この画素幅は任意に設定可能である。そして、未定領域としいて設定された前景画素及び背景画素のアルファ値を、前景領域／背景領域として設定された通常画素の情報を基に推定する。推定には公知のアルファ推定方法（例えば、ベイズ推定）を適用可能である。アルファ推定の具体的な内容については本発明の特徴ではないため、説明を省略する。このようにして、前景画素及び背景画素にアルファ値が付加される。 First, the foreground pixels (1804, 1805, 1811, 1817) and the background pixels (1803, 1809, 1810, 1816) are set as undetermined regions for alpha estimation. Then, normal pixels (1806, 1807, 1812, 1813, 1818, 1819) on the side where the foreground pixels are located across the object boundary 1801 are set as the foreground region for alpha estimation. Similarly, normal pixels (1802, 1808, 1814, and 1815) on the side where the background pixel is located across the object boundary 1801 are set as the background region for alpha estimation. Here, an area having a width of 2 pixels from the object boundary 1801 is set as the foreground area / background area, but the pixel width can be arbitrarily set. Then, the alpha values of the foreground pixels and background pixels set as the undetermined areas are estimated based on the information of the normal pixels set as the foreground areas / background areas. A known alpha estimation method (for example, Bayesian estimation) can be applied to the estimation. Since the specific content of the alpha estimation is not a feature of the present invention, the description thereof is omitted. In this way, alpha values are added to the foreground and background pixels.
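
The region assignment described above resembles the trimap used by matting methods, and can be sketched for a single row of pixels. This is a simplified illustration under assumptions: the input is the per-pixel classification from step 1703 encoded as `"F"` (foreground pixel), `"B"` (background pixel), and `"N"` (normal pixel); a normal pixel takes its side from the nearest classified pixel in the same row; and `band` stands in for the freely configurable pixel width. The names are hypothetical, and the alpha estimation itself (for example Bayesian estimation) is not shown.

```python
UNKNOWN, FG_REGION, BG_REGION, UNUSED = "unknown", "fg", "bg", "unused"


def trimap_row(labels, band=2):
    """Assign alpha-estimation regions to one row of classified pixels.

    Foreground and background pixels become the undetermined region. Normal
    pixels within `band` pixels of them become the known foreground or
    background region, depending on which kind of pixel is nearest.
    Ties at equal distance are resolved in favor of the left neighbor.
    """
    n = len(labels)
    trimap = []
    for i, label in enumerate(labels):
        if label in ("F", "B"):
            trimap.append(UNKNOWN)
            continue
        region = UNUSED
        for d in range(1, band + 1):
            hit = None
            for j in (i - d, i + d):
                if 0 <= j < n and labels[j] in ("F", "B"):
                    hit = labels[j]
                    break
            if hit is not None:
                region = FG_REGION if hit == "F" else BG_REGION
                break
        trimap.append(region)
    return trimap


# Pixels 1802 to 1807 as classified in the text: N B F F N N
row = trimap_row(list("NBFFNN"))
```

For that row the sketch marks 1802 as background region, 1803 to 1805 as undetermined, and 1806 and 1807 as foreground region, matching the assignment listed in the text.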

ステップ１７０５において、分離情報生成部４０２は、入力された多視点画像データに含まれるすべての視点画像について、画素の分類およびアルファ値の推定が完了したかどうかを判定する。未処理の視点画像がある場合にはステップ１７０２に戻り、次の視点画像に対しステップ１７０２〜ステップ１７０４の処理を行う。一方、すべての視点画像について画素の分類およびアルファ値の推定が完了していた場合には、ステップ１７０６に進む。 In step 1705, the separation information generation unit 402 determines whether pixel classification and alpha value estimation have been completed for all viewpoint images included in the input multi-viewpoint image data. If there is an unprocessed viewpoint image, the process returns to step 1702, and the processing of steps 1702 to 1704 is performed on the next viewpoint image. On the other hand, if pixel classification and alpha value estimation have been completed for all viewpoint images, the process proceeds to step 1706.

ステップ１７０６において、分離情報生成部４０２は、ステップ１７０３で得た前景画素、背景画素、通常画素を特定可能な分離情報、及びステップ１７０４で得たアルファ情報を自由視点画像合成部４０４に送る。 In step 1706, the separation information generation unit 402 sends the separation information obtained in step 1703, which makes it possible to identify the foreground pixels, background pixels, and normal pixels, together with the alpha information obtained in step 1704, to the free viewpoint image composition unit 404.

続いて、本実施例における自由視点画像の生成処理について説明する。前景画素と背景画素にアルファ値が付加されている点が異なるのみで、基本的な処理の内容は実施例１における自由視点画像の生成処理と同じであるので、前述の図１１のフローチャートに沿って本実施例に特有な点を中心に説明する。 Next, the free viewpoint image generation process in this embodiment will be described. The only difference is that alpha values are added to the foreground pixels and background pixels, and the basic processing is the same as the free viewpoint image generation process in the first embodiment. The description therefore follows the flowchart of FIG. 11 described above and focuses on the points specific to this embodiment.

ステップ１１０１における自由視点の位置情報の取得、及びステップ１１０２における参照画像群の設定は、実施例１と同様である。但し、上述のとおり、分離情報生成部４０２からは分離情報に加えアルファ情報も取得される（ステップ１１０１）。 The acquisition of the position information of the free viewpoint in step 1101 and the setting of the reference image group in step 1102 are the same as in the first embodiment. However, as described above, in addition to the separation information, alpha information is also acquired from the separation information generation unit 402 (step 1101).

ステップ１１０３における主層、境界前景層、境界背景層の３次元モデル生成も実施例１と同様である。但し、本実施例では、境界前景層、境界背景層の３次元モデル生成の際、サブ画素単位の四辺形メッシュには対応する画素のアルファ値が付加される。 The three-dimensional model generation of the main layer, the boundary foreground layer, and the boundary background layer in step 1103 is the same as in the first embodiment. However, in this embodiment, when generating a three-dimensional model of the boundary foreground layer and boundary background layer, the alpha value of the corresponding pixel is added to the quadrilateral mesh in units of subpixels.

ステップ１１０４〜ステップ１１１０における、各層のレンダリング処理及びレンダリング結果の統合処理も、実施例１と同様である。但し、主層と境界前景層を統合した２層統合画像において境界前景層でレンダリングされた画素はアルファ値を有している。また、境界背景層でレンダリングされた画素も、アルファ値を有している。 The rendering processing of each layer and the integration of the rendering results in steps 1104 to 1110 are also the same as in the first embodiment. However, in the two-layer integrated image obtained by integrating the main layer and the boundary foreground layer, the pixels rendered from the boundary foreground layer have an alpha value. Pixels rendered from the boundary background layer also have an alpha value.

ステップ１１１１における３層統合画像データの生成では、画素毎に統合処理を行い、２層統合画像（主層と境界前景層の統合画像）で穴が空いている場合は、境界背景層の色で補完する。また、穴は空いていないが、境界前景層によってレンダリングされアルファ値を有している画素に関しては、境界前景層と境界背景層の色をアルファ値に応じて重み付き平均により求める。例えば、境界前景層の画素が持つアルファ値が大きい（透明度が低い）場合は、境界前景層の画素値がより多く反映されるような重み付き平均を行う。逆に、境界前景層の画素が持つアルファ値が小さい（透明度が高い）場合は、境界背景層の画素値がより多く反映されるような重み付き平均を行う。このような処理により、境界部分の色の曖昧性に対応することができ、画質の向上が図られる。 In the generation of the three-layer integrated image data in step 1111, integration is performed pixel by pixel. Where the two-layer integrated image (the integrated image of the main layer and the boundary foreground layer) has a hole, the pixel is complemented with the color of the boundary background layer. For a pixel that is not a hole but was rendered from the boundary foreground layer and therefore has an alpha value, the color is obtained as a weighted average of the boundary foreground layer color and the boundary background layer color according to the alpha value. For example, when the alpha value of the boundary foreground layer pixel is large (low transparency), the weighted average reflects more of the boundary foreground layer pixel value. Conversely, when the alpha value of the boundary foreground layer pixel is small (high transparency), the weighted average reflects more of the boundary background layer pixel value. Such processing makes it possible to handle the color ambiguity at the boundary portion and improves the image quality.
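
The alpha-weighted integration of step 1111 can be sketched per pixel. This is a minimal illustration under assumptions: a pixel is either `None` (hole) or a pair `(rgb, alpha)`, main layer pixels carry `alpha=None`, alpha lies in [0, 1], and the blend is the usual `alpha * foreground + (1 - alpha) * background`. The source does not say what happens when the foreground pixel has an alpha value but the boundary background layer has a hole; keeping the foreground color in that case is an assumption of this sketch. All names are hypothetical.

```python
def blend(foreground, background, alpha):
    """Weighted average of two RGB colors; larger alpha favors the foreground."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(foreground, background))


def merge_three_layer_pixel(two_layer, boundary_bg):
    """Combine one pixel of the two-layer image with the boundary background layer.

    Each argument is None for a hole, or (rgb, alpha), where alpha is None for
    pixels that came from the main layer. Returns an RGB tuple or None.
    """
    if two_layer is None:
        # Hole in the two-layer image: take the background color if there is one.
        return None if boundary_bg is None else boundary_bg[0]
    color, alpha = two_layer
    if alpha is None or boundary_bg is None:
        # Main layer pixel, or nothing to blend with (assumption): keep as is.
        return color
    return blend(color, boundary_bg[0], alpha)


mostly_foreground = merge_three_layer_pixel(
    ((100.0, 0.0, 0.0), 0.75), ((0.0, 100.0, 0.0), 0.5)
)
```

With an alpha of 0.75 the result keeps three quarters of the boundary foreground color, which matches the statement that a large alpha value reflects more of the boundary foreground layer.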

ステップ１１１２における穴埋め処理およびステップ１１１３における自由視点画像データの出力処理は、実施例１と同様である。 The hole filling process in step 1112 and the free viewpoint image data output process in step 1113 are the same as in the first embodiment.

本実施例によれば、オブジェクト境界付近の３次元モデル生成にアルファ値（透明度）を導入することで、より高い画質を実現することが可能となる。 According to the present embodiment, it is possible to realize higher image quality by introducing an alpha value (transparency) in the generation of a three-dimensional model near the object boundary.

（その他の実施形態） (Other embodiments)

また、本発明の目的は、以下の処理を実行することによっても達成される。即ち、上述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体を、システム或いは装置に供給し、そのシステム或いは装置のコンピュータ（またはＣＰＵやＭＰＵ等）が記憶媒体に格納されたプログラムコードを読み出す処理である。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施の形態の機能を実現することになり、そのプログラムコード及び該プログラムコードを記憶した記憶媒体は本発明を構成することになる。 The object of the present invention can also be achieved by executing the following processing. That is, a storage medium on which the program code of software that realizes the functions of the above-described embodiments is recorded is supplied to a system or apparatus, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the program code and the storage medium storing the program code constitute the present invention.

Claims

An image processing apparatus comprising:
a specifying unit that specifies a boundary region of an object in an image of multi-viewpoint image data obtained by imaging from a plurality of viewpoints; and
a synthesizing unit that combines the multi-viewpoint image data to generate composite image data,
wherein the rendering unit used by the synthesizing unit when generating pixel values of the boundary region is smaller than the rendering unit used when generating pixel values of regions other than the boundary region.

The boundary region is composed of a boundary foreground layer that is a foreground part adjacent to the boundary of the object, and a boundary background layer that is a background part adjacent to the boundary of the object;
The image processing apparatus according to claim 1, wherein the specifying unit separates the boundary foreground layer, the boundary background layer, and a main layer constituting an area other than the boundary area into three layers.

The image processing apparatus according to claim 2, wherein the synthesizing unit synthesizes an image at an arbitrary viewpoint position by rendering the main layer, the boundary foreground layer, and the boundary background layer, in this order, for each viewpoint referred to in generating the image at that viewpoint position.

The image processing apparatus according to any one of the above, wherein, in the rendering of the boundary region, the synthesizing unit obtains the color of the boundary region by performing a weighted average according to the alpha value of the pixel corresponding to the boundary region.

The image processing apparatus according to claim 1, wherein the specifying unit generates the separation information by using image distance information.

The image processing apparatus according to claim 1, wherein the specifying unit generates the separation information by using region division of an image.

An image processing method comprising:
a specifying step of specifying a boundary region of an object in an image of multi-viewpoint image data obtained by imaging from a plurality of viewpoints; and
a synthesis step of combining the multi-viewpoint image data to generate composite image data,
wherein, in the synthesis step, the rendering unit used when generating pixel values of the boundary region is smaller than the rendering unit used when generating pixel values of regions other than the boundary region.

The boundary region is composed of a boundary foreground layer that is a foreground part adjacent to the boundary of the object, and a boundary background layer that is a background part adjacent to the boundary of the object;
The image processing method according to claim 7, wherein, in the specifying step, the image is separated into three layers: the boundary foreground layer, the boundary background layer, and a main layer constituting the area other than the boundary region.

The image processing method according to claim 8, wherein, in the synthesis step, an image at an arbitrary viewpoint position is synthesized by rendering the main layer, the boundary foreground layer, and the boundary background layer, in this order, for each viewpoint referred to in generating the image at that viewpoint position.

The image processing method according to claim 7, wherein, in the synthesis step, in the rendering of the boundary region, a weighted average according to the alpha value of the pixel corresponding to the boundary region is performed to obtain the color of the boundary region.

The image processing method according to claim 7, wherein in the specifying step, the separation information is generated using image distance information.

The image processing method according to claim 7, wherein in the specifying step, the separation information is generated using image segmentation.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 6.