JP6702755B2

JP6702755B2 - Image processing device, imaging device, image processing method, and program

Info

Publication number: JP6702755B2
Application number: JP2016030973A
Authority: JP
Inventors: 木村　直人
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-04-08
Filing date: 2016-02-22
Publication date: 2020-06-03
Anticipated expiration: 2036-02-22
Also published as: JP2016201788A

Description

The present invention relates to processing for generating an image at an arbitrary viewpoint position using a three-dimensional reconstruction technique.

Conventionally, three-dimensional reconstruction techniques include a technique in which three-dimensional data is reconstructed from images captured at a plurality of viewpoints, and an image that appears to have been captured from another viewpoint is then generated based on the three-dimensional data. The viewpoint set after image capture (hereinafter referred to as an arbitrary viewpoint) can, for example, be selected freely by the user. In the technique disclosed in Patent Document 1, when an image at an arbitrary viewpoint is created from images at a plurality of viewpoints, a feature amount is calculated between the target image and images close to it in the time series. When the reliability of the feature amount is low, accurate three-dimensional data can be created by reconstructing the three-dimensional data based on the images that precede and follow in the time series.

Patent Document 1: JP 2009-212728 A
Patent Document 2: JP 2013-239119 A
Patent Document 3: JP 2012-221128 A

In image generation processing at an arbitrary viewpoint position, a region in which images from a plurality of viewpoints cannot be correlated may be a region that becomes hidden when the viewpoint changes (a region shielded by a subject), that is, an occlusion region. Therefore, when three-dimensional data is reconstructed from image information that precedes and follows in the time series, it is difficult to improve the mapping accuracy if the occlusion region changes little between images. Furthermore, the technique presupposes batch processing in which all images are first mapped to three-dimensional data before the image at the arbitrary viewpoint is generated. In this case, there is a problem that the amount of data to be processed at one time becomes large.
An object of the present invention is to provide an image processing device, an imaging device, an image processing method, and a program that enable sequential processing for generating an image at an arbitrary viewpoint position from a plurality of image data.

An apparatus according to the present invention includes: image acquisition means for acquiring a plurality of image data; distance information acquisition means for acquiring distance information in the depth direction of the image data acquired by the image acquisition means; image transformation means for performing image transformation processing, by coordinate transformation, on the image data acquired by the image acquisition means; distance information transformation means for performing transformation processing on the distance information acquired by the distance information acquisition means, by a coordinate transformation corresponding to the coordinate transformation performed on the image data; distance information synthesis means for synthesizing a plurality of pieces of distance information transformed by the distance information transformation means; distance information recording means for recording the distance information output by the distance information synthesis means; synthesis information generation means for generating information for synthesis based on the distance information transformed by the distance information transformation means and the distance information recorded in the distance information recording means; and image synthesis means for generating an arbitrary viewpoint image by synthesizing, based on the information for synthesis, a plurality of images transformed by the image transformation means.

According to the present invention, it is possible to provide an image processing device, an imaging device, an image processing method, and a program that enable sequential processing for generating an image at an arbitrary viewpoint position from a plurality of image data.

FIG. 1 is a block diagram showing a configuration example of an imaging device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the image processing unit 104 according to the first embodiment.
FIG. 3 is a block diagram showing a configuration example of the αMAP generation unit 208 in FIG. 2.
FIG. 4 is a flowchart showing the flow of image processing in the first embodiment.
FIG. 5 is a flowchart showing a processing example of the αMAP generation unit 208 in FIG. 2.
FIG. 6 is a block diagram showing a configuration example of the image processing unit 604 according to the second embodiment.
FIG. 7 is a block diagram showing a configuration example of the αMAP generation unit 710 in FIG. 6.
FIG. 8 is a flowchart showing the flow of image processing in the second embodiment.
FIG. 9 is a flowchart showing a processing example of the αMAP generation unit 710 in FIG. 6.
FIG. 10 is a diagram explaining the relationship between a captured image and an arbitrary viewpoint image.
FIG. 11 is a diagram explaining the captured image and the distance MAP at each shooting position.
FIG. 12 is a diagram illustrating the captured image and the arbitrary viewpoint image at each shooting position.
FIG. 13 is a diagram illustrating the distance MAP and the arbitrary viewpoint distance MAP at each shooting position.
FIG. 14 is a diagram explaining the synthesis processing of shooting 1 and shooting 2.
FIG. 15 is a diagram explaining the synthesis processing of the synthesis result of shooting 1 and shooting 2 with shooting 3.
FIG. 16 is a schematic diagram explaining the relationship between two-dimensional coordinates in an image and three-dimensional coordinates in real space.
FIG. 17 is an explanatory diagram of the reliability MAP according to the second embodiment.

Each embodiment of the present invention will be described below in detail with reference to the accompanying drawings. In each embodiment, processing is performed to generate an image at the position of an arbitrary viewpoint (hereinafter referred to as an arbitrary viewpoint position) from images captured while changing the viewpoint position. This processing is executed each time a captured image is input in sequence, and the image data at the arbitrary viewpoint position is updated accordingly. In the following description, an image at an arbitrary viewpoint position is referred to as an "arbitrary viewpoint image". For subjects, the positional relationship is described by defining the side closer to the imaging device as the near side.

[First Embodiment]
An image processing apparatus according to the first embodiment of the present invention will be described below. First, as an outline of the image processing in the present embodiment: to generate an arbitrary viewpoint image, image transformation processing for changing the viewpoint is performed on images captured at different viewpoint positions, and transformation processing is also performed on the distance MAP. The distance MAP is distance information indicating the shooting distance of the subject at each pixel position, that is, a distance distribution. Here, it suffices for the distance MAP to indicate the relative relationship of distances between subjects in the image, corresponding to the distance distribution. For example, it may take the form of distribution information of the image shift amount between a pair of parallax images, or distribution information of a defocus amount obtained by converting the image shift amount into a defocus amount.
The present embodiment is characterized in that transformation processing for changing the viewpoint is also applied to the distance MAP. After the transformation processing, a map for synthesis (the αMAP described later) is generated based on the viewpoint-changed distance MAP. By synthesizing the plurality of viewpoint-changed images using the map for synthesis, processing for updating the arbitrary viewpoint image data is executed sequentially. The present embodiment is described step by step below.

FIG. 1 is a block diagram showing a configuration example in which the image processing apparatus according to the present embodiment is applied to an imaging device.
The optical system 101 includes a lens group composed of a zoom lens, a focus lens, and the like, an aperture adjustment device, and a shutter device. The optical system 101 adjusts the magnification, focus position, or light amount of the subject image formed on the light receiving surface of the image sensor 102. The image sensor 102 photoelectrically converts the light flux from the subject that has passed through the optical system 101 into an electrical signal. The image sensor 102 is a photoelectric conversion device such as an image sensor using a CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor). In the present embodiment, the pixels of the image sensor 102 are arranged in a Bayer array with RGB color filters, and the sensor is of a pupil division type in which at least two photoelectric conversion elements correspond to one microlens in each pixel. However, the form of the image sensor is not limited to this. The A (Analog)/D (Digital) conversion unit 103 acquires the output signal of the image sensor 102 and converts the video signal into a digital image signal.

The image processing unit 104 performs known signal processing on the output from the A/D conversion unit 103, and also performs processing for generating an arbitrary viewpoint image from a plurality of input images. The processing performed by the image processing unit 104 will be described in detail later. The image processing unit 104 performs image processing not only on the image data output from the A/D conversion unit 103 but also on image data read from the recording unit 109. The drive control unit 105 controls the driving of the optical system 101 and the image sensor 102 in order to adjust the aperture value, sensitivity, focal length, and focus position, and to perform camera shake correction (image blur correction).

The system control unit 106 includes a CPU (central processing unit) and the like, and is the central control unit that controls and supervises the operation of the entire imaging device. Based on luminance values obtained from images processed by the image processing unit 104 and on instruction signals transmitted from the operation unit 107, the system control unit 106 outputs drive control signals to the drive control unit 105 and thereby controls the optical system 101, the image sensor 102, and so on.

The display unit 108 includes a liquid crystal display, an organic EL (electroluminescence) display, or the like. The display unit 108 displays images in accordance with image data acquired by the image sensor 102 or image data read from the recording unit 109. The recording unit 109, which has a function of recording image data, includes an information recording medium. For example, the information recording medium may be a memory card equipped with a semiconductor memory, or a package housing a rotating recording body such as a magneto-optical disk, and it is removable from the imaging device.
The bus 110 connects the image processing unit 104, the drive control unit 105, the system control unit 106, the display unit 108, and the recording unit 109, and is used to transmit and receive image data, signals, and the like between these units.

Next, the configuration within the image processing unit 104 that relates to the present embodiment will be described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing part of the image processing unit 104.
The image acquisition unit 201 sequentially acquires the image data output from the A/D conversion unit 103 and outputs it to both the image development unit 202 and the distance MAP generation unit 203. The image development unit 202 outputs the developed image data to the image transformation unit 204. The image recording unit 206 records the data of the arbitrary viewpoint image. The image synthesis unit 209 acquires image data from the image transformation unit 204 and the image recording unit 206, and outputs the synthesized image data.

The distance MAP generation unit 203 generates data of a distance map (also written as distance MAP) as distance information in the depth direction of the image data, and outputs it to the distance MAP transformation unit 205. In the distance MAP generation processing, distance information is generated by a known method using a correlation calculation on the pair of image data, acquired from the image sensor 102, that corresponds to different pupil regions of the imaging optical system. However, the method is not limited to this; distance information may be acquired from a pair of image data with parallax obtained from a plurality of imaging means, or generated by the DFD method from image data with different focus positions using a correlation calculation such as SAD (see Patent Document 2). SAD is an abbreviation of "Sum of Absolute Difference", and DFD is an abbreviation of "Depth from Defocus". The distance MAP transformation unit 205 generates distance MAP information at the arbitrary viewpoint position and outputs it to the distance MAP synthesis unit 210. The distance MAP recording unit 207 records the information transformed by the distance MAP transformation unit 205. The information read from the distance MAP recording unit 207 is output to the αMAP generation unit 208 and the distance MAP synthesis unit 210. The αMAP generation unit 208 generates α map information for synthesis (also written as αMAP), as described later, and outputs it to the image synthesis unit 209 and the distance MAP synthesis unit 210.
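The correlation calculation is described above only as a known method. As a minimal sketch of one such calculation, assuming a one-dimensional search along a single row of a parallax pair, the image shift amount at each pixel could be estimated by minimizing the SAD over candidate shifts. The function names, window size, and search range below are illustrative assumptions and are not taken from the source.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def image_shift_row(left, right, window=3, max_shift=4):
    """Estimate the per-pixel image shift between two rows of a parallax pair.

    For each window in `left`, the shift s in [-max_shift, max_shift] that
    minimises the SAD against `right` is selected. Pixels whose window does
    not fit inside the row are reported as None.
    (Hypothetical helper: window and max_shift are illustrative values.)
    """
    half = window // 2
    n = len(left)
    shifts = [None] * n
    for x in range(half, n - half):
        patch = left[x - half:x + half + 1]
        best_shift, best_cost = None, None
        for s in range(-max_shift, max_shift + 1):
            lo, hi = x + s - half, x + s + half + 1
            if lo < 0 or hi > n:
                continue  # candidate window falls outside the row
            cost = sad(patch, right[lo:hi])
            if best_cost is None or cost < best_cost:
                best_shift, best_cost = s, cost
        shifts[x] = best_shift
    return shifts


# Usage: the peak in `right` is displaced by two pixels relative to `left`.
left = [0, 0, 0, 10, 50, 10, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 10, 50, 10, 0, 0]
shift_at_peak = image_shift_row(left, right)[4]
```

A distribution of such shift values over the image is one of the forms the distance MAP may take, since only the relative relationship of distances is required.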

The operation of the image processing unit 104 shown in FIG. 2 will be described with reference to the flowchart of FIG. 4. The following processing is executed in accordance with control commands from the system control unit 106.
When the shooting operation starts, in S401 the image processing unit 104 initializes the value of a variable used in the sequential processing (denoted k) to 1. In S402, the image acquisition unit 201 performs data input processing for the k-th captured image. In S403, the distance MAP generation unit 203 generates the k-th distance MAP corresponding to the k-th captured image input in S402. In the present embodiment, as described above, the distance MAP is generated using the pair of parallax image data corresponding to the k-th captured image obtained from the image sensor 102. At this time, the distance MAP is generated using parallax image data in the form of RGB color signals or luminance signals that have undergone correction processing for noise, signal attenuation, and the like caused by the image sensor 102 and the subsequent processing circuits. In S404, the image development unit 202 develops the k-th image data input in S402 and converts it into image data with good visibility suited to the image output device. The development processing includes at least part of shading correction, imaging signal correction, white balance correction, noise reduction, edge enhancement, color matrix processing, gamma correction, and the like. Note that the k-th image handled in S405 and subsequent steps refers to the image developed in S404.

In S405, the image transformation unit 204 uses the information of the k-th distance MAP generated in S403 to perform transformation processing on the k-th image developed in S404. The image transformation processing generates the data of the k-th arbitrary viewpoint image. Details of the arbitrary viewpoint image generation processing will be described later. In S406, the distance MAP transformation unit 205 takes the information of the k-th distance MAP generated in S403 as input and generates the k-th distance MAP information at the arbitrary viewpoint position. The distance MAP at the arbitrary viewpoint position is hereinafter referred to as the "arbitrary viewpoint distance MAP". The generation processing of the arbitrary viewpoint distance MAP will be described in detail later.

In S407, the αMAP generation unit 208 reads the arbitrary viewpoint distance MAP information that reflects the processing of the first through "k-1"-th images and is recorded in the distance MAP recording unit 207. In S408, the αMAP generation unit 208 takes as input the information of the k-th arbitrary viewpoint distance MAP and the arbitrary viewpoint distance MAP information recorded in the distance MAP recording unit 207, and generates the αMAP information for synthesis. The αMAP information generated in S408 is used in S410 and S411, described below. The generation processing of the αMAP for synthesis will be described in detail later.

In S409, the image synthesis unit 209 reads the arbitrary viewpoint image data that reflects the processing of the first through "k-1"-th images and is recorded in the image recording unit 206. In S410, the image synthesis unit 209 acquires the data of the k-th arbitrary viewpoint image from the image transformation unit 204 and the data of the arbitrary viewpoint image recorded in the image recording unit 206. The image synthesis unit 209 synthesizes the acquired image data based on the α map information generated in S408. Here, in the two-dimensional coordinate system set on the image, the α map value at the position coordinates (x, y) of a point is written as α(x, y). The pixel value of the k-th arbitrary viewpoint image is written as pix_k(x, y), and the pixel value of the arbitrary viewpoint image recorded in the image recording unit 206 is written as pix_m(x, y). Writing the pixel value of the synthesized arbitrary viewpoint image as pix(x, y), it is calculated as in (Equation 1).
In the present embodiment, the value of α(x, y) is binary (0 or 1), but multi-valued logic taking values from 0 to 1 inclusive may be adopted as necessary. The same applies to the distance MAP synthesis calculation described below.
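(Equation 1) itself is not reproduced in this text. As a minimal sketch of the per-pixel synthesis, assuming that α(x, y) weights the k-th arbitrary viewpoint image and that 1 - α(x, y) weights the recorded image, the calculation could look as follows. The weighting direction and the function name are assumptions for illustration.

```python
def blend_images(alpha, pix_k, pix_m):
    """Blend two same-sized images (lists of rows) pixel by pixel.

    Assumed form: pix = alpha * pix_k + (1 - alpha) * pix_m.
    With binary alpha this selects one source per pixel; with values
    between 0 and 1 it mixes the two sources.
    """
    return [
        [a * pk + (1 - a) * pm for a, pk, pm in zip(row_a, row_k, row_m)]
        for row_a, row_k, row_m in zip(alpha, pix_k, pix_m)
    ]


# Usage: a 2x2 example with binary and fractional alpha values.
alpha = [[1, 0], [0.5, 0]]
pix_k = [[10, 20], [30, 40]]   # k-th arbitrary viewpoint image
pix_m = [[1, 2], [4, 4]]       # recorded arbitrary viewpoint image
blended = blend_images(alpha, pix_k, pix_m)
```

With α restricted to 0 or 1, as in the present embodiment, the result at each pixel is simply one of the two input pixel values.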

In S411, the distance MAP synthesis unit 210, which is the distance information synthesis unit, acquires the information of the k-th arbitrary viewpoint distance MAP from the distance MAP transformation unit 205 and the information of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207. The distance MAP synthesis unit 210 performs distance information synthesis processing on these arbitrary viewpoint distance MAPs based on the α map information generated in S408. In the two-dimensional coordinate system set on the image, the value of the k-th arbitrary viewpoint distance MAP at coordinates (x, y) is written as Z_k(x, y), and the value of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207 is written as Z_m(x, y). Writing the value of the synthesized distance MAP as Z(x, y), it is calculated from the α map value α(x, y) as in (Equation 2).
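(Equation 2) itself is not reproduced in this text. Using the symbols defined above, and assuming that α(x, y) weights the k-th data while 1 - α(x, y) weights the recorded data, one plausible form is:

```latex
% Assumed form of (Equation 2); the weighting direction is not confirmed by the source.
Z(x, y) = \alpha(x, y)\, Z_k(x, y) + \bigl(1 - \alpha(x, y)\bigr)\, Z_m(x, y)
```

Under this assumption, a binary α(x, y) makes Z(x, y) equal to either Z_k(x, y) or Z_m(x, y) at each pixel.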

In S412, the image synthesis unit 209 outputs the arbitrary viewpoint image data synthesized in S410 to the image recording unit 206 and updates the recorded arbitrary viewpoint image data. In S413, the distance MAP synthesis unit 210 outputs the arbitrary viewpoint distance MAP synthesized in S411 to the distance MAP recording unit 207 and updates the recorded arbitrary viewpoint distance MAP. S414 is processing to determine whether shooting by the imaging device has ended. That is, the image processing unit 104 determines whether the next, "k+1"-th image will be input. If shooting has ended and no "k+1"-th image data is input, the process proceeds to S415. If shooting has not ended and "k+1"-th image data is input, the process proceeds to S416.

In S415, the image processing unit 104 outputs, as output image data, the arbitrary viewpoint image data created from the k images processed so far, and ends the series of processes. The arbitrary viewpoint image data output from the image processing unit 104 is encoded and recorded in a single image file together with the corresponding distance MAP. Alternatively, the output arbitrary viewpoint image data and the corresponding distance MAP are recorded in separate files in association with each other. In S416, on the other hand, the "k+1"-th image data is input and the variable k is incremented (k++). That is, after k is updated to k+1, the process returns to S402.
In the present embodiment, an example has been described in which the arbitrary viewpoint image data created as each k-th image is processed is not output until shooting ends. The processing is not limited to this example; the arbitrary viewpoint image data created at each step may be output as the output image data of the image processing unit 104. The same applies to the embodiments described later.
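The flow of S401 to S416 can be sketched as a loop that keeps only one recorded image and one recorded distance MAP. This is a minimal sketch under stated assumptions: every callable passed in is a hypothetical stand-in for a unit in FIG. 2, and the handling of the first frame (when nothing has been recorded yet) is an assumption, since it is not described in this section.

```python
def run_sequential(frames, make_distance_map, develop, warp_image,
                   warp_distance, make_alpha, blend):
    """Sequentially update an arbitrary viewpoint image and its distance MAP.

    All callables are hypothetical placeholders for the processing units.
    """
    recorded_image = None      # contents of image recording unit 206
    recorded_distance = None   # contents of distance MAP recording unit 207
    for raw in frames:                                  # S402 (k = 1, 2, ...)
        distance = make_distance_map(raw)               # S403
        image = develop(raw)                            # S404
        view_image = warp_image(image, distance)        # S405
        view_distance = warp_distance(distance)         # S406
        if recorded_image is None:
            # Assumed first-frame handling: record the transformed data as is.
            recorded_image, recorded_distance = view_image, view_distance
            continue
        alpha = make_alpha(view_distance, recorded_distance)        # S407, S408
        new_image = blend(alpha, view_image, recorded_image)        # S409, S410
        new_distance = blend(alpha, view_distance, recorded_distance)  # S411
        recorded_image, recorded_distance = new_image, new_distance    # S412, S413
    return recorded_image, recorded_distance            # S415


# Usage with scalar stand-ins for images and distance MAPs.
result = run_sequential(
    [3, 1, 2],
    make_distance_map=lambda raw: raw * 10,
    develop=lambda raw: raw,
    warp_image=lambda image, distance: image,
    warp_distance=lambda distance: distance,
    make_alpha=lambda new, old: 1 if new < old else 0,  # illustrative rule only
    blend=lambda a, new, old: a * new + (1 - a) * old,
)
```

Because only the recorded image and distance MAP are carried between iterations, the amount of data handled at one time does not grow with the number of captured images, which is the point of the sequential processing.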

Next, the arbitrary viewpoint image generation processing described for S405 of FIG. 4 will be described in detail. FIG. 10 illustrates the relationship between captured images and an arbitrary viewpoint image. As shown in FIG. 10(A), a scene is assumed in which, as seen from shooting positions 1101 to 1103, the subjects are a person, trees A to C behind the person, and a building further behind. The arbitrary viewpoint position is indicated by position 1104. FIG. 10(B) shows a captured image 1111 taken at shooting position 1101 with the building, trees, and person as subjects. FIG. 10(C) shows an arbitrary viewpoint image 1114 at the arbitrary viewpoint position 1104.

The shooting position corresponding to the captured image 1111 shown in FIG. 10(B) is, in real space, the shooting position 1101 in FIG. 10(A). Consider the case where an image that appears to have been taken at the arbitrary viewpoint position 1104 is created, as the arbitrary viewpoint image 1114, based on the image taken at shooting position 1101. In this case, the information of the image taken at shooting position 1101 alone does not provide enough subject information to generate the arbitrary viewpoint image. That is, the arbitrary viewpoint image must be created using as input the information of images taken from a plurality of positions, including not only shooting position 1101 but also shooting positions 1102 and 1103. In the following, shooting at shooting position 1101 is called "shooting 1", shooting at shooting position 1102 is called "shooting 2", and shooting at shooting position 1103 is called "shooting 3". In the present embodiment, a processing example in which an arbitrary viewpoint image is generated based on three images is described.

The processing for creating an arbitrary viewpoint image from captured images will be described with reference to FIG. 11. FIG. 11(A) shows the captured images obtained at the shooting positions of shooting 1 to shooting 3. FIG. 11(B) shows the arbitrary viewpoint image. The images taken at the shooting positions of shooting 1, shooting 2, and shooting 3 must each be transformed so as to correspond to the position of the arbitrary viewpoint image. To transform a captured image into an image at the arbitrary viewpoint position, the distance MAPs for shooting 1 to shooting 3 shown in FIG. 11(C) are required. That is, for each of shooting 1 to shooting 3, a distance MAP is needed that represents the shooting distance of the subject imaged at each pixel of the captured image.

In the processing that transforms the captured images into an image at the arbitrary viewpoint position, the coordinates (x, y) of the captured image are first converted into real space coordinates (X, Y, Z). The coordinates (x, y) are two-dimensional Cartesian coordinates, and the real space coordinates (X, Y, Z) are three-dimensional Cartesian coordinates. Letting (x, y) be the position coordinates in the captured image and Z be the distance value of the distance MAP at the position coordinates (x, y), the real space coordinates (X, Y, Z) are given by (Equation 3). Here, “pp” is the pixel pitch of the captured image, and “f” is the focal length at the time of shooting. The position coordinates (x, y) take the image center as the origin (0, 0).
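(Equation 3) itself is not reproduced in this text. The following is a minimal sketch of a back projection under the standard pinhole model, using only the quantities defined above (x, y, Z, pp, f). The function name `back_project` and the assumption that pp, f, and Z share one length unit are illustrative and are not taken from the patent.

```python
def back_project(x, y, Z, pp, f):
    """Convert image coordinates (x, y), in pixels with the image centre
    as origin, plus the distance value Z into real space (X, Y, Z).

    Assumed pinhole relation (a reconstruction, not the patent's exact form):
        X = x * pp * Z / f,   Y = y * pp * Z / f
    pp (pixel pitch), f (focal length) and Z must use the same length unit.
    """
    X = x * pp * Z / f
    Y = y * pp * Z / f
    return (X, Y, Z)


# Example: a pixel 100 px to the right of the image centre,
# pixel pitch 0.005 mm, focal length 50 mm, subject distance 2000 mm.
print(back_project(100, 0, 2000.0, 0.005, 50.0))
```

With these example values the point lies roughly 20 mm to the side of the optical axis at a depth of 2000 mm.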

The calculation in (Equation 3) will be described with reference to FIG. 16. FIG. 16(A) shows the correspondence between the coordinates (x, y) on the captured image and the coordinates (X, Y, Z) in real space. The projection plane 1501, which is the captured image, is drawn as a line segment. (Equation 3) is derived from the similarity of the triangles shown in FIG. 16(A). Using the same variables as in (Equation 3), the similarity of the triangles can be expressed as (Equation 4).
θ is the apex angle of the similar triangles that share a vertex. In FIG. 16(A), the projection plane 1501, which is the captured image, is shown on the camera side, whereas FIG. 16(B) shows the case where the projection plane 1502, which is the captured image, is on the real space side (in front of the camera). The same reasoning applies in this case. Hereinafter, the coordinate conversion according to (Equation 3) is referred to as “back projection conversion processing”. However, the back projection conversion processing applicable to this embodiment is not limited to the above.
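(Equation 4) is likewise not reproduced in this text. As a hedged reconstruction from the definitions above, and assuming θ is the angle at the shared vertex between the optical axis and the ray through the pixel, the similarity relation for the x direction would take the following form, with the y direction analogous. The notation of the original equation may differ.

```latex
\tan\theta \;=\; \frac{x \cdot pp}{f} \;=\; \frac{X}{Z}
\quad\Longrightarrow\quad
X = \frac{x \cdot pp \cdot Z}{f},
\qquad
Y = \frac{y \cdot pp \cdot Z}{f}
```

The same ratio holds whether the projection plane is drawn behind the camera centre, as in FIG. 16(A), or in front of it, as in FIG. 16(B), because the two triangles remain similar in either arrangement.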

Next, the real space coordinates (X, Y, Z) are converted into real space coordinates (X^*, Y^*, Z^*) as seen from the arbitrary viewpoint position. Specifically, the coordinate conversion takes into account the rotation components about the x, y, and z axes and the shift components in the x, y, and z directions from the shooting position to the arbitrary viewpoint position. The matrix elements given by the rotation components about the x, y, and z axes are denoted r_11 to r_33. The shift components in the x, y, and z directions before rotation are denoted t_1 to t_3, and the shift components in the x, y, and z directions after rotation are denoted t^*_1 to t^*_3. The conversion of the real space coordinates (X, Y, Z) into the real space coordinates (X^*, Y^*, Z^*) as seen from the arbitrary viewpoint position is carried out by the calculation shown in (Equation 5).
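(Equation 5) is not reproduced in this text, so its exact arrangement is unknown. The sketch below assumes one plausible ordering consistent with the description: add the pre-rotation shift (t_1 to t_3), apply the 3×3 rotation matrix (r_11 to r_33), then add the post-rotation shift (t^*_1 to t^*_3). The function name `real_space_move` and this ordering are assumptions for illustration only.

```python
def real_space_move(P, R, t_pre, t_post):
    """Move a real-space point P = (X, Y, Z) into the coordinate system of
    the arbitrary viewpoint.

    Assumed form (not confirmed by the source):  P* = R (P + t_pre) + t_post
    R is a 3x3 rotation matrix given as nested lists [[r11, r12, r13], ...].
    """
    shifted = [P[i] + t_pre[i] for i in range(3)]
    rotated = [sum(R[i][j] * shifted[j] for j in range(3)) for i in range(3)]
    return tuple(rotated[i] + t_post[i] for i in range(3))


# Example: rotate 90 degrees about the z axis, then move 1 unit closer in z.
Rz90 = [[0, -1, 0],
        [1,  0, 0],
        [0,  0, 1]]
print(real_space_move((1, 0, 5), Rz90, (0, 0, 0), (0, 0, -1)))  # (0, 1, 4)
```

In practice R and the shift vectors would come from the pose estimation mentioned in the text; here they are written by hand.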

Hereinafter, the coordinate conversion according to (Equation 5) is referred to as “real space movement conversion processing”. However, the real space movement conversion processing applicable to this embodiment is not limited to this. In this embodiment, the rotation components and shift components from the captured image to the image at the arbitrary viewpoint position are calculated using known techniques, such as estimating the camera pose at the time of shooting by searching for corresponding points between images. This processing yields the shooting position, from which the amount of movement from the shooting position to the arbitrary viewpoint position is calculated (see Patent Document 3).

Finally, the real space coordinates (X^*, Y^*, Z^*) are converted into coordinates (x^*, y^*) on the arbitrary viewpoint image. This conversion is performed using (Equation 6). Here, the coordinates (x^*, y^*) are two-dimensional coordinates with the image center as the origin (0, 0). “pp^*” is the pixel pitch of the arbitrary viewpoint image, and “f^*” is the focal length of the arbitrary viewpoint image.
Hereinafter, the coordinate conversion according to (Equation 6) is referred to as “projection conversion processing”. However, the projection conversion processing applicable to this embodiment is not limited to this. FIG. 16(C) illustrates the processing, expressed by the calculations of (Equation 3) to (Equation 6), that transforms a captured image into an image at the arbitrary viewpoint position.
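(Equation 6) is not reproduced in this text. Assuming it is the inverse of the pinhole back projection, the projection onto the arbitrary viewpoint image can be sketched as follows. The function name `project` and the handling of points at or behind the camera (returning `None`) are illustrative assumptions.

```python
def project(P_star, pp_star, f_star):
    """Project a real-space point (X*, Y*, Z*) onto the arbitrary viewpoint
    image, returning (x*, y*) in pixels with the image centre as origin.

    Assumed pinhole relation (a reconstruction, not the patent's exact form):
        x* = f* X* / (pp* Z*),   y* = f* Y* / (pp* Z*)
    Returns None when the point is not in front of the camera.
    """
    Xs, Ys, Zs = P_star
    if Zs <= 0:
        return None  # the point cannot be imaged from this viewpoint
    x_star = f_star * Xs / (pp_star * Zs)
    y_star = f_star * Ys / (pp_star * Zs)
    return (x_star, y_star)


# Example: a point 20 mm off axis at 2000 mm depth, f* = 50 mm, pp* = 0.005 mm.
print(project((20.0, 0.0, 2000.0), 0.005, 50.0))
```

With the same pixel pitch and focal length as the captured image and no viewpoint movement, this lands the point about 100 px from the image center, which is where a back projection with those parameters would have started.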

As described above, the coordinates (x, y) of the captured image are converted into the real space coordinates (X, Y, Z) by the back projection conversion processing of (Equation 3), and then into the real space coordinates (X^*, Y^*, Z^*) at the arbitrary viewpoint position by the real space movement conversion processing of (Equation 5). Finally, the projection conversion processing of (Equation 6) converts them into the coordinates (x^*, y^*) of the arbitrary viewpoint image. This completes the transformation of the captured image into an image at the arbitrary viewpoint position. When the pixel values of the arbitrary viewpoint image are generated from the pixel values of the captured image through this coordinate conversion, a common interpolation method such as bilinear or bicubic interpolation is used. This concludes the description of the arbitrary viewpoint image generation processing in S405 of FIG. 4.
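The bilinear interpolation mentioned above can be sketched as follows. This is a generic textbook formulation, not code from the patent. For simplicity it uses array coordinates with the origin at the top-left pixel rather than the image center, and the function name `bilinear_sample` is hypothetical.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at fractional position
    (x, y) = (column, row), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0          # fractional offsets in [0, 1)
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


img = [[0, 10],
       [20, 30]]
print(bilinear_sample(img, 0.5, 0.5))  # 15.0, the mean of the four pixels
```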

Next, the generation of the arbitrary viewpoint distance MAP in S406 of FIG. 4 will be described in detail. The relationship between the arbitrary viewpoint image and the arbitrary viewpoint distance MAP will be described with reference to FIGS. 12 and 13. FIG. 12(A), like FIG. 11(A), shows the captured images from shooting 1 to shooting 3. First, a distance MAP corresponding to each position in the captured image is generated, as shown in FIG. 13(A). FIG. 13(A) shows the distance MAPs for shooting 1 to shooting 3.

FIG. 12(B) shows the arbitrary viewpoint images generated from the respective captured images shown in FIG. 12(A) and the distance MAPs shown in FIG. 13(A). The region 1301 shown in black is a region of the arbitrary viewpoint image for which no information could be obtained from the captured image. Such a region is called an “occlusion area”.

FIG. 12(C) illustrates an arbitrary viewpoint image with no occlusion area. To obtain such an arbitrary viewpoint image, the multiple arbitrary viewpoint images generated from the respective captured images must be combined into one image. This requires, as shown in FIG. 13(B), a distance MAP corresponding to each arbitrary viewpoint image, that is, an arbitrary viewpoint distance MAP. In FIG. 13(B), the cross-hatched region 1302 is an occlusion area. FIG. 13(C) shows an arbitrary viewpoint distance MAP from which the occlusion area has been eliminated. To obtain such an arbitrary viewpoint distance MAP, the arbitrary viewpoint distance MAPs generated from the respective captured images must be combined, just as for the arbitrary viewpoint images. Combining the arbitrary viewpoint distance MAPs likewise requires the already recorded arbitrary viewpoint distance MAP.

The processing that generates an arbitrary viewpoint distance MAP from the distance MAP generated from a captured image will now be described.
Let (x, y) be the coordinates of the distance MAP and (x^*, y^*) be the coordinates of the arbitrary viewpoint distance MAP. The coordinate conversion is performed in the same way as with (Equation 3), (Equation 5), and (Equation 6) described for the generation of the arbitrary viewpoint image, so a detailed description is omitted. The difference from the arbitrary viewpoint image case is that the value stored in the arbitrary viewpoint distance MAP is converted into the value of Z^* in (Equation 5). This is necessary because the arbitrary viewpoint distance MAP must represent the shooting distance as measured from the arbitrary viewpoint position.
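A minimal sketch of this step follows, under simplifying assumptions that are not from the patent: the viewpoint change is a pure shift with no rotation, both views share the same pp and f, each source pixel is written to the nearest destination pixel, and when two pixels collide the nearer one wins. Unfilled pixels keep the ER marker, here taken as the largest representable float. The function name `warp_distance_map` is hypothetical.

```python
import sys

ER = sys.float_info.max  # marker for pixels with no data (occlusion area)


def warp_distance_map(depth, pp, f, shift):
    """Forward-warp a distance MAP to the arbitrary viewpoint.

    depth: list of rows of Z values; shift: (tx, ty, tz) viewpoint shift.
    The stored value is Z* (distance from the new viewpoint), not Z.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0      # image centre (array coords)
    out = [[ER] * w for _ in range(h)]
    for row in range(h):
        for col in range(w):
            Z = depth[row][col]
            if Z == ER:
                continue                        # nothing to warp here
            x, y = col - cx, row - cy           # centre-origin coordinates
            X, Y = x * pp * Z / f, y * pp * Z / f           # back projection
            Xs, Ys, Zs = X + shift[0], Y + shift[1], Z + shift[2]
            if Zs <= 0:
                continue                        # behind the new viewpoint
            xs, ys = f * Xs / (pp * Zs), f * Ys / (pp * Zs)  # projection
            c, r = int(round(xs + cx)), int(round(ys + cy))
            if 0 <= r < h and 0 <= c < w and Zs < out[r][c]:
                out[r][c] = Zs                  # keep the nearer surface
    return out


flat_wall = [[10.0] * 5]
warped = warp_distance_map(flat_wall, pp=1.0, f=10.0, shift=(1.0, 0.0, 0.0))
print([("ER" if v == ER else v) for v in warped[0]])
# ['ER', 10.0, 10.0, 10.0, 10.0]
```

In the example, shifting the scene one unit sideways moves the wall by one pixel and leaves the uncovered pixel marked as ER.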

Next, the generation of the αMAP for combining, shown in S408 of FIG. 4, will be described in detail.
As described above, the purpose of combining the arbitrary viewpoint images and arbitrary viewpoint distance MAPs generated from the respective captured images is to generate an arbitrary viewpoint image and an arbitrary viewpoint distance MAP from which the occlusion area has been eliminated. The overall flow of the combining processing will be described with reference to FIG. 14.

FIG. 14 shows the flow of processing that generates an arbitrary viewpoint image and an arbitrary viewpoint distance MAP without the occlusion area by executing sequential combining. FIG. 14 shows the combining of the arbitrary viewpoint images and of the arbitrary viewpoint distance MAPs of shooting 1 and shooting 2. FIG. 15 further shows the combining of the arbitrary viewpoint image and arbitrary viewpoint distance MAP that result from combining shooting 1 and shooting 2 with the arbitrary viewpoint image and arbitrary viewpoint distance MAP of shooting 3.

As shown in FIG. 14, the arbitrary viewpoint images of shooting 1 and shooting 2 are combined. This reduces the size of the occlusion area shown in black. The combining is based on the αMAP for combining generated from the arbitrary viewpoint distance MAPs of shooting 1 and shooting 2. The white region of the αMAP for combining is the region where the αMAP value is 1, and it indicates where the arbitrary viewpoint image of shooting 2 is combined onto the arbitrary viewpoint image of shooting 1. Within the region where the αMAP value is 1, the part that is an occlusion area in the arbitrary viewpoint distance MAP of shooting 1 is not an occlusion area in the arbitrary viewpoint distance MAP of shooting 2. In addition, in the region 1401 indicated by the dotted frame in the arbitrary viewpoint distance MAP, the value of the arbitrary viewpoint distance MAP of shooting 2 is smaller than that of shooting 1. In other words, the αMAP value is also set to 1 for a region where, as seen from the arbitrary viewpoint position, the subject in shooting 2 is at a shorter distance than in shooting 1.

Next, as shown in FIG. 15, the result of combining shooting 1 and shooting 2 is combined with shooting 3 in the same manner. This yields an arbitrary viewpoint image and an arbitrary viewpoint distance MAP in which the occlusion area is further reduced compared with FIG. 14.
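The sequential combining of FIGS. 14 and 15 can be sketched as a per-pixel selection driven by a binary αMAP. In this illustration the αMAPs are written by hand, whereas in the embodiment they are derived from the arbitrary viewpoint distance MAPs. The function name `composite` and the sample pixel values are assumptions.

```python
def composite(recorded, new, alpha):
    """Combine two maps pixel by pixel with a binary alpha map:
    alpha == 1 takes the pixel from `new`, alpha == 0 keeps `recorded`."""
    return [[n if a == 1 else r
             for r, n, a in zip(r_row, n_row, a_row)]
            for r_row, n_row, a_row in zip(recorded, new, alpha)]


# Arbitrary viewpoint images of shooting 1 to 3 (0 marks an occluded pixel).
img1 = [[0, 10, 10, 0]]
img2 = [[20, 21, 22, 0]]
img3 = [[30, 31, 32, 33]]

alpha_12 = [[1, 1, 0, 1]]    # hand-written for the example
alpha_123 = [[0, 0, 0, 1]]

merged_12 = composite(img1, img2, alpha_12)         # first step (FIG. 14)
merged_123 = composite(merged_12, img3, alpha_123)  # second step (FIG. 15)
print(merged_12)   # [[20, 21, 10, 0]]
print(merged_123)  # [[20, 21, 10, 33]]
```

Only the running result and the next shot are held at each step, which is what allows the shots to be processed one at a time. The same function would apply unchanged to the arbitrary viewpoint distance MAPs.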

The generation of the αMAP for combining will be described with reference to FIG. 3. FIG. 3 is a block diagram showing a configuration example of the αMAP generation unit 208. The αMAP generation unit 208 includes a distance MAP value comparison unit 301, an occlusion determination unit 302, and an αMAP value generation unit 303.
The distance MAP value comparison unit 301 acquires the information of the k-th arbitrary viewpoint distance MAP and the information of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207, compares the two, and outputs the comparison result to the αMAP value generation unit 303. The occlusion determination unit 302 acquires the information of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207 and outputs the occlusion determination result to the αMAP value generation unit 303. The αMAP value generation unit 303 acquires the outputs of the distance MAP value comparison unit 301 and the occlusion determination unit 302, and outputs the αMAP for combining.

The generation of the αMAP for combining will be described with reference to the flowchart of FIG. 5.
In S501, the αMAP generation unit 208 initializes to (0, 0) the coordinates (x, y) used in the generation processing from S502 onward. In S502, Z_k(x, y), the value of the k-th arbitrary viewpoint distance MAP, is read. The distance MAP value comparison unit 301 acquires Z_k(x, y). In S503, Z_m(x, y), the value of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207, is read. Z_m(x, y) is acquired by the distance MAP value comparison unit 301 and the occlusion determination unit 302.

In S504, the occlusion determination unit 302 determines whether the value of Z_m(x, y) is the ER value. The ER value is a value indicating an occlusion area. If Z_m(x, y) is the ER value, the coordinates (x, y) of the recorded arbitrary viewpoint image are determined to belong to an occlusion area, and the processing proceeds to S505. If Z_m(x, y) is not the ER value, the coordinates (x, y) of the recorded arbitrary viewpoint image are determined not to belong to an occlusion area, and the processing proceeds to S506. In this embodiment, the ER value is set to the maximum value in the representable numerical range.
In S505, the αMAP value generation unit 303 determines the αMAP value. In this example, 1 is assigned to α(x, y). The αMAP value generation unit 303 determines that the value of Z_k(x, y) is to be output for the coordinates (x, y), and the processing proceeds to S508.

In S506, the distance MAP value comparison unit 301 compares the magnitudes of Z_m(x, y) and Z_k(x, y) and determines whether the subject at the coordinates (x, y) of the k-th arbitrary viewpoint image is located nearer than the subject in the recorded arbitrary viewpoint image. If the value of Z_k(x, y) is smaller than the value of Z_m(x, y), that is, if the subject at the coordinates (x, y) of the k-th arbitrary viewpoint image is determined to be nearer than in the recorded arbitrary viewpoint image, the processing proceeds to S505. Conversely, if the value of Z_k(x, y) is greater than or equal to the value of Z_m(x, y), the processing proceeds to S507.

In S507, the αMAP value generation unit 303 determines the αMAP value. In this example, 0 is assigned to α(x, y). The αMAP value generation unit 303 determines that the value of Z_k(x, y) is not to be output for the coordinates (x, y), and the processing proceeds to S508. S508 determines whether the processing from S502 to S507 has been performed for all coordinates of the arbitrary viewpoint image. If the αMAP generation unit 208 determines that all coordinates have been processed, the series of processing ends. If unprocessed coordinates remain, the processing proceeds to S509. In S509, the αMAP generation unit 208 updates the coordinates (x, y) to be processed and returns to S502 to continue the processing.
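The flowchart steps S501 to S509 can be sketched as a nested loop over the coordinates. The function name `generate_alpha_map` and the representation of the maps as lists of rows are assumptions. The ER value is taken as the largest representable float, in line with the statement that it is the maximum representable value.

```python
import sys

ER = sys.float_info.max  # value marking an occlusion area


def generate_alpha_map(z_k, z_m):
    """Build the binary alpha map from the k-th arbitrary viewpoint distance
    MAP (z_k) and the recorded arbitrary viewpoint distance MAP (z_m)."""
    h, w = len(z_k), len(z_k[0])
    alpha = [[0] * w for _ in range(h)]
    for y in range(h):               # S501 / S509: iterate over coordinates
        for x in range(w):
            zk = z_k[y][x]           # S502: read Z_k(x, y)
            zm = z_m[y][x]           # S503: read Z_m(x, y)
            if zm == ER:             # S504: recorded pixel is occluded
                alpha[y][x] = 1      # S505: use the k-th value
            elif zk < zm:            # S506: k-th subject is nearer
                alpha[y][x] = 1      # S505
            else:
                alpha[y][x] = 0      # S507: keep the recorded value
    return alpha                     # S508: all coordinates processed


z_m = [[ER, 4.0],
       [4.0, 4.0]]
z_k = [[6.0, 3.0],
       [4.0, ER]]
print(generate_alpha_map(z_k, z_m))  # [[1, 1], [0, 0]]
```

Because ER is the largest representable value, a k-th pixel that is itself occluded never compares as nearer than a valid recorded pixel, so it needs no separate check in S506.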

In this embodiment, the distance MAP generation unit 203, which serves as the distance information acquisition unit, acquires image data from the image acquisition unit 201 and generates distance information in the depth direction, and the image transformation unit 204 transforms the image by coordinate conversion to generate an arbitrary viewpoint image. The distance MAP transformation unit 205, which serves as the distance information transformation unit, acquires the distance information from the distance information acquisition unit and transforms it by coordinate conversion to generate an arbitrary viewpoint distance MAP. The αMAP generation unit 208, which serves as the combination information generation unit, acquires the transformed distance information and generates the information of the αMAP for combining. The information of the αMAP for combining is, for example, a binary value of 1 or 0, as shown in FIG. 5 (see S505 and S507). The image combining unit 209 combines the multiple transformed images using the information of the αMAP for combining. According to this embodiment, sequential processing of the arbitrary viewpoint images and arbitrary viewpoint distance MAPs appropriately interpolates the occlusion area, so that an arbitrary viewpoint image and an arbitrary viewpoint distance MAP with a reduced occlusion area can be generated.

This embodiment has described an example in which the information of the αMAP for combining is binary, but the invention is not limited to this example. For example, when the combination information generation unit determines that the coordinates (x, y) of the distance information recorded in the distance MAP recording unit 207, which serves as the distance information recording unit, belong to an occlusion area, it sets the value of α(x, y) relatively large. That is, at those coordinates, the pixel value of the transformed image and the distance value indicated by the transformed first distance information receive a larger weight in the combination. When the coordinates (x, y) do not belong to an occlusion area, the combination information generation unit compares the distance value indicated by the transformed first distance information with the distance value indicated by the second distance information recorded in the distance information recording unit, and determines the value of α(x, y) from the comparison result. In this case, when the distance value indicated by the first distance information is smaller than the distance value indicated by the second distance information, the value of α(x, y) is set relatively large. When the distance value indicated by the first distance information is greater than or equal to the distance value indicated by the second distance information, the value of α(x, y) is set relatively small. These variations in how the combination weight is set apply equally to the embodiments described later.
In this embodiment, the coordinate conversion (geometric transformation) performed to generate the viewpoint-changed image data from the image data is also applied to the distance MAP corresponding to that image data. However, the invention is not limited to this. Various kinds of image processing performed on the image data, such as distortion correction, noise reduction, and other coordinate conversions, may also be applied to the corresponding distance MAP so that the distance MAP corresponds to the image data more accurately. In this embodiment, such image processing is performed, for example, in S406 of FIG. 4, in place of or in addition to the coordinate conversion. In particular, correction processing that corrects noise, distortion, defects, and the like caused by the apparatus (such as the imaging optical system, the image sensor, and the processing circuits) or by the imaging conditions is preferably applied not only to the image data to be developed but also to the distance MAP.
Applying geometric transformation and correction processing to both the image data for viewing and the distance MAP is useful not only for generating viewpoint-changed image data but also for various other purposes. For example, when image data for viewing undergoes geometric transformation or correction processing related to the apparatus or imaging conditions described above and is then developed for display and recording, it is effective to apply the same geometric transformation or correction processing to the distance MAP and record it as the distance MAP corresponding to that image data.
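The non-binary αMAP variant described earlier in this passage, in which α(x, y) is set relatively large or relatively small, could be sketched as below. The source gives no concrete weights, so the values 1.0, 0.8, and 0.2 are hypothetical, as are the function names `soft_alpha` and `blend`. Using a weight of exactly 1.0 for occluded recorded pixels is a design choice of this sketch, made so that the ER marker is never mixed into a blended value.

```python
import sys

ER = sys.float_info.max  # value marking an occlusion area


def soft_alpha(z_first, z_second, high=0.8, low=0.2):
    """Weight for the newly transformed data (first distance information)
    against the recorded data (second distance information).
    The numeric weights are illustrative assumptions."""
    if z_second == ER:
        return 1.0            # recorded pixel occluded: rely on the new data
    if z_first < z_second:
        return high           # new subject is nearer: relatively large
    return low                # otherwise: relatively small


def blend(new_value, recorded_value, alpha):
    """Weighted combination of a new and a recorded pixel or distance value."""
    if alpha >= 1.0:
        return new_value      # avoids arithmetic on the ER marker
    return alpha * new_value + (1.0 - alpha) * recorded_value


a = soft_alpha(3.0, 4.0)
print(a, blend(100.0, 0.0, a))
```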

[Second Embodiment]
Next, a second embodiment of the present invention will be described. In addition to the configuration described in the first embodiment, this embodiment includes components related to the reliability of the distance MAP at the time it is generated. When a distance MAP is generated from a captured image, a reliability MAP is generated at the same time as reliability information corresponding to the distance MAP. Like the distance MAP and the captured image, the reliability MAP is transformed to the arbitrary viewpoint position and combined. In the following, components that are the same as in the first embodiment are given the reference numerals already used and their detailed description is omitted, and the description focuses on the differences from the first embodiment.

The reliability MAP will be described with reference to FIG. 17. FIG. 17(A) illustrates a distance MAP generated from a captured image. The distance MAP is generated by a known method, for example one that uses a correlation calculation based on SAD (sum of absolute differences) values between multiple pupil-divided images or between images from different viewpoints. FIG. 17(A) illustrates a distance MAP containing a subject region 1701 that is near the imaging apparatus and a subject region 1702 that is far from the imaging apparatus. When this method is used, the result of the correlation calculation in the hatched boundary region 1601 shown in FIG. 17(B) becomes a problem. That is, in the boundary region 1601 between the near subject region 1701 and the far subject region 1702, the result of the correlation calculation may be inaccurate. This embodiment therefore uses a reliability MAP that indicates whether each distance MAP value can be trusted. FIG. 17(C) illustrates the reliability MAP. For example, the region 1602 shown in FIG. 17(C) corresponds to the hatched boundary region 1601 shown in FIG. 17(B).
In the region 1602, the value of the reliability MAP is set smaller than in its surroundings. Regions where the result of the correlation calculation is inaccurate include not only such boundary regions but also low-contrast regions and subject regions with repetitive patterns, in which the same texture appears many times in the image. The value of the reliability MAP is likewise set small for such regions. A larger reliability value means that the corresponding distance MAP value can be trusted more. The reliability MAP can be generated, for example, by a method that judges reliability from the SAD values of the correlation calculation. As another example, in the DFD (depth from defocus) method, the reliability is a value representing the influence of defocus. A region where the influence of defocus is hard to observe offers few cues for calculating the distance by the DFD method, so a region with a small reliability value is a region where the calculated distance value is not accurate.
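As a rough illustration of judging reliability from SAD values, the sketch below matches a one-dimensional block between two signals and scores the match by how clearly the best SAD stands out from the second-best candidate. This particular score is a hypothetical measure chosen for illustration. It is not the method of the cited patent literature, and the function names are assumptions.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_with_reliability(ref, target, pos, size, max_shift):
    """Find the shift of ref[pos:pos+size] inside target that minimises SAD.

    Returns (best_shift, reliability) with reliability in [0, 1]:
    (second_best - best) / second_best, or 0.0 when no other candidate
    exists or when the second-best SAD is zero (ambiguous match).
    """
    block = ref[pos:pos + size]
    scores = {}
    for s in range(-max_shift, max_shift + 1):
        start = pos + s
        if start < 0 or start + size > len(target):
            continue
        scores[s] = sad(block, target[start:start + size])
    best_shift = min(scores, key=scores.get)
    ordered = sorted(scores.values())
    if len(ordered) < 2 or ordered[1] == 0:
        return best_shift, 0.0
    return best_shift, (ordered[1] - ordered[0]) / ordered[1]


textured = [0, 0, 9, 1, 7, 0, 0, 0]
shifted = [0, 0, 0, 9, 1, 7, 0, 0]
print(match_with_reliability(textured, shifted, 2, 3, 2))   # (1, 1.0)

repeating = [1, 9, 1, 9, 1, 9, 1, 9]
print(match_with_reliability(repeating, repeating, 2, 3, 2)[1])  # 0.0
```

A distinct texture gives a single clear minimum and a high score, while a flat or repetitive region yields several equally good candidates and a score of zero, matching the qualitative behaviour described for low-contrast and repetitive-pattern regions.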

The internal configuration of the image processing unit 604 (see FIG. 1) according to this embodiment will be described in detail with reference to the block diagram of FIG. 6. The differences from the configuration shown in FIG. 2 are the distance MAP/reliability MAP generation unit 703, the reliability MAP transformation unit 708, the reliability MAP recording unit 709, the αMAP generation unit 710, the distance MAP combining unit 712, and the reliability MAP combining unit 713. The other components are given the same reference numerals as the corresponding units in FIG. 2, and their detailed description is omitted.

The distance MAP/reliability MAP generation unit 703 acquires the image data output by the image acquisition unit 201 and generates a distance MAP and a reliability MAP. The distance MAP data is output to the image transformation unit 204 and the distance MAP transformation unit 205, and the reliability MAP data is output to the reliability MAP transformation unit 708. The reliability MAP transformation unit 708 outputs the transformed reliability MAP data to the reliability MAP combining unit 713. The reliability MAP combining unit 713 acquires the outputs of the reliability MAP transformation unit 708, the reliability MAP recording unit 709, and the αMAP generation unit 710, and outputs the combined reliability MAP data to the reliability MAP recording unit 709.

図８のフローチャートを参照して、各部の処理について説明する。
撮影が開始するとＳ９０１で画像処理部６０４は、逐次処理で用いる変数ｋに１を代入して初期化する。Ｓ９０２で画像取得部２０１は、ｋ番目の撮影画像の取得処理を行う。Ｓ９０３で距離ＭＡＰ・信頼度ＭＡＰ生成部７０３は、Ｓ９０２で取得されたｋ番目の画像に対応するｋ番目の距離ＭＡＰ、およびｋ番目の距離ＭＡＰに対応する信頼度ＭＡＰの各データを生成する。距離ＭＡＰと信頼度ＭＡＰの生成処理については公知の方法を用いる（特許文献２）。 The processing of each unit will be described with reference to the flowchart of FIG. 8.
When image capturing is started, the image processing unit 604 initializes by substituting 1 into the variable k used in the sequential processing in step S901. In step S902, the image acquisition unit 201 performs a process of acquiring the kth captured image. In S903, the distance MAP/reliability MAP generation unit 703 generates each data of the kth distance MAP corresponding to the kth image acquired in S902 and the reliability MAP corresponding to the kth distance MAP. A known method is used for the generation processing of the distance MAP and the reliability MAP (Patent Document 2).

Ｓ９０４で画像現像部２０２は、Ｓ９０２で入力されたｋ番目の画像データに対して現像処理を行う。Ｓ９０５で画像変形部２０４は、Ｓ９０３で生成されたｋ番目の距離ＭＡＰを用いて、Ｓ９０４で現像されたｋ番目の画像に対する変形処理を行い、ｋ番目の任意視点画像のデータを生成する。Ｓ９０６で距離ＭＡＰ・信頼度ＭＡＰ生成部７０３は、Ｓ９０３で生成されたｋ番目の距離ＭＡＰを入力として、ｋ番目の任意視点距離ＭＡＰを生成する。なお、現像処理、任意視点画像データの生成処理、任意視点距離ＭＡＰの生成処理については、第１実施形態の場合と同様である。 In step S904, the image development unit 202 performs development processing on the kth image data input in step S902. In step S905, the image transformation unit 204 performs transformation processing on the kth image developed in step S904 using the kth distance MAP generated in step S903 to generate data of the kth arbitrary viewpoint image. In step S906, the distance MAP/reliability MAP generation unit 703 inputs the kth distance MAP generated in step S903 and generates the kth arbitrary viewpoint distance MAP. The development processing, the arbitrary viewpoint image data generation processing, and the arbitrary viewpoint distance MAP generation processing are the same as those in the first embodiment.

Ｓ９０７で信頼度ＭＡＰ変形部７０８は、Ｓ９０３で生成されたｋ番目の信頼度ＭＡＰを入力として、ｋ番目の任意視点位置での信頼度ＭＡＰを生成する。任意視点位置での信頼度ＭＡＰを「任意視点信頼度ＭＡＰ」と称する。任意視点信頼度ＭＡＰの生成方法については、Ｓ９０５の任意視点画像の生成方法と同様であるため説明を割愛する。 In step S907, the reliability MAP transforming unit 708 receives the kth reliability MAP generated in step S903 as input, and generates the reliability MAP at the kth arbitrary viewpoint position. The reliability MAP at the arbitrary viewpoint position is referred to as "arbitrary viewpoint reliability MAP". The method of generating the arbitrary viewpoint reliability MAP is the same as the method of generating the arbitrary viewpoint image in S905, and thus the description thereof will be omitted.

Ｓ９０８で距離ＭＡＰ合成部７１２は、既に“ｋ−１”番目まで処理して距離ＭＡＰ記録部２０７に記録されている任意視点距離ＭＡＰのデータを読み込む。Ｓ９０９で信頼度ＭＡＰ合成部７１３は、既に“ｋ−１”番目まで処理して信頼度ＭＡＰ記録部７０９に記録されている任意視点信頼度ＭＡＰのデータを読み込む。 In step S908, the distance MAP combining unit 712 reads the arbitrary viewpoint distance MAP data that has already been processed up to the “k−1”th image and recorded in the distance MAP recording unit 207. In step S909, the reliability MAP combining unit 713 reads the arbitrary viewpoint reliability MAP data that has already been processed up to the “k−1”th image and recorded in the reliability MAP recording unit 709.

Ｓ９１０でαＭＡＰ生成部７１０は、ｋ番目の任意視点距離ＭＡＰおよび記録済みの任意視点距離ＭＡＰ、並びにｋ番目の任意視点信頼度ＭＡＰおよび記録済の任意視点信頼度ＭＡＰの各データを入力として、合成用のαＭＡＰを生成する。αＭＡＰ生成部７１０が生成した合成用のαＭＡＰは、後述のＳ９１２とＳ９１３とＳ９１４で使用する。なお、合成用のαＭＡＰの生成処理については後で詳細に説明する。 In step S910, the αMAP generation unit 710 receives as input the kth arbitrary viewpoint distance MAP and the recorded arbitrary viewpoint distance MAP, together with the kth arbitrary viewpoint reliability MAP and the recorded arbitrary viewpoint reliability MAP, and generates an αMAP for synthesis. The αMAP for synthesis generated by the αMAP generation unit 710 is used in S912, S913, and S914 described below. The process of generating the αMAP for synthesis will be described in detail later.

Ｓ９１１では、既にｋ−１番目まで処理して記録済みの任意視点画像を読み込む処理を行われた後、Ｓ９１２では、ｋ番目の任意視点画像と、記録済みの任意視点画像を、Ｓ９１０で生成されたαＭＡＰに基づいて合成する処理が実行される。Ｓ９１３では、ｋ番目の任意視点距離ＭＡＰと、記録済みの任意視点距離ＭＡＰを、Ｓ９１０で生成されたαＭＡＰに基づいて合成する処理が行われる。なお、Ｓ９１１からＳ９１３の処理は第１実施形態の場合と同様である。 In step S911, the recorded arbitrary viewpoint image, already processed up to the (k−1)th image, is read. Then, in step S912, the kth arbitrary viewpoint image and the recorded arbitrary viewpoint image are combined based on the αMAP generated in S910. In S913, the kth arbitrary viewpoint distance MAP and the recorded arbitrary viewpoint distance MAP are combined based on the αMAP generated in S910. The processing from S911 to S913 is the same as in the first embodiment.

Ｓ９１４で信頼度ＭＡＰ合成部７１３は、ｋ番目の任意視点信頼度ＭＡＰと、信頼度ＭＡＰ記録部７０９に記録されている任意視点信頼度ＭＡＰを、Ｓ９１０で生成されたαＭＡＰに基づいて合成する。合成方法についてはＳ９１２、Ｓ９１３の場合と同様である。Ｓ９１５では、Ｓ９１２で合成された任意視点画像を、画像記録部２０６に記録される任意視点画像として更新する処理が行われる。Ｓ９１６では、Ｓ９１３で合成された任意視点距離ＭＡＰを、距離ＭＡＰ記録部２０７に記録される任意視点距離ＭＡＰとして更新する処理が行われる。Ｓ９１７で信頼度ＭＡＰ合成部７１３は、Ｓ９１４で合成された任意視点信頼度ＭＡＰを、信頼度ＭＡＰ記録部７０９に記録される任意視点信頼度ＭＡＰとして更新する。 In S914, the reliability MAP combining unit 713 combines the kth arbitrary viewpoint reliability MAP and the arbitrary viewpoint reliability MAP recorded in the reliability MAP recording unit 709 based on the αMAP generated in S910. The synthesizing method is the same as in S912 and S913. In S915, a process of updating the arbitrary viewpoint image synthesized in S912 as an arbitrary viewpoint image recorded in the image recording unit 206 is performed. In S916, a process of updating the arbitrary viewpoint distance MAP synthesized in S913 as the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207 is performed. In S917, the reliability MAP combining unit 713 updates the arbitrary viewpoint reliability MAP combined in S914 as the arbitrary viewpoint reliability MAP recorded in the reliability MAP recording unit 709.
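
The text states only that S912 to S914 combine the kth data with the recorded data based on the αMAP; the formula itself is not given in this section. The following is a minimal sketch, assuming that α(x, y) is the per-pixel weight of the kth data, so that α = 1 takes the kth value and α = 0 keeps the recorded value. The function name `blend_maps` and the nested-list representation of a MAP are hypothetical and chosen only for illustration.

```python
def blend_maps(new_map, recorded_map, alpha_map):
    """Combine two same-sized 2D maps pixel by pixel (sketch of S912 to S914).

    Assumption: alpha is the weight of the k-th (new) value.
    alpha == 1 selects the new value, alpha == 0 keeps the recorded value,
    and anything in between is a linear mix.
    """
    result = []
    for new_row, rec_row, alpha_row in zip(new_map, recorded_map, alpha_map):
        out_row = []
        for n, r, a in zip(new_row, rec_row, alpha_row):
            if a == 1:
                out_row.append(n)      # take the k-th value as-is
            elif a == 0:
                out_row.append(r)      # keep the recorded value as-is
            else:
                out_row.append(a * n + (1 - a) * r)
        result.append(out_row)
    return result


combined = blend_maps([[10, 20]], [[1, 2]], [[1, 0]])
```

The same function can be applied unchanged to the image, the distance MAP, and the reliability MAP, which matches the statement that S914 uses the same synthesis method as S912 and S913.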

Ｓ９１８は、撮像装置による撮影が終了したか否かの判断処理であり、画像処理部６０４は“ｋ＋１”番目の画像が入力されるか否かを判断する。撮像装置による撮影が終了し、“ｋ＋１”番目の画像が入力されない場合には、Ｓ９１９の処理へ進む。また、撮像装置による撮影が終了せず、“ｋ＋１”番目の画像が入力される場合には、Ｓ９２０へ移行する。Ｓ９１９で画像処理部６０４は、これまでのｋ枚の画像により作成された任意視点画像を出力画像として出力し、一連の処理を終了する。画像処理部１０４から出力された任意視点画像データは符号化され、対応する距離ＭＡＰ、信頼度ＭＡＰを含めて同一画像ファイルに記録される。画像ファイルはたとえばＥＸＩＦのファイルフォーマットに準拠し、距離ＭＡＰ、信頼度ＭＡＰはメタデータとして記録される。あるいは出力された任意視点画像データは、対応する距離ＭＡＰ、信頼度ＭＡＰと関連付けられて別ファイルにてそれぞれ記録される。
Ｓ９２０では、“ｋ＋１”番目の画像が入力され、変数ｋのインクリメントによりｋ値に１が加算されることで更新され、Ｓ９０２へ移行する。 S918 is a process of determining whether or not the image capturing by the image capturing apparatus is completed, and the image processing unit 604 determines whether or not the "k+1"th image is input. When the imaging by the imaging device is finished and the “k+1”th image is not input, the process proceeds to S919. If the "k+1"th image is input without the end of the image capturing by the image capturing apparatus, the process proceeds to S920. In step S919, the image processing unit 604 outputs, as an output image, the arbitrary viewpoint image created by the k images so far, and ends the series of processes. The arbitrary viewpoint image data output from the image processing unit 104 is encoded and recorded in the same image file including the corresponding distance MAP and reliability MAP. The image file conforms to the EXIF file format, for example, and the distance MAP and the reliability MAP are recorded as metadata. Alternatively, the output arbitrary viewpoint image data is recorded in separate files in association with the corresponding distance MAP and reliability MAP.
In step S920, the “k+1”th image is input, the variable k is incremented by 1, and the process returns to step S902.
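
The sequential flow of FIG. 8 can be sketched as a loop that keeps only one recorded set of data and merges each new frame into it. This is a minimal sketch under stated assumptions: the frames are assumed to be already transformed to the arbitrary viewpoint (S902 to S907 are not modeled), the first frame is assumed to be recorded as-is because no recorded data exists yet, and α is assumed to be binary. The names `process_sequence`, `compose`, and `make_alpha` are hypothetical.

```python
def compose(new, recorded, alpha):
    """Take the k-th value where alpha is 1, keep the recorded value where it is 0."""
    return [[n if a else r for n, r, a in zip(new_row, rec_row, alpha_row)]
            for new_row, rec_row, alpha_row in zip(new, recorded, alpha)]


def process_sequence(frames, make_alpha):
    """Merge frames one at a time, holding only one recorded result in memory.

    frames: iterable of (image, distance_map, reliability_map) tuples,
            each already warped to the arbitrary viewpoint (assumption).
    make_alpha: callable(distance, rec_distance, reliability, rec_reliability)
                returning a 0/1 alpha map (stands in for S910).
    """
    recorded = None
    for image, distance, reliability in frames:        # k = 1, 2, ... (S901, S920)
        if recorded is None:
            # Assumption: with nothing recorded yet, the first frame is stored as-is.
            recorded = (image, distance, reliability)
            continue
        rec_image, rec_distance, rec_reliability = recorded        # S908, S909, S911
        alpha = make_alpha(distance, rec_distance,
                           reliability, rec_reliability)           # S910
        recorded = (
            compose(image, rec_image, alpha),                      # S912, S915
            compose(distance, rec_distance, alpha),                # S913, S916
            compose(reliability, rec_reliability, alpha),          # S914, S917
        )
    return recorded                                                # S919
```

Because only the recorded image, distance MAP, and reliability MAP are carried from one iteration to the next, the amount of data handled at a time stays at one frame plus the recorded set, regardless of how many images are captured.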

次に、図８のＳ９１０におけるαＭＡＰの生成処理について詳細に説明する。図７は、αＭＡＰ生成部７１０の構成例を示したブロック図である。αＭＡＰ生成部７１０は、信頼度ＭＡＰ値比較部８０１、距離ＭＡＰ値比較部８０２、オクルージョン判断部８０３、αＭＡＰ値生成部８０４を備える。αＭＡＰ生成部７１０は、以下の各データが入力され、合成用のαＭＡＰを出力する。
・ｋ番目の任意視点距離ＭＡＰ、および距離ＭＡＰ記録部２０７に記録されている任意視点距離ＭＡＰ。
・ｋ番目の任意視点信頼度ＭＡＰ、および信頼度ＭＡＰ記録部７０９に記録されている任意視点信頼度ＭＡＰ。 Next, the αMAP generation process in S910 of FIG. 8 will be described in detail. FIG. 7 is a block diagram showing a configuration example of the αMAP generation unit 710. The αMAP generation unit 710 includes a reliability MAP value comparison unit 801, a distance MAP value comparison unit 802, an occlusion determination unit 803, and an αMAP value generation unit 804. The following data is input to the αMAP generation unit 710, and the αMAP for synthesis is output.
The kth arbitrary viewpoint distance MAP and the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207.
The k-th arbitrary viewpoint reliability MAP and the arbitrary viewpoint reliability MAP recorded in the reliability MAP recording unit 709.

信頼度ＭＡＰ値比較部８０１は、ｋ番目の任意視点信頼度ＭＡＰ、および信頼度ＭＡＰ記録部７０９の記録されている任意視点信頼度ＭＡＰの各値を比較し、比較結果を距離ＭＡＰ値比較部８０２に出力する。 The reliability MAP value comparison unit 801 compares each value of the kth arbitrary viewpoint reliability MAP with the corresponding value of the arbitrary viewpoint reliability MAP recorded in the reliability MAP recording unit 709, and outputs the comparison result to the distance MAP value comparison unit 802.

図９のフローチャートを参照し、合成用のαＭＡＰの生成処理について説明する。
Ｓ１００１でαＭＡＰ生成部７１０は、αＭＡＰを生成する上でＳ１００２以降の生成処理に使用する座標（ｘ，ｙ）を（０，０）に初期化する。Ｓ１００２で距離ＭＡＰ値比較部８０２は、ｋ番目の任意視点距離ＭＡＰの値であるＺ＿ｋ（ｘ，ｙ）を読み込む。Ｓ１００３で距離ＭＡＰ値比較部８０２およびオクルージョン判断部８０３は、距離ＭＡＰ記録部２０７に記録されている任意視点距離ＭＡＰの値であるＺ＿ｍ（ｘ，ｙ）を読み込む。Ｓ１００４で信頼度ＭＡＰ値比較部８０１は、ｋ番目の任意視点信頼度ＭＡＰの値であるＲ＿ｋ（ｘ，ｙ）を読み込む。Ｓ１００５で信頼度ＭＡＰ値比較部８０１は、信頼度ＭＡＰ記録部７０９に記録されている任意視点信頼度ＭＡＰの値であるＲ＿ｍ（ｘ，ｙ）を読み込む。 With reference to the flowchart of FIG. 9, the process of generating the αMAP for synthesis will be described.
In S1001, the αMAP generation unit 710 initializes the coordinates (x, y) used in the generation processing in S1002 and subsequent steps to generate αMAP to (0, 0). In step S1002, the distance MAP value comparison unit 802 reads Z_k(x, y), which is the value of the k-th arbitrary viewpoint distance MAP. In step S1003, the distance MAP value comparison unit 802 and the occlusion determination unit 803 read Z_m(x, y), which is the value of the arbitrary viewpoint distance MAP recorded in the distance MAP recording unit 207. In step S1004, the reliability MAP value comparison unit 801 reads R_k(x, y), which is the value of the k-th arbitrary viewpoint reliability MAP. In step S1005, the reliability MAP value comparison unit 801 reads R_m(x, y), which is the value of the arbitrary viewpoint reliability MAP recorded in the reliability MAP recording unit 709.

Ｓ１００６でオクルージョン判断部８０３は、Ｚ＿ｍ（ｘ，ｙ）の値がＥＲ値であるか否かを判断する。ＥＲ値は前述のオクルージョン領域を表す。Ｚ＿ｍ（ｘ，ｙ）の値がＥＲ値である場合、記録されている任意視点画像の座標（ｘ，ｙ）はオクルージョン領域に属すると判断され、Ｓ１００７へ進む。一方、Ｚ＿ｍ（ｘ，ｙ）の値がＥＲ値でない場合には、記録されている任意視点画像の座標（ｘ，ｙ）はオクルージョン領域に属さないと判断され、Ｓ１００８へ進む。 In step S1006, the occlusion determination unit 803 determines whether the value of Z_m(x,y) is the ER value. The ER value represents the occlusion area described above. If the value of Z_m(x, y) is the ER value, it is determined that the coordinates (x, y) of the recorded arbitrary viewpoint image belong to the occlusion area, and the process proceeds to S1007. On the other hand, when the value of Z_m(x, y) is not the ER value, it is determined that the coordinates (x, y) of the recorded arbitrary viewpoint image do not belong to the occlusion area, and the process proceeds to S1008.

Ｓ１００７でαＭＡＰ値生成部８０４は、αＭＡＰの値α（ｘ，ｙ）に１を代入し、座標（ｘ，ｙ）に関してＺ＿ｋ（ｘ，ｙ）の値を出力すると判断した上で、Ｓ１０１２へ進む。Ｓ１００８で距離ＭＡＰ値比較部８０２は、Ｚ＿ｍ（ｘ，ｙ）とＺ＿ｋ（ｘ，ｙ）の各値を比較し、ｋ番目の任意視点画像の座標（ｘ，ｙ）の被写体が、記録されている任意視点画像での当該被写体に比べ、手前に位置しているか否かを判断する。Ｚ＿ｋ（ｘ，ｙ）の値がＺ＿ｍ（ｘ，ｙ）の値よりも小さい場合、つまり、ｋ番目の任意視点画像の座標（ｘ，ｙ）の被写体が、記録されている任意視点画像での当該被写体に比べ、手前に位置していると判断された場合、Ｓ１００９へ進む。またＺ＿ｋ（ｘ，ｙ）の値がＺ＿ｍ（ｘ，ｙ）の値よりも大きい場合には、Ｓ１０１０へ進む。 In S1007, the αMAP value generation unit 804 assigns 1 to the αMAP value α(x, y), thereby deciding that the value of Z_k(x, y) is output for the coordinates (x, y), and then proceeds to S1012. In step S1008, the distance MAP value comparison unit 802 compares Z_m(x, y) with Z_k(x, y) and determines whether the subject at the coordinates (x, y) of the kth arbitrary viewpoint image is located nearer than the subject at the same coordinates in the recorded arbitrary viewpoint image. When the value of Z_k(x, y) is smaller than the value of Z_m(x, y), that is, when the subject at the coordinates (x, y) of the kth arbitrary viewpoint image is determined to be nearer than the subject in the recorded arbitrary viewpoint image, the process proceeds to S1009. When the value of Z_k(x, y) is larger than the value of Z_m(x, y), the process proceeds to S1010.

Ｓ１００９で信頼度ＭＡＰ値比較部８０１は、ｋ番目の任意視点信頼度ＭＡＰの値Ｒ＿ｋ（ｘ，ｙ）と、記録されている任意視点信頼度ＭＡＰの値Ｒ＿ｍ（ｘ，ｙ）との差分値を算出し、差分値を第１閾値（ＴＨ１と記す）と比較する。第１閾値ＴＨ１については、予め決められた固定値、または信頼度ＭＡＰの値に応じて変化する可変値を用いる。Ｒ＿ｋ（ｘ，ｙ）とＲ＿ｍ（ｘ，ｙ）の差分値が第１閾値ＴＨ１よりも大きい場合、信頼度ＭＡＰ値比較部８０１は、Ｒ＿ｋ（ｘ，ｙ）がＲ＿ｍ（ｘ，ｙ）に比べて信頼度が充分に高いと判断し、Ｓ１００７へ処理を進める。Ｒ＿ｋ（ｘ，ｙ）とＲ＿ｍ（ｘ，ｙ）の差分値が第１閾値ＴＨ１以下である場合には、Ｓ１０１１へ移行する。 In step S1009, the reliability MAP value comparison unit 801 calculates the difference between the value R_k(x, y) of the kth arbitrary viewpoint reliability MAP and the value R_m(x, y) of the recorded arbitrary viewpoint reliability MAP, and compares the difference with a first threshold (denoted TH1). The first threshold TH1 is either a predetermined fixed value or a variable value that changes according to the value of the reliability MAP. When the difference between R_k(x, y) and R_m(x, y) is larger than the first threshold TH1, the reliability MAP value comparison unit 801 determines that R_k(x, y) is sufficiently more reliable than R_m(x, y), and the process proceeds to S1007. When the difference between R_k(x, y) and R_m(x, y) is less than or equal to the first threshold TH1, the process proceeds to S1011.

Ｓ１０１０で信頼度ＭＡＰ値比較部８０１は、Ｒ＿ｋ（ｘ，ｙ）とＲ＿ｍ（ｘ，ｙ）との差分値を第２閾値（ＴＨ２と記す）と比較し、差分値が第２閾値ＴＨ２よりも大きいか否かを判断する。第２閾値ＴＨ２については、予め決められた固定値、または信頼度ＭＡＰの値に応じて変化する可変値を用いる。本実施形態にて第２閾値ＴＨ２は、第１閾値ＴＨ１とは異なる値とするが、必要に応じて同じ閾値を用いてもよい。Ｒ＿ｋ（ｘ，ｙ）とＲ＿ｍ（ｘ，ｙ）との差分値が第２閾値ＴＨ２よりも大きい場合、Ｒ＿ｋ（ｘ，ｙ）はＲ＿ｍ（ｘ，ｙ）に比べて信頼度が充分に高いと判断され、Ｓ１００７へ進む。Ｒ＿ｋ（ｘ，ｙ）とＲ＿ｍ（ｘ，ｙ）の差分値が閾値ＴＨ２以下である場合には、Ｓ１０１１へ移行する。Ｓ１０１１でαＭＡＰ値生成部８０４は、αＭＡＰの値α（ｘ，ｙ）に０を代入し、座標（ｘ，ｙ）に関してＺ＿ｋ（ｘ，ｙ）の値を出力しないと判断した上で、Ｓ１０１２へ進む。 In S1010, the reliability MAP value comparison unit 801 compares the difference between R_k(x, y) and R_m(x, y) with a second threshold (denoted TH2) and determines whether the difference is larger than the second threshold TH2. The second threshold TH2 is either a predetermined fixed value or a variable value that changes according to the value of the reliability MAP. In the present embodiment, the second threshold TH2 differs from the first threshold TH1, but the same threshold may be used if necessary. When the difference between R_k(x, y) and R_m(x, y) is larger than the second threshold TH2, R_k(x, y) is determined to be sufficiently more reliable than R_m(x, y), and the process proceeds to S1007. When the difference between R_k(x, y) and R_m(x, y) is less than or equal to the second threshold TH2, the process proceeds to S1011. In step S1011, the αMAP value generation unit 804 assigns 0 to the αMAP value α(x, y), thereby deciding that the value of Z_k(x, y) is not output for the coordinates (x, y), and then proceeds to S1012.

Ｓ１０１２でαＭＡＰ生成部７１０は、任意視点画像に対応する全ての座標に対し、Ｓ１００２からＳ１０１１までの処理を行ったか否かを判断する。全ての座標に対する処理が行われたと判断した場合、αＭＡＰ生成部７１０は処理を終了する。当該処理が未終了の場合には、Ｓ１０１３へ進む。Ｓ１０１３でαＭＡＰ生成部７１０は、処理対象となる座標（ｘ，ｙ）を更新し、Ｓ１００２へ進んで処理を続行する。 In S1012, the αMAP generation unit 710 determines whether or not the processes from S1002 to S1011 have been performed on all the coordinates corresponding to the arbitrary viewpoint image. When it is determined that the processing has been performed for all the coordinates, the αMAP generation unit 710 ends the processing. If the process is not completed, the process proceeds to S1013. In S1013, the αMAP generation unit 710 updates the coordinates (x, y) to be processed, proceeds to S1002, and continues the processing.
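
The per-pixel decision of FIG. 9 (S1001 to S1013) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the ER value is represented by a hypothetical sentinel `float("inf")`, the "difference value" is taken as the signed difference R_k − R_m, and the case Z_k = Z_m, which the flowchart description does not mention, is routed to the S1010 branch. The function names and the nested-list MAP representation are also hypothetical.

```python
ER = float("inf")  # hypothetical sentinel for the ER (occlusion) value


def alpha_value(z_k, z_m, r_k, r_m, th1, th2):
    """Return alpha for one pixel: 1 outputs the k-th value, 0 keeps the recorded one."""
    if z_m == ER:                      # S1006: recorded pixel is an occlusion area
        return 1                       # S1007
    if z_k < z_m:                      # S1008: k-th subject is nearer
        threshold = th1                # S1009
    else:                              # farther (equality handled here by assumption)
        threshold = th2                # S1010
    # Assumption: signed difference, so only a sufficiently MORE reliable
    # k-th value can replace the recorded one.
    return 1 if (r_k - r_m) > threshold else 0   # S1007 or S1011


def generate_alpha_map(Z_k, Z_m, R_k, R_m, th1, th2):
    """Scan every coordinate (S1001, S1012, S1013) and build the alpha map."""
    return [
        [alpha_value(zk, zm, rk, rm, th1, th2)
         for zk, zm, rk, rm in zip(zk_row, zm_row, rk_row, rm_row)]
        for zk_row, zm_row, rk_row, rm_row in zip(Z_k, Z_m, R_k, R_m)
    ]


alpha = generate_alpha_map(
    Z_k=[[1.0, 5.0, 2.0]],
    Z_m=[[ER, 3.0, 4.0]],
    R_k=[[0.1, 0.9, 0.5]],
    R_m=[[0.9, 0.2, 0.45]],
    th1=0.1,
    th2=0.5,
)
```

In this usage example the first pixel fills a recorded occlusion area, the second is farther but much more reliable, and the third is nearer but not reliable enough to clear TH1, so the recorded value is kept there.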

本実施形態では、画像データから算出される距離情報の信頼度を示す信頼度情報を生成する信頼度情報生成部を備える。距離情報の生成時に信頼度情報を生成する実施形態に限らず、距離情報生成部とは別の信頼度情報生成部が生成する実施形態でもよい。信頼度情報合成部である信頼度ＭＡＰ合成部７１３は、信頼度情報変形部である信頼度ＭＡＰ変形部７０８により変形処理された信頼度情報、および信頼度情報記録部である信頼度ＭＡＰ記録部７０９に記録された信頼度情報を取得する。信頼度ＭＡＰ合成部７１３は、合成用のαＭＡＰの情報を用いて信頼度情報を合成する。本実施形態によれば、オクルージョン領域を適切に補間する事により任意視点画像を生成する逐次処理を可能にした画像処理装置を提供できる。距離ＭＡＰに対応する信頼度情報として信頼度ＭＡＰを用いることで、正確な距離情報を任意視点画像の生成処理に反映させることができる。 The present embodiment includes a reliability information generation unit that generates reliability information indicating the reliability of the distance information calculated from the image data. The embodiment is not limited to one in which the reliability information is generated when the distance information is generated; the reliability information may instead be generated by a reliability information generation unit separate from the distance information generation unit. The reliability MAP combining unit 713, which serves as the reliability information synthesizing unit, acquires the reliability information transformed by the reliability MAP transforming unit 708, which serves as the reliability information transforming unit, and the reliability information recorded in the reliability MAP recording unit 709, which serves as the reliability information recording unit. The reliability MAP combining unit 713 synthesizes the reliability information using the information of the αMAP for synthesis. According to the present embodiment, it is possible to provide an image processing apparatus that enables sequential processing for generating an arbitrary viewpoint image by appropriately interpolating occlusion areas. By using the reliability MAP as the reliability information corresponding to the distance MAP, accurate distance information can be reflected in the arbitrary viewpoint image generation process.

［その他の実施形態］
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 [Other Embodiments]
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

１０４画像処理部
２０３距離ＭＡＰ生成部
２０４画像変形部
２０５距離ＭＡＰ変形部
２０８ αＭＡＰ生成部
２０９画像合成部
２１０距離ＭＡＰ合成部

104 Image Processing Unit 203 Distance MAP Generation Unit 204 Image Deformation Unit 205 Distance MAP Deformation Unit 208 αMAP Generation Unit 209 Image Synthesis Unit 210 Distance MAP Synthesis Unit

Claims

Image acquisition means for acquiring a plurality of image data,
Distance information acquisition means for acquiring distance information in the depth direction of the image data acquired from the image acquisition means,
Image transforming means for transforming the image by coordinate transformation on the image data acquired from the image acquiring means;
Distance information transformation means for performing transformation processing on the distance information acquired by the distance information acquisition means by coordinate transformation corresponding to coordinate transformation performed on the image data,
Distance information synthesizing means for synthesizing a plurality of distance information transformed by the distance information transforming means,
Distance information recording means for recording distance information output by the distance information combining means,
Synthetic information generating means for generating synthetic information based on the distance information transformed by the distance information transforming means and the distance information recorded in the distance information recording means,
and image synthesizing means for generating an arbitrary viewpoint image by synthesizing, based on the information for synthesis, a plurality of images transformed by the image transforming means; the image processing apparatus being characterized by comprising the above means.

The image processing apparatus according to claim 1, wherein the coordinate conversion performed on the plurality of image data is a movement conversion process based on a movement amount from a shooting position of the plurality of images to a viewpoint position of the arbitrary viewpoint image.

The image processing apparatus according to claim 1 or 2, wherein the coordinate conversion performed on the plurality of image data is a coordinate conversion in which at least one of translation and rotation is performed based on a movement amount from a shooting position of the plurality of images to a viewpoint position of the arbitrary viewpoint image.

The image processing apparatus according to any one of claims 1 to 3, wherein the image acquisition means sequentially acquires the plurality of image data,
and the distance information synthesizing means sequentially synthesizes a plurality of distance information that corresponds to the plurality of image data and has been transformed by the distance information transforming means.

The image processing apparatus according to any one of claims 1 to 4, wherein the synthesis information generating means generates the information for synthesis by selecting, from the distance information transformed by the distance information transforming means and the distance information recorded in the distance information recording means, the distance information indicating that the subject is present at a closer distance.

The image processing apparatus according to claim 5, comprising a distance information synthesizing unit that acquires and synthesizes a plurality of distance information corresponding to a plurality of image data and transformed by the distance information transforming unit,
wherein the distance information synthesizing unit synthesizes the distance information transformed by the distance information transforming unit and the distance information recorded in the distance information recording unit using the information for synthesis acquired from the synthesis information generating unit.

The image processing apparatus according to any one of claims 1 to 6, wherein the image transforming means performs, on the image data, transformation processing by coordinate conversion for generating image data with a changed viewpoint.

The image processing apparatus according to any one of claims 1 to 7, wherein the distance information transforming means converts the distance information acquired by the distance information acquisition means in accordance with the change of viewpoint.

The image processing apparatus according to any one of claims 1 to 8, wherein the distance information acquisition means acquires distance information generated from data of a plurality of pupil-divided images or of images from different viewpoints.

The image processing apparatus according to any one of claims 1 to 9, comprising a reliability information generating unit that generates reliability information indicating the reliability of the distance information acquired by the distance information acquiring unit,
wherein the synthesis information generating means generates the information for synthesis using the reliability information and the distance information transformed by the distance information transforming means.

Reliability information deforming means for acquiring the reliability information generated by the reliability information generating means and performing a deformation process by coordinate conversion;
The image processing apparatus according to claim 10 , further comprising a reliability information synthesizing unit that acquires a plurality of pieces of reliability information transformed by the reliability information transforming unit and sequentially synthesizes the pieces of reliability information.

The image processing apparatus according to claim 11, comprising a reliability information recording unit for recording reliability information output by the reliability information synthesizing unit,
wherein the reliability information synthesizing unit synthesizes the reliability information transformed by the reliability information transforming unit and the reliability information recorded in the reliability information recording unit, using the information for synthesis acquired from the synthesis information generating unit.

The image processing apparatus according to any one of claims 1 to 4, wherein the synthesis information generating unit determines the information for synthesis, which represents the composition ratio, by comparing the first distance information transformed by the distance information transforming unit with the second distance information recorded in the distance information recording unit.

The image processing apparatus according to claim 13, wherein the synthesis information generating unit determines an occlusion area, which is an area for which information cannot be obtained from an image at a given viewpoint, and, when the coordinates of the second distance information belong to an occlusion area, sets relatively large the ratio used when synthesizing the pixel value of the transformed image at those coordinates and the distance value indicated by the first distance information.

The image processing apparatus according to claim 14, wherein, in the case where the coordinates of the second distance information do not belong to the occlusion area, the synthesis information generating unit sets relatively large the ratio used when synthesizing each of the pixel value of the transformed image at the coordinates and the distance value indicated by the first distance information when the distance value indicated by the first distance information is smaller than the distance value indicated by the second distance information, and sets that ratio relatively small when the distance value indicated by the first distance information is greater than or equal to the distance value indicated by the second distance information.

The image processing apparatus according to claim 10, wherein the synthesis information generating unit acquires the plurality of pieces of distance information transformed by the distance information transforming unit and the plurality of pieces of reliability information generated by the reliability information generating unit, and generates information indicating the composition ratio.

An image pickup apparatus comprising the image processing apparatus according to any one of claims 1 to 16.

An image processing method executed by an image processing device for processing image data, comprising:
An image acquisition step of acquiring a plurality of image data,
A distance information acquisition step of acquiring distance information in the depth direction of the image data acquired in the image acquisition step,
An image transformation step of transforming the image by coordinate transformation on the image data obtained in the image obtaining step;
Distance information transformation step of performing transformation processing by coordinate transformation corresponding to coordinate transformation performed on the image data with respect to the distance information obtained in the distance information obtaining step;
A distance information synthesizing step of synthesizing a plurality of distance information transformed in the distance information transforming step,
A distance information recording step of recording the distance information combined in the distance information combining step,
A combined information generating step of generating information for combining based on the distance information modified in the distance information modifying step and the distance information recorded in the distance information recording step;
An image combining step of generating an arbitrary viewpoint image by combining a plurality of images that have been deformed in the image deforming step based on the combining information.

A program that causes a computer of the image processing apparatus to execute each step according to claim 18.