JP2014010783A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2014010783A
Application number: JP2012148922A
Authority: JP
Inventors: Tatsuro Koizumi; 達朗小泉
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-07-02
Filing date: 2012-07-02
Publication date: 2014-01-20

Abstract

PROBLEM TO BE SOLVED: To synthesize an image in which blur in the area to be brought into focus is reduced by stable processing. SOLUTION: Masks 1003 and 1004 are applied to images 901 and 902 from the actual imaging units 701 and 702, and the in-focus areas are extracted to give images 1005 and 1006 (processing (2)). These images are projected onto the in-focus plane and converted into coordinates on the multi-viewpoint image to give images 1007 and 1008 (processing (3)). Superimposing them yields a virtual viewpoint image 1009 (processing (4)). Left as they are, the images are displaced between viewpoints because of the deviation in the camera parameters. The images are therefore aligned so that they overlap, taking the image of one viewpoint as the reference, and a correction parameter describing the positional deviation is calculated, as illustrated by corrected image 1010 (processing (5)). When the virtual viewpoint image is synthesized, this correction parameter is applied to the whole of the image obtained by the non-reference imaging unit, as illustrated by corrected image 1011, so that an image with no blur in the in-focus area is obtained (processing (6)).

Description

The present invention relates to an image processing apparatus, an image processing method, and a program, and in particular to an image processing apparatus, an image processing method, and a program for processing images captured from a plurality of viewpoints.

When shooting with an ordinary camera, focusing is performed at the time of shooting. If the focus position is adjusted incorrectly, the scene must be reshot to obtain an image with the correct focus position.

In recent years, a technique called light field photography has been developed, in which a plurality of images are acquired from multiple viewpoints and the focus position is adjusted afterwards by image processing. Because the focus position can be adjusted after shooting, this technique has the advantage that even if focusing fails at the time of shooting, the correct focus position can be set by subsequent image processing. In light field photography, the direction and intensity of the rays passing through each of a plurality of positions in space (the light field, hereinafter abbreviated as LF) are acquired from multi-viewpoint images. The acquired LF information is then used to calculate the image formed on a virtual sensor at an arbitrary position. The mathematical properties and foundations of the LF are discussed by R. Ng (see, for example, Non-Patent Document 1).

By setting the position of this virtual sensor appropriately, so that the desired subject is in focus, the post-shooting focus adjustment described above becomes possible. Hereinafter, the process of calculating the image acquired at the virtual sensor position from multi-viewpoint images is called refocus processing, and the image acquired in this way is called a virtual viewpoint image. Known imaging apparatuses for acquiring the LF include a camera array, in which small cameras are arranged side by side, and a plenoptic camera, in which a microlens array is placed behind a main lens. With either apparatus, the image that would be obtained with a virtually placed sensor can be synthesized from the acquired LF after shooting. As a method of synthesizing the image on a virtual sensor from the LF, a method has been proposed in which the acquired images are projectively transformed onto the virtual sensor, added, and averaged (see, for example, Patent Document 1). This image synthesis method is called synthetic aperture photography.

In image synthesis by synthetic aperture photography, if the camera parameters given as known information (camera position, orientation, angle of view, focal length, and so on) deviate from their actual values, the area that should be in focus becomes blurred. A technique has been proposed that corrects such camera parameter deviations, which degrade image quality, by estimating the camera parameters with a camera calibration technique (see, for example, Non-Patent Document 2).

Patent Document 1: International Publication No. 2008/050904 pamphlet

Non-Patent Document 1: R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, P. Hanrahan, "Light Field Photography with a Hand-held Plenoptic Camera" (Stanford Tech Report CTSR 2005-02, 2005)
Non-Patent Document 2: Richard Szeliski, "Computer Vision: Algorithms and Applications", Springer, New York, 2010

However, the conventional camera calibration technique used in, for example, Non-Patent Document 2 has a problem: when it is performed with a chart, the user must carry out cumbersome work. When calibration is performed from natural images, the processing becomes unstable.

The image processing apparatus of the present application comprises: image synthesizing means that applies a synthesis transformation to each of a plurality of image data captured from different preset viewpoints, based on the position of the viewpoint from which it was captured, and generates composite image data as a virtual viewpoint image captured from a virtually set viewpoint; in-focus area extracting means that extracts, from each of the plurality of image data, an area corresponding to the area to be brought into focus in the composite image data; and alignment means that transforms the image data so that the difference between the pixel values of the extracted areas of the plurality of image data becomes small, and calculates the difference from the image data obtained by the synthesis transformation based on the preset viewpoint positions. Using the calculated difference, the apparatus transforms the whole of the synthesis-transformed image data of each of the plurality of image data to generate the composite image data.

When a virtual viewpoint image is synthesized from multi-viewpoint images by refocus processing, an image can be synthesized with reduced blur in the area to be brought into focus, by stable processing, even if the given camera parameters deviate from their actual values.

FIG. 1 is a diagram showing an example of the external appearance of an imaging apparatus to which the present invention can be applied.
FIG. 2 is a block diagram showing a configuration example of the multi-viewpoint imaging apparatus of Embodiment 1.
FIG. 3 is a block diagram showing a configuration example of the multi-viewpoint imaging unit of Embodiment 1.
FIG. 4 is a block diagram showing a configuration example of an imaging unit of Embodiment 1.
FIG. 5 is a diagram showing an example of the arrangement of the imaging units.
FIG. 6 is a side view of the arrangement of the imaging units in one embodiment.
FIG. 7 is a top view of the arrangement of the imaging units in one embodiment.
FIG. 8 is a diagram for explaining the relationship between the coordinates of each imaging unit and the coordinates of the multi-viewpoint imaging unit in one embodiment.
FIG. 9 is a diagram showing the relationship among the subject, the virtual imaging system, and the multi-viewpoint imaging unit in one embodiment.
FIG. 10 is a diagram for explaining the properties of an ideal optical system.
FIG. 11 is a diagram for explaining the principle of virtual viewpoint image synthesis.
FIG. 12 is a diagram for explaining the principle of virtual viewpoint image synthesis.
FIG. 13 is a diagram for explaining the principle of virtual viewpoint image synthesis.
FIG. 14 is a diagram for explaining the principle of virtual viewpoint image synthesis.
FIG. 15 is a diagram outlining an example of the image synthesis processing of the present invention in terms of the intermediate data it generates.
FIG. 16 is a block diagram showing a configuration example of the image processing unit in Embodiment 1.
FIG. 17 is a diagram showing the flow of the image synthesis processing in Embodiment 1.
FIG. 18 is a flowchart showing an example of the flow of the alignment processing.
FIG. 19 is a block diagram showing a configuration example of the image processing unit in Embodiment 2.
FIG. 20 is a diagram showing the flow of the image synthesis processing in Embodiment 2.
FIG. 21 is a flowchart showing the processing of an example of the alignment method.
FIG. 22 is a flowchart showing an example of the valid block determination method.
FIG. 23 is a flowchart showing the processing of an example of the valid motion vector determination method.

[Embodiment 1]
<Overall configuration of imaging device>
FIG. 1 is a schematic diagram showing an example of a multi-viewpoint imaging apparatus to which the present invention can be applied. Members and devices for capturing multi-viewpoint images are attached to the housing 101 of the multi-viewpoint imaging apparatus. A plurality of imaging units 105 to 113 are arranged on the front surface (a) of the multi-viewpoint imaging apparatus in FIG. 1, and with these the apparatus acquires images from a plurality of viewpoints. A shooting button 102, a display 103, and operation buttons 104 are arranged on the rear surface (b) of the apparatus in FIG. 1. The various settings of the apparatus are made with the operation buttons 104 and the display 103, and pressing the shooting button 102 causes the imaging units 105 to 113 to capture images. The acquired images and the synthesized image are shown on the display 103.

FIG. 2 is a block diagram showing an example of the configuration of the multi-viewpoint imaging apparatus according to an embodiment of the present invention. The multi-viewpoint imaging unit 213, described in detail later, consists of a plurality of imaging units and acquires images from a plurality of viewpoints. Each imaging unit in the multi-viewpoint imaging unit 213 converts the image formed by its optical system into image data with an image sensor, as described later.

The image processing unit 214, described in detail later, performs refocus processing on the multi-viewpoint image data acquired from the multi-viewpoint imaging unit 213 and generates composite image data. The CPU 201 is involved in all processing of every component: it reads the instructions stored in the ROM 202 and the RAM 203 in order, interprets them, and executes processing according to the result. The ROM 202 and the RAM 203 provide the CPU 201 with the programs, data, and work areas needed for that processing. The bus 210 serves as the path over which the components exchange data and processing instructions.

The image processing unit 214 performs image synthesis based on information about the arrangement and characteristics of the imaging units, whose positions are the plurality of viewpoints making up the multi-viewpoint imaging unit. This information is assumed to be calculated in advance from the system and apparatus settings and stored in the ROM 202 or the RAM 203. The operation unit 204 consists of, for example, buttons and a mode dial, and receives user instructions entered through them. The character generation unit 209 generates characters and graphics. The display unit 206, for which a liquid crystal display is commonly used, displays captured images, composite images, and characters received from the character generation unit 209 and the display control unit 205. The display unit may also have a touch screen function, in which case user instructions given through the touch screen can be used as input to the operation unit 204.

The digital signal processing unit 212 performs γ adjustment, defective pixel interpolation, and similar processing on the multi-viewpoint image data acquired by the multi-viewpoint imaging unit 213. These processes are performed before the image processing by the image processing unit 214 that characterizes this embodiment. The encoder unit 211 encodes the multi-viewpoint image data to be output. The media interface 207 is an interface for connecting the apparatus to a PC or other media 208 (for example, a PC, hard disk, memory card, CF card, SD card, or USB memory). The composite image data and the multi-viewpoint image data are output through the media interface 207, for example. The apparatus normally has components other than those listed above, but they are not needed to explain the present invention and their description is omitted.

<Configuration of multi-viewpoint imaging unit>
A configuration example of the multi-viewpoint imaging unit 213 is described with reference to the block diagram of FIG. 3. Each of the imaging units 301 to 309 captures an image from a single viewpoint at a spatially different position, so that multi-viewpoint image data consisting of a plurality of image data is acquired. The imaging units 301 to 309 are described as corresponding to the imaging units 105 to 113 in FIG. 1, respectively.

An example of the configuration of each of the imaging units 301 to 309 is described with reference to FIG. 4. The lens 401 and the lens 402 form an imaging optical system. Light from the subject is collected by this imaging optical system, passes through the aperture 403, the IR cut filter 404, the low-pass filter 405, and the color filter 406, and forms an image on the image sensor 407. The optical system control unit 410 controls the lens 401 and the lens 402 to change the focusing distance and focal length of the imaging optical system. It also opens and closes the aperture 403 to adjust the F-number. In this embodiment, the optical system control unit 410 of each of the imaging units 301 to 309 controls the optical system of its own imaging unit, but the control values are set collectively by the imaging control unit 310. The control values that are set include, for example, the focusing distance, focal length, and F-number, and are determined from settings entered by the user through the operation unit 204 or from values calculated by a separately provided automatic exposure device, autofocus device, or the like.

The optical system configuration shown here is an example simplified for explanation; any configuration may be used as long as it can form an image of the subject on the image sensor. Although a configuration for acquiring a color image has been described, the acquired image may be a monochrome image or an image having pixels of four or more colors, and may also be an image whose exposure differs from pixel to pixel.

An image sensor such as a CMOS image sensor can be used as the image sensor 407 and the A/D converter 408. In general, the image sensor 407 has pixels arranged in a two-dimensional grid and converts the formed image into an electrical signal by detecting and aggregating the amount of light on each pixel. The A/D converter 408 converts the image information, now an electrical signal, into a digital signal. The digitized image data is stored in the buffer 409. The image sensor control unit 411 controls the start of exposure and the signal readout of the image sensor 407 and the A/D converter 408. Control values such as the control timing are set collectively by the imaging control unit 310.

The configuration of a typical imaging unit has been described above with reference to FIG. 4, but the imaging units 301 to 309 need not all have exactly the same configuration. Any design or configuration may be used as long as each unit can form an image of the subject and acquire image data.

The arrangement of the imaging units 301 to 309 in this embodiment is described with reference to FIGS. 5 and 6. Each imaging unit is housed in a cylindrical casing whose axis is the optical axis of its imaging optical system, and the units are arranged in a grid on the housing 501 of the multi-viewpoint imaging apparatus. The housing 501 corresponds to the housing 101 in FIG. 1. In this embodiment, the optical axes of the imaging optical systems of all imaging units point in the same direction. FIG. 5 shows the multi-viewpoint imaging apparatus from the front. FIG. 6 shows the apparatus as seen from the left of FIG. 5. FIG. 7 shows the apparatus as seen from above in FIG. 5. To make the relationship between the figures clear, right-handed coordinate axes 502, 503, and 504 are drawn in each figure.

Although this embodiment is described with nine imaging units, any number of imaging units may be used as long as images can be captured from a plurality of viewpoints. The arrangement in FIG. 5 is only an example; any arrangement may be used as long as the position of each viewpoint is known from the design or from measurement. Furthermore, although each imaging unit is drawn as a cylinder in this embodiment, an imaging element of any shape may be used.

With reference to FIG. 8, the following explains how this embodiment describes the arrangement information of each imaging unit, that is, the position information of each viewpoint, used in the rest of the description. The imaging unit that acquires the image of each viewpoint is treated here as a simple perspective projection camera. Two coordinate systems are considered, the imaging unit coordinate system and the multi-viewpoint imaging unit coordinate system, and the arrangement of the viewpoints is expressed through the relationship between them. The coordinate axes are right-handed.

The multi-viewpoint imaging unit coordinate system is the coordinate system of the multi-viewpoint imaging apparatus and is the reference for the position and orientation of the viewpoint of each imaging unit. The imaging unit coordinate system is the coordinate system of an individual imaging unit: its z-axis is the optical axis of that imaging unit, and its origin is the optical center. The origin of the imaging unit coordinate system of the n-th imaging unit is written t_n in the multi-viewpoint imaging unit coordinate system. The x-, y-, and z-axes of the imaging unit coordinate system of the n-th imaging unit are written as the unit vectors v_xn, v_yn, and v_zn in the multi-viewpoint imaging unit coordinate system, respectively.
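The relationship between the two coordinate systems can be illustrated with a small sketch. It is not taken from the patent: the class name `ImagingUnitPose`, the function `project`, and the `focal_length` parameter are hypothetical, and the imaging unit is modeled as an ideal pinhole (the "simple perspective projection camera" mentioned above). A point given in multi-viewpoint coordinates is expressed in the n-th unit's coordinates by subtracting t_n and taking dot products with v_xn, v_yn, and v_zn.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class ImagingUnitPose:
    """Pose of the n-th imaging unit in multi-viewpoint coordinates."""
    t: Vec3    # origin t_n (optical center)
    v_x: Vec3  # unit vector v_xn
    v_y: Vec3  # unit vector v_yn
    v_z: Vec3  # unit vector v_zn (optical axis)

    def to_unit_coords(self, p: Vec3) -> Vec3:
        """Express a multi-viewpoint-coordinate point in this unit's coordinates."""
        d = (p[0] - self.t[0], p[1] - self.t[1], p[2] - self.t[2])
        return (dot(self.v_x, d), dot(self.v_y, d), dot(self.v_z, d))


def project(pose: ImagingUnitPose, p: Vec3, focal_length: float) -> tuple[float, float]:
    """Pinhole projection of p onto the unit's image plane (assumed model)."""
    x, y, z = pose.to_unit_coords(p)
    if z <= 0:
        raise ValueError("point is not in front of the imaging unit")
    return (focal_length * x / z, focal_length * y / z)
```

For a unit at t_n = (1, 0, 0) whose axes coincide with the multi-viewpoint axes, the point (2, 1, 2) has unit coordinates (1, 1, 2) and, with `focal_length = 1.0`, projects to (0.5, 0.5).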

In this embodiment, each imaging unit is placed close to the plane on which the z component is 0 in the multi-viewpoint imaging unit coordinate system, and the optical axis of each imaging unit points approximately along the z-axis of that coordinate system. This imaging unit arrangement information is assumed to be obtained from design values or by calibration and stored in advance in the ROM 202 or the RAM 203.

The multi-viewpoint imaging unit coordinate system is also the reference for the placement of the virtual imaging system reproduced by refocus processing. The virtual imaging system is represented by the plane that is conjugate to the virtual image sensor with respect to the virtual optical system, that is, the plane that is in focus for the image side. This plane is hereinafter called the virtual in-focus plane. In this embodiment, the virtual in-focus plane is assumed to be perpendicular to the z-axis of the multi-viewpoint imaging unit coordinate system, and its position is expressed by the distance d from the origin of that coordinate system to the plane.
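Because the virtual in-focus plane is the plane z = d in multi-viewpoint coordinates, the point where the viewing ray of an image point meets it follows from a one-line ray-plane intersection. The sketch below is illustrative only: the function name, the image-plane coordinates `(u, v)`, and `focal_length` are assumptions, and the imaging unit is again treated as an ideal pinhole.

```python
def point_on_focal_plane(t, v_x, v_y, v_z, u, v, focal_length, d):
    """Intersect the viewing ray of image point (u, v) with the plane z = d.

    t, v_x, v_y, v_z: origin and axis unit vectors of the imaging unit,
    all expressed in multi-viewpoint coordinates.
    Returns the intersection point in multi-viewpoint coordinates.
    """
    # Ray direction in multi-viewpoint coordinates for a pinhole model.
    direction = tuple(
        u * ax + v * ay + focal_length * az
        for ax, ay, az in zip(v_x, v_y, v_z)
    )
    if abs(direction[2]) < 1e-12:
        raise ValueError("ray is parallel to the virtual in-focus plane")
    s = (d - t[2]) / direction[2]
    if s <= 0:
        raise ValueError("virtual in-focus plane is behind the imaging unit")
    return tuple(ti + s * di for ti, di in zip(t, direction))
```

With a unit at (1, 0, 0) aligned with the multi-viewpoint axes, image point (0.5, 0) at `focal_length = 1.0`, and d = 4, the ray meets the plane at (3, 0, 4).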

The information representing the virtual in-focus plane is assumed to be calculated from parameters such as those entered by the user through the user interface and stored in advance in the RAM 203. Any method known in this technical field may be used for the calculation.

The representation of the imaging unit arrangement and of the virtual imaging system described above is only an example. Any representation may be used as long as it conveys the arrangement of each imaging unit, the multi-viewpoint imaging apparatus, and the virtual imaging system.

<Overview of image composition processing>
This section outlines the image synthesis processing performed by the image processing unit 214, which is the characteristic component of this embodiment. The image processing unit 214 processes images of the same scene captured from different viewpoints with the multi-viewpoint imaging unit 213, and can thereby synthesize an image as if it had been captured by a desired virtual imaging system. FIG. 9 illustrates the arrangement of the subjects, the imaging units, and the virtual imaging system in the image synthesis processing. The scene in FIG. 9 contains subjects 704, 705, and 706 at different distances from the multi-viewpoint imaging unit 213, and these are captured by the real imaging units 701 and 702 located at different viewpoints. The imaging units 701 and 702 acquire the ray information in the space as images, and these are used to synthesize an image as if it had been captured by the virtual imaging system 703. A composite image that appears to have been captured by the virtual imaging system is hereinafter called a virtual viewpoint image.

An outline of the method for synthesizing a virtual viewpoint image is given with reference to FIG. 10. The virtual imaging system synthesized by the image processing unit 214 is an imaging system with a finite depth of field. That is, in the imaging system 703 shown in FIG. 10, light such as natural light or illumination reflected by the subject is collected by the optical system 804, focused on the sensor 805, and sampled to acquire an image. To synthesize the image acquired by such a virtual imaging system, the image formation by the optical system 804 and the sampling by the sensor 805 are reproduced using the rays acquired by the imaging units 701 and 702. The pixel value of the pixel 807 on the sensor 805 is obtained by summing the rays incident on the pixel 807. If the optical system 804 is an ideal optical system, every ray incident on the pixel 807 on the sensor 805 passes through the point 808 conjugate to the pixel 807 on the in-focus plane 806, which is the conjugate plane of the sensor 805. Accordingly, the images acquired by the imaging units 701 and 702 are projected onto the in-focus plane, warped to the coordinates of the virtual viewpoint image according to the relationship between the in-focus plane 806 and the sensor 805, and summed over all viewpoints to obtain the virtual viewpoint image. In practice, to preserve image brightness, the images are averaged over all viewpoints rather than simply added.
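The project-and-average procedure above can be sketched in a simplified setting. The code below is an illustration under stated assumptions, not the patent's implementation: all imaging units are assumed to have parallel optical axes, the same focal length in pixels (`focal_px`), and image axes aligned with the x- and y-axes of the multi-viewpoint coordinate system, and the virtual viewpoint is placed at the origin. Under these assumptions, projecting onto the in-focus plane at distance `d` and mapping to the virtual viewpoint reduces to shifting each image by `focal_px * t / d`, rounded here to whole pixels. Images are plain nested lists of grayscale values; real code would normally use an array library and sub-pixel interpolation.

```python
def refocus(images, positions, focal_px, d):
    """Shift-and-average synthetic aperture refocusing (simplified sketch).

    images:    list of grayscale images (lists of rows), all the same size
    positions: list of (t_x, t_y) viewpoint positions, same order as images
    focal_px:  focal length in pixels (assumed equal for all units)
    d:         distance from the origin to the virtual in-focus plane
    """
    height, width = len(images[0]), len(images[0][0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for img, (tx, ty) in zip(images, positions):
        # Shift that registers points lying on the plane z = d.
        dx = round(focal_px * tx / d)
        dy = round(focal_px * ty / d)
        for y in range(height):
            for x in range(width):
                sx, sy = x - dx, y - dy
                if 0 <= sx < width and 0 <= sy < height:
                    total[y][x] += img[sy][sx]
                    count[y][x] += 1
    # Average rather than sum, so that image brightness is preserved.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]
```

For two viewpoints at t_x = -1 and t_x = +1 with `focal_px = 10` and `d = 10`, a point on the in-focus plane that appears at column 3 in the first image and column 1 in the second is registered at column 2 of the result.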

FIG. 11 shows the image obtained by setting the in-focus plane 903 of the virtual imaging system at the distance of the subject 704 and projecting onto it the images 901 and 902 shown in FIG. 12, which were acquired by the imaging units 701 and 702. The subject 704 is projected to the same position from both imaging units 701 and 702, and therefore appears in focus. The subjects 705 and 706, on the other hand, do not lie on the in-focus plane 903, so their projected positions differ and they appear blurred. In FIG. 11 this blur is depicted by drawing the subjects 705 and 706 doubled. This reproduces a finite depth of field: a subject on the in-focus plane is rendered in focus, and subjects on other planes are blurred.
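The size of the doubling can be estimated with a short calculation. The sketch below assumes two pinhole imaging units with parallel optical axes separated by a baseline, sharing a focal length expressed in pixels; the function name and all numeric values are hypothetical. After both images are registered for the in-focus plane, a subject at distance z is left with a displacement of focal_px * baseline * (1/d - 1/z) between the two views, which vanishes only when z equals d.

```python
def residual_shift(focal_px, baseline, subject_distance, focus_distance):
    """Displacement (in pixels) left between two registered views.

    Zero for a subject on the in-focus plane; grows as the subject
    moves away from it, which appears as a doubled (blurred) image.
    """
    return focal_px * baseline * (1.0 / focus_distance - 1.0 / subject_distance)


# Hypothetical scene: in-focus plane at 2.0 m, 20 mm baseline, 1000 px focal length.
for name, z in (("on the in-focus plane", 2.0), ("nearer subject", 1.0), ("farther subject", 4.0)):
    print(name, residual_shift(1000.0, 0.02, z, 2.0))
```

The sign of the result indicates the direction of the displacement: subjects nearer than the in-focus plane shift one way and farther subjects shift the other way.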

The synthesis method described above assumes that camera parameters such as the arrangement, orientation, and angle of view of the imaging units 701 and 702 are known. In practice, however, if these camera parameters change for some reason, a deviation arises between the given values and the actual values, as shown in FIG. 14. As a result, as shown in FIG. 13, even the subject 704, for which the images of all viewpoints should coincide when projected onto the in-focus plane, appears blurred. Correction processing is therefore needed to reduce this blur.

An outline of the correction processing of this embodiment is described with reference to FIG. 15. In this embodiment, the subject distance is calculated from the multi-viewpoint image. Images 1001 and 1002 are schematic diagrams of the image data obtained from the subject distance information corresponding to the imaging units 701 and 702, respectively. From the distance information, the region of each image occupied by a subject lying on the in-focus plane (the subject 704 in this embodiment) is extracted. Masks 1003 and 1004 are schematic diagrams of the masks of the extracted regions corresponding to the imaging units 701 and 702, respectively (processing (1)). Here, the in-focus plane is assumed to lie on the subject 704.

The masks 1003 and 1004 obtained above are applied to the images 901 and 902 of the actual imaging units 701 and 702 shown in FIG. 14, and the extracted in-focus regions are the images 1005 and 1006 (processing (2)). Projecting these images onto the in-focus plane and converting them into coordinates on the multi-viewpoint image gives the images 1007 and 1008 (processing (3)). Superimposing them gives the virtual viewpoint image 1009 (processing (4)). Left as they are, the images are displaced between viewpoints because of the deviation in the camera parameters. The images are therefore aligned so that they overlap, using the image of one viewpoint as the reference, and a correction parameter describing the positional deviation is calculated, as illustrated by the corrected image 1010 (processing (5)). When the virtual viewpoint image is synthesized, this correction parameter is applied to the entire image obtained by the imaging unit that is not the reference, as illustrated by the corrected image 1011, which yields an image with no blur in the in-focus region (processing (6)). Two imaging units are used here for the sake of explanation, but any number of imaging units may be used.

<Configuration of the image processing unit>
A configuration example of the image processing unit 214 is described with reference to FIG. 16. The multi-viewpoint image acquisition unit 1101 acquires, via the bus 210, the multi-viewpoint image data captured by the multi-viewpoint imaging unit 213. The viewpoint information acquisition unit 1102 acquires, via the bus 210, information on the arrangement and characteristics of each imaging unit constituting the multi-viewpoint imaging unit 213, set on the basis of design values or calibration values. The virtual imaging system information acquisition unit 1103 acquires, via the bus 210, information on the virtual imaging system set in advance by the user.

The pixel position conversion parameter calculation unit 1104 calculates, from the viewpoint information and the virtual imaging system information, parameters indicating which position on the image generated by the image synthesis unit 1108 each pixel of the image captured from each viewpoint corresponds to; details are given later. The distance calculation unit 1105 calculates, from the multi-viewpoint image and the viewpoint information, the subject distance for each pixel of the image captured from each viewpoint; details are given later. The in-focus region extraction unit 1106 identifies, from the subject distance calculated by the distance calculation unit 1105 and the virtual imaging system information acquired by the virtual imaging system information acquisition unit, the region of the image captured from each viewpoint on which the virtual imaging system is in focus; details are given later. It then extracts the image data of the identified region from the image data captured from each viewpoint constituting the multi-viewpoint image.

The alignment unit 1107 converts the in-focus region of the image captured from each viewpoint, as extracted by the in-focus region extraction unit 1106, onto the virtual viewpoint image based on the parameters calculated by the pixel position conversion parameter calculation unit. Taking the image of one viewpoint among the converted images as the reference, it calculates alignment parameters representing the transformation that makes the other images overlap the reference; details are given later. The image synthesis unit 1108 converts the image of each viewpoint onto the virtual viewpoint image based on the pixel position conversion parameters and then applies the alignment parameters to correct it. It superimposes the images obtained in this way over all viewpoints to synthesize the virtual viewpoint image; details are given later. The image output unit 1109 outputs the image data of the image synthesized by the image synthesis unit 1108 to the buffer 210.

An example of the flow of the image synthesis processing is described with reference to FIG. 17. In step S1201, a plurality of image data are acquired as the multi-viewpoint image. In step S1202, information on each imaging unit that acquired the multi-viewpoint image data is acquired. In step S1203, information on the virtual imaging system set by the user is acquired. In step S1204, the position on the virtual viewpoint image to which each pixel of the image captured from each viewpoint corresponds is determined from the viewpoint information acquired in step S1202 and the virtual imaging system information acquired in step S1203; details are given later. According to the determined correspondence, pixel position conversion parameters for converting the image captured from each viewpoint (synthesis conversion) are calculated.

In step S1205, the subject distance is calculated for each pixel of the image captured from each viewpoint, from the multi-viewpoint image acquired in step S1201 and the viewpoint information acquired in step S1202; details are given later. In step S1206, the region of the image captured from each viewpoint on which the virtual imaging system is in focus is extracted, based on the virtual imaging system information acquired in step S1203 and the subject distance calculated in step S1205; details are given later. In step S1207, the in-focus regions extracted in step S1206 are converted onto the virtual viewpoint image based on the pixel position conversion parameters calculated in step S1204; details are given later. Taking the image captured from one viewpoint among the converted images as the reference image, alignment parameters for transforming the other images so that they overlap the reference are calculated.

In step S1208, the image of each viewpoint is converted onto the virtual viewpoint image based on the pixel position conversion parameters calculated in step S1204, and is then corrected by applying the alignment parameters calculated in step S1207; details are given later. The images of all imaging units obtained in this way are superimposed to synthesize the virtual viewpoint image data. In step S1209, the virtual viewpoint image data synthesized in step S1208 are output.

Although the subject distance is calculated from the multi-viewpoint image in this embodiment, a configuration in which a separately calculated subject distance is acquired from outside is also possible.

<Pixel position conversion parameter calculation method>
The image processing unit 214 uses the multi-viewpoint image captured by the multi-viewpoint imaging unit 213 to perform virtual viewpoint image synthesis processing, which synthesizes an image as if it had been captured by a virtual imaging system. The method of calculating the pixel position conversion parameters needed for this processing is described below. In general, a multi-viewpoint image carries ray information about the scene, so reproducing the process of forming an image with a virtual optical system and recording it with a virtual image sensor yields an image as if captured by the virtual imaging system. The pixel position conversion parameters are required for this reproduction.

By the nature of an imaging optical system, a ray passing through a given point on the image side always passes through its conjugate point on the object side. Therefore, by considering the object-side conjugate plane of the virtual sensor with respect to the virtual optical system and summing the rays passing through each point on this conjugate plane, an image as if captured by the virtual imaging system is obtained. This conjugate plane is the object-side plane on which the virtual imaging system is in focus, and is referred to here as the virtual in-focus plane. The composite image is obtained by sampling the rays summed on the virtual in-focus plane. The pixel position conversion parameters convert coordinates on each image of the multi-viewpoint image into coordinates on the virtual viewpoint image.

An example of the pixel position conversion parameter calculation method is given below. First, for the n-th imaging unit, let the horizontal angle of view be θ_{x,n}, the vertical angle of view θ_{y,n}, the number of horizontal pixels s_{x,n}, the number of vertical pixels s_{y,n}, and the optical center position on the image (c_{x,n}, c_{y,n}). The intrinsic parameter matrix A_n is then expressed as in equation (1).

The intrinsic parameter matrix A_v of the virtual imaging system reproduced by the virtual viewpoint image synthesis processing is defined in the same way. With horizontal angle of view θ_{x,v}, vertical angle of view θ_{y,v}, number of horizontal pixels s_{x,v}, number of vertical pixels s_{y,v}, and optical center position on the image (c_{x,v}, c_{y,v}), the intrinsic parameter matrix A_v is expressed as in equation (2).
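Equations (1) and (2) are not reproduced in this text. As a hedged illustration, the sketch below builds an intrinsic matrix from the quantities named above using the standard pinhole relation, focal length in pixels = (pixel count) / (2 tan(angle of view / 2)). Whether equation (1) has exactly this form is an assumption, and `intrinsic_matrix` is a hypothetical helper name.

```python
import numpy as np

def intrinsic_matrix(theta_x, theta_y, s_x, s_y, c_x, c_y):
    """Pinhole intrinsic matrix from angles of view (radians) and pixel counts.

    Assumption: standard pinhole model with no skew; the source's equation (1)
    may differ in detail.
    """
    f_x = s_x / (2.0 * np.tan(theta_x / 2.0))  # horizontal focal length in pixels
    f_y = s_y / (2.0 * np.tan(theta_y / 2.0))  # vertical focal length in pixels
    return np.array([[f_x, 0.0, c_x],
                     [0.0, f_y, c_y],
                     [0.0, 0.0, 1.0]])
```

The same helper would serve for both A_n (per imaging unit) and A_v (virtual imaging system), since the two are defined from the same kind of quantities.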

Using the notation of FIG. 6, the 4×4 coordinate transformation matrix M_n from the coordinate system of each imaging unit to the coordinate system of the multi-viewpoint imaging unit is expressed as in equation (3).

The 3×3 rotation matrix R_n is expressed as in equation (4).

Let (x, y) be the coordinates of a pixel in a given camera. Using the homogeneous coordinates x_h of (x, y) given in equation (5), the corresponding ray r is expressed as in equation (6).

Here, k_{(x,y)} is a parameter describing the trajectory of the ray. The value of k_{(x,y)} at the intersection of the ray with the virtual in-focus plane, the distance d from the origin of the multi-viewpoint imaging unit coordinate system to the virtual in-focus plane, and the unit vector e_z in the z-axis direction, which is the unit normal of the virtual in-focus plane, are related as in equation (7).

Solving this for 1/k_{(x,y)} gives equation (9), written in terms of B_n defined in equation (8).

Substituting equation (9) into equation (6) and applying A_v from the left gives homogeneous coordinates representing the point obtained by projecting the intersection of the ray and the virtual in-focus plane onto the composite image. Because homogeneous coordinates are equivalent under multiplication by a constant, dividing both sides of equation (6) by k_{(x,y)}, substituting equation (9), and applying A_v from the left likewise gives the homogeneous coordinates x'_h of that projected point, as in equation (10).

λ is a scalar coefficient representing the scale indeterminacy. Equation (10) corresponds to a projective transformation of the image. In this embodiment, the set of matrices obtained by calculating the projective transformation matrix H_n of equation (11) for every imaging unit is treated as the pixel arrangement parameters.
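Equations (3) to (11) are not reproduced in this text, so the following is only a sketch of one way to obtain a plane-induced projective transformation of this kind; it is not claimed to match the source's H_n term by term. Assumptions: the pose of imaging unit n is given as a rotation `R_n` and translation `t_n` from the imaging unit coordinate system to the multi-viewpoint coordinate system, the virtual camera sits at the origin of that system looking along the z axis, and the virtual in-focus plane is z = d. Under those assumptions a ray through pixel x_h meets the plane at a point whose virtual-image projection is A_v (I + t_n e_z^T / (d − e_z^T t_n)) R_n A_n^{-1} x_h up to scale.

```python
import numpy as np

def plane_homography(A_n, R_n, t_n, A_v, d):
    """Homography from imaging unit n to the virtual view, induced by plane z = d.

    Hypothetical helper: A_n, A_v are 3x3 intrinsic matrices, R_n / t_n map
    camera coordinates to the common coordinate system, d is the plane distance.
    """
    e_z = np.array([0.0, 0.0, 1.0])          # unit normal of the in-focus plane
    t_n = np.asarray(t_n, dtype=float)
    # Scale the ray so that it ends on the plane, then shift by the camera position.
    B = np.eye(3) + np.outer(t_n, e_z) / (d - e_z @ t_n)
    return A_v @ B @ R_n @ np.linalg.inv(A_n)
```

A point on the plane z = d seen at pixel x_h by imaging unit n is mapped by the returned matrix to the pixel at which the virtual camera would see the same point; points off the plane are mapped to viewpoint-dependent positions, which is what produces the synthetic blur.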

The pixel arrangement parameters of this embodiment and their calculation method are only an example; any parameters may be used as long as they define the correspondence between coordinates on the images obtained by the multi-viewpoint imaging unit and coordinates on the composite image.

<Distance calculation method>
The distance calculation unit 1105 calculates, from the multi-viewpoint image, the subject distance for each pixel of the image captured from each viewpoint. Methods for estimating distance from images captured from a plurality of different viewpoints are described in detail in, for example, Chapter 11 of Non-Patent Document 2, and a method suited to the system configuration may be used. One example is to set the virtual camera to coincide with the viewpoint for which the subject distance is to be calculated and to apply the plane sweep method. To handle subjects at infinity, it is advisable to compute a quantity such as the reciprocal of the distance rather than the distance itself.

<In-focus region extraction method>
The in-focus region extraction unit 1106 uses the subject distance calculated by the distance calculation unit 1105 and the virtual imaging system information to extract the region of the image captured from each viewpoint on which the virtual imaging system is in focus. Specifically, the virtual in-focus plane given as virtual imaging system information is converted into the coordinate system of each imaging unit, the distance to the virtual in-focus plane is obtained for each pixel of the image captured from each viewpoint, and this distance is compared with the subject distance to determine whether the pixel belongs to the region on which the virtual imaging system is in focus.

Let X_v be the homogeneous coordinates of a point on the virtual in-focus plane. Using the homogeneous coordinates L_v of the plane corresponding to the virtual in-focus plane, the plane can be expressed as in equation (14).

If the virtual in-focus plane is perpendicular to the z axis at distance d from the origin, as described with reference to FIG. 8, then l_1 = 0, l_2 = 0, l_3 = 1, and l_4 = −d. Converting L_v into the homogeneous coordinates L_n of the virtual in-focus plane in the coordinate system of the n-th imaging unit gives equation (15), using equation (3).

Let x_n be the homogeneous coordinates of a point on the image of the n-th imaging unit. If the distance to the in-focus plane is d_n, the homogeneous coordinates X_n, in the imaging unit coordinate system, of the intersection of the in-focus plane and the ray through that point are given by equation (16).

Equation (17) holds from the relationship between the homogeneous coordinates of a plane and a point on that plane.

Solving for 1/d_n gives equation (18), which provides the distance information to the in-focus plane.

Let d'_n(x_n) be the distance, calculated by the distance calculation unit 1105, of the subject corresponding to the point x_n on the image of the n-th imaging system. The in-focus region is then determined by generating a mask m_n based on equation (19), using a threshold ε.
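Equation (19) is not reproduced in this text. The sketch below shows one plausible form of the thresholding, under the assumption that the comparison is made on reciprocal distances (the text solves for 1/d_n and recommends reciprocal distances for subjects at infinity). The function name and the exact comparison are illustrative only.

```python
import numpy as np

def focus_mask(inv_subject_dist, inv_focal_plane_dist, eps):
    """Boolean mask m_n: True where the subject lies on the virtual in-focus plane.

    inv_subject_dist:     per-pixel 1/d'_n(x_n) from the distance calculation
    inv_focal_plane_dist: per-pixel 1/d_n to the virtual in-focus plane
    eps:                  threshold (assumed to apply to the reciprocal distances)
    """
    diff = np.abs(np.asarray(inv_subject_dist, dtype=float)
                  - np.asarray(inv_focal_plane_dist, dtype=float))
    return diff < eps

def extract_focus_region(image, mask):
    """Keep only in-focus pixels; everything else is set to zero."""
    return np.where(mask, image, 0)
```

The mask is kept alongside the image rather than baked into it, so that later alignment steps can skip out-of-focus pixels explicitly.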

As data, the extracted in-focus regions are handled as a set consisting of the image of each viewpoint and the in-focus region determination mask associated with it.

<Alignment method>
The alignment unit 1107 converts the in-focus region of each imaging unit (viewpoint) extracted by the in-focus region extraction unit 1106 onto the virtual viewpoint image based on the parameters calculated by the pixel position conversion parameter calculation unit 1104. Taking the image captured from one viewpoint among the converted images as the reference image, it calculates alignment parameters for transforming the other images so that they overlap the reference.

The method of calculating the alignment parameters is described with reference to FIG. 18. In step S1301, the image data captured from each viewpoint constituting the multi-viewpoint image and the corresponding in-focus region determination masks are acquired. In step S1302, the pixel position conversion parameters for converting the image captured from each viewpoint onto the virtual viewpoint image are acquired. In step S1303, the image data captured from each viewpoint and the corresponding in-focus region determination masks acquired in step S1301 are converted onto the virtual viewpoint image using the pixel position conversion parameters acquired in step S1302. In step S1304, the imaging unit that captured the image serving as the reference for calculating the alignment parameters, that is, the reference viewpoint, is selected.

Steps S1305 to S1307 form a loop: the processing is repeated while changing the imaging unit whose image is to be corrected, that is, the viewpoint to be corrected, so that the images captured by all imaging units are corrected. In step S1305, a viewpoint to be corrected is selected from the unprocessed viewpoints. In step S1306, alignment is performed so that the in-focus region of the reference viewpoint image converted onto the virtual viewpoint image in step S1303 overlaps the in-focus region of the image to be corrected. The alignment method is described later; the parameters representing its result are the alignment parameters. In step S1307, it is determined whether all viewpoints have been processed; if an unprocessed viewpoint remains, the process returns to step S1305. If all viewpoints have been processed, the alignment parameters are output in step S1308. For the image of the reference viewpoint, parameters representing a transformation that leaves positions unchanged are output as the alignment parameters.

The method of performing the alignment that yields the alignment parameters for the image captured by each imaging unit is described next. The subject corresponding to the in-focus region of the images being aligned lies on a single plane in subject space, namely the in-focus plane. Such alignment is therefore possible with a simple deformation model such as a projective transformation, an affine transformation, or a second-order polynomial transformation. Model-based alignment methods include (a) performing a sparse corresponding point search and calculating the model parameters from its result, and (b) describing the displacement of the entire image parametrically and optimizing the parameters so that the luminance difference between the images becomes small.

The sparse corresponding point search used in the former method is described in detail in Chapter 4 of Non-Patent Document 2, and the calculation of the model parameters in Chapter 6 of Non-Patent Document 2. When alignment is performed with this combination, this embodiment uses the in-focus region determination mask to exclude out-of-focus regions from the corresponding point search.

The latter method, which optimizes so that the luminance difference between images becomes small, includes the parametric motion approach described in Chapter 8 of Non-Patent Document 2. When alignment is performed with this method, this embodiment uses the in-focus region determination mask to exclude out-of-focus regions from the calculation of the luminance difference between images.

The following describes an example in which, as the sparse corresponding point search, a motion vector is obtained for each block by block matching using the sum of absolute differences or the sum of squared differences, and the deformation of the entire image is obtained using an affine transformation.

FIG. 21 is a flowchart showing the processing of one example of the alignment method. Before the motion vector of each block is obtained, two pre-processing steps are performed: in-focus region block determination in step S1601 and valid block determination in step S1602. In-focus region block determination decides whether the block for which a motion vector is to be obtained is contained in the in-focus region; only blocks contained in the in-focus region are used in the subsequent processing. Valid block determination excludes blocks for which a correct motion vector may not be obtained; details are given later. In step S1603, the motion vector of the block is calculated. A general block matching method is described here.

In the block matching method, the sum of squared differences or the sum of absolute differences between the pixels in the blocks is used as the matching evaluation value. The evaluation value is computed while the target block, for which the vector is to be obtained, is moved step by step within the search range of the reference image. Among all evaluation values obtained within the search range, the position with the smallest evaluation value is the position most highly correlated with the target block, and the displacement to that position is the motion vector. Evaluating the search range one pixel at a time is called a full search. In contrast, finding the minimum evaluation value on a thinned-out search range and then searching finely in its neighborhood is called a step search; it is well known as a fast way to obtain motion vectors. Next, in step S1605, valid motion vector determination is performed. This excludes, from the motion vectors obtained, those whose calculation result is judged to be incorrect; details are given later. In step S1604, end determination is performed, and when all blocks have been processed, the affine parameters are calculated from the valid motion vectors in step S1606.
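The full search described above can be sketched as follows, using the sum of absolute differences (SAD) as the evaluation value. This is a minimal illustration under stated assumptions: images are 2-D grayscale arrays, `block_match_sad` and its parameters are hypothetical names, and candidate positions that would fall outside the reference image are simply skipped.

```python
import numpy as np

def block_match_sad(target, reference, top, left, block, search):
    """Full-search block matching with SAD.

    Returns (dx, dy, sad): the displacement of the target block that gives the
    smallest SAD within +/- `search` pixels in the reference image.
    """
    blk = target[top:top + block, left:left + block].astype(float)
    best_sad, best_dx, best_dy = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that do not fit inside the reference image.
            if y < 0 or x < 0 or y + block > reference.shape[0] or x + block > reference.shape[1]:
                continue
            cand = reference[y:y + block, x:x + block].astype(float)
            sad = np.abs(cand - blk).sum()
            if sad < best_sad:
                best_sad, best_dx, best_dy = sad, dx, dy
    return best_dx, best_dy, best_sad
```

A step search would evaluate the same cost on a coarse grid of (dx, dy) first and then refine around the best coarse position, trading exhaustiveness for speed.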

The calculation of the affine parameters is described next. If the center coordinates of the target block are (x, y) and, according to the calculated motion vector, the center of the block has moved to (x', y') in the reference image, the relationship between them can be expressed as in equation (20).

Here, the 3×3 matrix is the affine transformation matrix, and its elements are the affine parameters. When a = 1, b = 0, d = 0, and e = 1, the transformation is a translation, with c the horizontal displacement and f the vertical displacement. A rotation by angle θ is expressed by a = cos θ, b = −sin θ, d = sin θ, e = cos θ. Equation (20) can be written in generalized matrix form as equation (21).

Here, x and x' are 1×3 matrices and A is a 3×3 matrix. When there are n valid motion vectors, the coordinate values of the target image can be expressed as an n×3 matrix, as in equation (22).

Similarly, the coordinate values after the movement can be expressed as an n×3 matrix, as in equation (23).

For n motion vectors, therefore, the relationship is expressed as in equation (24).

In other words, the affine matrix A in equation (24) gives the displacement of the entire image. Calculating the affine matrix as the least-squares solution that minimizes the error between the coordinate values of the target image and the coordinate values after the movement, over the n motion vectors, gives equation (25).
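Equations (20) to (25) are not reproduced in this text. The sketch below shows the least-squares fit in the form suggested by the description: block centers and their moved positions are stacked as rows of n×3 homogeneous coordinate matrices, and the affine matrix with elements a, b, c, d, e, f is solved for. The function name and the row-vector layout are assumptions, and `numpy.linalg.lstsq` stands in for whatever closed form equation (25) uses.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares affine matrix A (3x3) such that [x', y', 1]^T ~= A [x, y, 1]^T.

    src_pts: (n, 2) block-center coordinates (x, y) in the target image
    dst_pts: (n, 2) coordinates (x', y') after applying the motion vectors
    Needs at least 3 non-collinear points.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])   # n x 3 homogeneous coordinates
    # Solve X @ P ~= dst in the least-squares sense; P is 3 x 2.
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = np.eye(3)
    A[:2, :] = P.T                                 # rows [a, b, c] and [d, e, f]
    return A
```

With exact, noise-free correspondences the fit recovers the generating matrix; with real motion vectors it returns the matrix minimizing the summed squared position error, which is why outlier vectors are removed beforehand by the valid motion vector determination.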

The valid block determination method is described with reference to the flowchart of FIG. 22. To obtain the correlation between blocks by block matching, the image within a block must contain some feature. A correct motion vector cannot be obtained for a flat block that contains almost nothing but a DC component. Conversely, a block that contains edges in the horizontal and vertical directions is expected to match more easily. The determination processing of the flowchart shown in FIG. 22 is one technique for excluding such flat blocks. The processing is described here for a single block.

First, in step S1701, the difference between the maximum and minimum values is calculated for one horizontal line in the block. For example, if the block is 50 × 50 pixels, the maximum and minimum values are found among the 50 pixels of a horizontal line and their difference is calculated. This is repeated for every horizontal line, that is, 50 times. In step S1703, the largest of the 50 difference values is found. In step S1704, this maximum difference value is compared with a preset threshold Tx.

If the maximum difference value is smaller than the threshold Tx, the block is regarded as having no feature in the horizontal direction and is determined to be an invalid block in step S1705. If the block can be regarded as having a feature in the horizontal direction, the same check is performed in the vertical direction. First, in step S1706, the difference between the maximum and minimum values is calculated for one vertical line in the block. That is, the maximum and minimum values are found among the 50 pixels of a vertical line and their difference is calculated. This is repeated for every vertical line, that is, 50 times. In step S1708, the largest of the 50 difference values is found. In step S1709, this maximum difference value is compared with a preset threshold Ty. If the maximum difference value is smaller than the threshold Ty, the block is regarded as having no feature in the vertical direction and is determined to be an invalid block in step S1705. A block that has features in both the horizontal and vertical directions can be expected to yield accurate block matching, so it is determined to be a valid block in step S1710.
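The flat-block test described above can be sketched in a few lines. The function name `is_valid_block` and the example threshold values are hypothetical; the logic follows the steps in the text (per-line max minus min, the largest of those values, then comparison with Tx and Ty).

```python
import numpy as np

def is_valid_block(block, tx, ty):
    """Return True if the block has enough contrast both horizontally and vertically."""
    block = np.asarray(block, dtype=float)
    # Horizontal check: max - min along each horizontal line, then the largest of them.
    row_ranges = block.max(axis=1) - block.min(axis=1)
    if row_ranges.max() < tx:
        return False  # no feature in the horizontal direction
    # Vertical check: max - min along each vertical line, then the largest of them.
    col_ranges = block.max(axis=0) - block.min(axis=0)
    if col_ranges.max() < ty:
        return False  # no feature in the vertical direction
    return True

# Usage with 50 x 50 blocks and illustrative thresholds.
flat = np.full((50, 50), 128.0)                          # DC component only
stripes = np.tile((np.arange(50) % 2) * 255.0, (50, 1))  # varies only along x
corner = np.zeros((50, 50))
corner[25:, 25:] = 255.0                                 # edges in both directions
results = [is_valid_block(b, tx=10, ty=10) for b in (flat, stripes, corner)]
```

The striped block is rejected because each vertical line is constant, which matches the requirement that a valid block have features in both directions.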

Next, the valid motion vector determination method is described with reference to the flowchart of FIG. 23. First, a motion vector is input in step S1801, and its occurrence frequency is calculated in step S1802. This is repeated until step S1803 finds that the occurrence frequencies of all motion vectors have been obtained, after which the motion vector with the highest occurrence frequency is found in step S1804. Next, a motion vector is input again in step S1805, and step S1806 determines whether it is the motion vector with the highest occurrence frequency or a motion vector in its neighborhood.

When a positional deviation arises from, for example, an error in camera orientation, the deviation can generally be described by a simple model, so many motion vectors are expected to occur near the motion vector with the highest occurrence frequency. Accordingly, motion vectors that fall within this neighborhood are determined to be valid motion vectors in step S1807, and motion vectors outside it are determined to be invalid motion vectors in step S1808. In step S1809, it is determined whether all motion vectors have been processed, and the processing from step S1010 is repeated until they have.
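A minimal sketch of this two-pass selection is shown below. The text does not define the size of the neighborhood, so the `radius` parameter (maximum per-component distance from the most frequent vector) is an assumption, as is the function name `select_valid_vectors`.

```python
from collections import Counter

import numpy as np

def select_valid_vectors(vectors, radius=1):
    """Mark motion vectors at or near the most frequent vector as valid.

    vectors: (n, 2) integer motion vectors (dx, dy).
    radius: assumed neighborhood size, measured per component.
    Returns (mode, valid) where valid is a boolean array of length n.
    """
    vecs = np.asarray(vectors, dtype=int)
    # First pass: occurrence frequency of every motion vector.
    counts = Counter(map(tuple, vecs))
    mode = np.array(counts.most_common(1)[0][0])
    # Second pass: keep vectors equal to or in the neighborhood of the mode.
    valid = np.abs(vecs - mode).max(axis=1) <= radius
    return mode, valid

# Usage: one outlier among vectors clustered around (2, 1).
vectors = [(2, 1), (2, 1), (2, 1), (3, 1), (10, -7), (2, 0)]
mode, valid = select_valid_vectors(vectors, radius=1)
```

Only the vectors flagged as valid would then be passed on to the affine estimation.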

<Virtual viewpoint image synthesis method>
As described above, the image composition unit 1108 transforms the image captured by the imaging unit located at each viewpoint onto the virtual viewpoint image on the basis of the pixel position conversion parameters. The transformed image data is then corrected by applying the alignment parameters, and the corrected image data is superimposed over all viewpoints to synthesize the virtual viewpoint image data.

The transformation from coordinates on the image of the n-th imaging unit to coordinates on the virtual viewpoint image is represented by the projective transformation matrix H_n described by Expression (11). The transformation calculated by the alignment unit 1107, from coordinates on the virtual viewpoint image after the alignment correction to coordinates on the virtual viewpoint image before the alignment correction for the n-th imaging unit, is represented by a projective transformation matrix C_n. The transformation H'_n^-1 from coordinates on the virtual viewpoint image to coordinates on the image of the n-th imaging unit is therefore given by Expression (26).

Using H'_n^-1, the pixel on the image captured by each imaging unit that corresponds to each pixel on the virtual viewpoint image is identified, and its pixel value is calculated by interpolation such as bilinear or bicubic interpolation. This yields, for each imaging unit, the captured image transformed onto the virtual viewpoint image. Summing these images over all viewpoints and averaging them gives the virtual viewpoint image. Although the alignment parameters are described here using a projective transformation matrix, any model may be used, such as a translation, an affine transformation, or a higher-order polynomial transformation.
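The inverse-mapping synthesis described above can be sketched as follows. This sketch assumes that the combined transformation has the form H'_n^-1 = H_n^-1 C_n, which is inferred from the stated directions of H_n and C_n rather than taken from Expression (26). The function names, the single-channel images, and the clamping of out-of-range coordinates to the image border are also assumptions made for brevity.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a 2-D image at float coordinates; out-of-range points are clamped."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0
    fy = ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def synthesize(images, H_list, C_list, out_shape):
    """Warp every view onto the virtual viewpoint image and average them."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x (h*w)
    acc = np.zeros(h * w)
    for img, H, C in zip(images, H_list, C_list):
        # Assumed form: corrected virtual coords -> uncorrected virtual coords (C),
        # then -> coords on the n-th captured image (inverse of H).
        M = np.linalg.inv(H) @ C
        p = M @ grid
        acc += bilinear_sample(np.asarray(img, dtype=float), p[0] / p[2], p[1] / p[2])
    return (acc / len(images)).reshape(h, w)

# Usage: with identity transformations the result is the plain average of the views.
I3 = np.eye(3)
views = [np.arange(12.0).reshape(3, 4), np.ones((3, 4))]
avg = synthesize(views, [I3, I3], [I3, I3], (3, 4))

# A view whose H shifts it by +1 pixel in x appears shifted in the output.
shift = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
shifted = synthesize([views[0]], [shift], [I3], (3, 4))
```

Mapping from output pixels back to source pixels, rather than forward, guarantees that every pixel of the virtual viewpoint image receives a value.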

As described above, according to this embodiment, when a virtual viewpoint image with a finite depth of field is synthesized from multi-viewpoint images, the synthesis can be performed while reducing blur in the in-focus area even if the given camera parameters deviate from their actual values.

[Embodiment 2]
In Embodiment 1, the alignment unit 1107 uses a single deformation model and calculates the alignment parameters by aligning the in-focus areas of images captured from different viewpoints so that they overlap on the virtual viewpoint image. However, when the in-focus area is confined to a very small region of the image, estimation of the model parameters becomes unstable. In particular, pixels far from the in-focus area may be moved to unintended positions by the correction. Embodiment 2 therefore enables flexible correction by switching the deformation model used in the alignment unit 1107 according to the distribution of the in-focus area on the image. That is, a deformation model that prioritizes stability is used when the in-focus area is distributed over a narrow range, and a deformation model that can be expected to give higher correction accuracy is used when the in-focus area is distributed over a wide range, so that appropriate correction is possible in either case.

FIG. 19 is a block diagram illustrating a configuration example of the image processing unit 214 of this embodiment. The configuration adds an alignment method selection unit 1401 to the image processing unit of Embodiment 1. As described in detail later, the alignment method selection unit 1401 selects the deformation model used by the alignment unit 1107 according to the distribution on the image of the in-focus area extracted by the in-focus area extraction unit 1106.

FIG. 20 is a flowchart illustrating the procedure of the image synthesis processing in this embodiment. This flowchart adds step S1501 to the flowchart of Embodiment 1 shown in FIG. 17. In step S1501, the deformation model to be used for the alignment in step S1207 is selected according to the distribution on the image of the in-focus area extracted in step S1206.

<Alignment method selection method>
The method by which the alignment method selection unit 1401 selects the deformation model for alignment is described below. In general, a deformation model with more degrees of freedom is less stable but can handle complex deviations, whereas a model with fewer degrees of freedom is more stable but can handle only simple deviations.

A translation has 2 degrees of freedom, an affine transformation has 6, and a projective transformation has 8, so the degrees of freedom increase in that order. This embodiment is described on the assumption that the selection is made from these three deformation models. The deformation models used in this embodiment are only examples, and any transformation may be used as a deformation model.

The deformation model is selected according to the distribution of the in-focus area extracted by the in-focus area extraction unit 1106. Assume that the in-focus area extraction unit 1106 outputs an in-focus area determination mask m(x, y) as the extraction result, where x and y are coordinates on the image of the imaging unit. m(x, y) is 1 for pixels determined to be in focus and 0 for all other pixels. In general, the in-focus area extraction result of an image captured from a single viewpoint is sufficient for choosing the deformation model. Here, the viewpoint selected as the reference viewpoint by the alignment unit 1107 is used. Using the in-focus area determination mask m(x, y), the spread p of the in-focus area can be calculated by Expression (27).

Here, S_x and S_y are the numbers of pixels of the imaging unit's image in the x and y directions, respectively, and w_x and w_y are the x and y coordinates of the centroid of the in-focus area. Thresholds ε1 and ε2 are set for the spread p of the in-focus area, and the deformation model is switched accordingly, for example by selecting a translation if p < ε1, an affine transformation if ε1 ≤ p < ε2, and a projective transformation if ε2 ≤ p.
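The selection rule above can be sketched as follows. Because Expression (27) is not reproduced here, the spread is assumed to be the mask-weighted mean squared distance from the centroid, with each axis normalized by the image size S_x or S_y; that form, the function names, and the threshold values in the usage example are all assumptions for illustration.

```python
import numpy as np

def focus_spread(mask):
    """Spread p of the in-focus area for a binary mask m(x, y).

    Assumed form: mean over in-focus pixels of
    ((x - w_x) / S_x)**2 + ((y - w_y) / S_y)**2.
    """
    m = np.asarray(mask, dtype=float)
    s_y, s_x = m.shape                      # number of pixels in y and x
    ys, xs = np.mgrid[0:s_y, 0:s_x]
    total = m.sum()
    if total == 0:
        return 0.0                          # no in-focus pixels at all
    w_x = (m * xs).sum() / total            # centroid of the in-focus area
    w_y = (m * ys).sum() / total
    sq = ((xs - w_x) / s_x) ** 2 + ((ys - w_y) / s_y) ** 2
    return float((m * sq).sum() / total)

def select_model(p, eps1, eps2):
    """Pick a deformation model from the spread p (requires eps1 < eps2)."""
    if p < eps1:
        return "translation"   # 2 degrees of freedom, most stable
    if p < eps2:
        return "affine"        # 6 degrees of freedom
    return "projective"        # 8 degrees of freedom, most flexible

# Usage on a 100 x 100 image with illustrative thresholds.
def square_mask(size):
    m = np.zeros((100, 100))
    m[:size, :size] = 1
    return m

choices = [select_model(focus_spread(square_mask(s)), eps1=0.01, eps2=0.1)
           for s in (20, 50, 100)]
```

Normalizing by the image size keeps p independent of resolution, so the same thresholds can be reused across imaging units with different pixel counts.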

As described above, according to this embodiment, when a virtual viewpoint image with a finite depth of field is synthesized from multi-viewpoint images, the synthesis can be performed with stable processing while reducing blur in the in-focus area, even if the given camera parameters deviate from their actual values.

[Other Embodiments]
Embodiments 1 and 2 describe the case where the multi-viewpoint imaging unit is composed of a plurality of imaging units each having a single viewpoint. However, a configuration may be adopted in which the single image sensor 407 of the imaging unit shown in FIG. 4 is divided into a plurality of regions and an optical system is provided for each of the divided regions. A plurality of such imaging units may also be used.

The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program. The present invention can also be realized by a plurality of processors cooperating to perform the processing.

Claims

1. An image processing apparatus comprising:
image synthesizing means for combining and converting each of a plurality of pieces of image data captured from different preset viewpoints, based on the position of the capturing viewpoint, to generate composite image data as a virtual viewpoint image captured from a virtually set viewpoint;
in-focus area extracting means for extracting, from each of the plurality of pieces of image data, an area corresponding to an in-focus area in the composite image data; and
alignment means for converting the image data so as to reduce a difference in pixel values among the areas of the plurality of pieces of image data, and calculating a difference from the image data combined and converted based on the preset viewpoint positions,
wherein the composite image data is generated by converting the whole of each of the plurality of pieces of image data using the calculated difference.

2. The image processing apparatus according to claim 1, wherein the conversion of the whole of the image data is performed by projective conversion or polynomial conversion of the image data.

3. The image processing apparatus according to claim 1, wherein a method of converting the entire image data differs depending on the shape of the area extracted by the focused area extracting unit.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the plurality of pieces of image data are a plurality of pieces of image data captured while changing the arrangement of one image capturing apparatus, or a plurality of pieces of image data captured by a multi-viewpoint image capturing apparatus including a plurality of imaging units having different arrangements.

5. An image processing method comprising:
an image synthesizing step of combining and converting each of a plurality of pieces of image data captured from different preset viewpoints, based on the position of the capturing viewpoint, to generate composite image data as a virtual viewpoint image captured from a virtually set viewpoint;
an in-focus area extracting step of extracting, from each of the plurality of pieces of image data, an area corresponding to an in-focus area in the composite image data; and
an alignment step of converting the image data so as to reduce a difference in pixel values among the areas of the plurality of pieces of image data, and calculating a difference from the image data combined and converted based on the preset viewpoint positions,
wherein the composite image data is generated by converting the whole of each of the plurality of pieces of image data using the calculated difference.

6. The image processing method according to claim 5, wherein the conversion of the whole of the image data is performed by projective conversion or polynomial conversion of the image data.

7. The image processing method according to claim 5, wherein the method of converting the whole of the image data differs depending on the shape of the area extracted by the focus area extracting means.

8. The image processing method according to claim 1, wherein the plurality of pieces of image data are a plurality of pieces of image data captured while changing the arrangement of one image capturing apparatus, or a plurality of pieces of image data captured by a multi-viewpoint image capturing apparatus including a plurality of imaging units having different arrangements.

9. A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 4.