JP2009111921A

JP2009111921A - Image processing device and image processing method

Info

Publication number: JP2009111921A
Application number: JP2007284423A
Authority: JP
Inventors: Yosuke Bando; 洋介坂東; Tomoyoshi Nishida; 友是西田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-10-31
Filing date: 2007-10-31
Publication date: 2009-05-21

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device and an image processing method capable of extracting a foreground with a simple technique. SOLUTION: The image processing device comprises: an estimating part 11 for estimating the state of defocus (blur) in each of a plurality of image data of an identical object; a dividing part 12 for dividing each of the image data into a background area and a foreground area based on the estimation result of the estimating part 11; and an extracting part 16 for extracting the foreground of each of the image data in accordance with the division result 31 of the dividing part 12. Each of the image data is an image in which the foreground is more in focus than the background. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing method, and in particular to, for example, a method for extracting the foreground contained in an image.

Techniques for extracting the foreground region of an image have long been studied. Such techniques are used, for example, to remove the background from an image taken by a camera so as to cut out only the actor, and then to combine the extracted actor image with a different background.

The standard foreground extraction method used in the film industry sets the background to a single color, for example blue, and extracts the foreground on the assumption that the foreground object contains no blue (see, for example, Patent Documents 1 and 2). A method has also been proposed that removes the assumption that the foreground object lacks a particular color. It does so by using two images taken in front of backgrounds of two different colors (for example, blue and green) (see, for example, Patent Document 3). However, these methods restrict shooting to environments where the background is easy to control, such as a studio, and it is difficult to improve their extraction accuracy.

One approach greatly simplifies the problem by treating it as a binary classification in which each pixel of the captured image belongs either to the foreground or to the background. This approach has problems when the subject has a complex silhouette, such as hair or fur, or has translucent parts. In those cases, part of the foreground is lost, or the extracted foreground color still contains the original background color. The technique disclosed in Patent Document 4 mitigates this problem, for example by filtering the extracted foreground, but does not solve it fundamentally.

Many methods have also been proposed in which the user supplies rough information about the foreground and background through some kind of user interface, and the foreground is extracted from that information (see, for example, Patent Documents 5 to 7). These methods, however, require the user to become proficient with the interface, and entering the information takes time and effort. Moreover, the input work generally involves a good deal of trial and error.

Methods have therefore been proposed that increase the number of images taken of the same object, so that the foreground can be extracted automatically, without human intervention and with an unknown background (see, for example, Patent Documents 3, 8 and 9 and Non-Patent Documents 1 to 3). These methods, however, require very precise equipment, for example many cameras that must be prepared and calibrated.
Patent Document 1: US Pat. No. 6,525,741
Patent Document 2: US Pat. No. 3,595,987
Patent Document 3: US Pat. No. 6,301,382
Patent Document 4: US Pat. No. 7,024,054
Patent Document 5: US Pat. No. 6,721,446
Patent Document 6: US Pat. No. 6,288,703
Patent Document 7: US Pat. No. 6,134,346
Patent Document 8: US Pat. No. 6,903,738
Patent Document 9: US Pat. No. 7,206,000
Non-Patent Document 1: McGuire, Matusik, Pfister, Hughes, Durand, "Defocus video matting", ACM Trans. Graphics 24(3), pp. 567-576, 2005
Non-Patent Document 2: McGuire, Matusik, Yerazunis, "Practical, real-time studio matting using dual imagers", Eurographics Symposium on Rendering, pp. 235-244, 2006
Non-Patent Document 3: Joshi, Matusik, Avidan, "Natural video matting using camera arrays", ACM Trans. Graphics 25(3), pp. 779-786, 2006

The present invention provides an image processing apparatus and an image processing method capable of extracting a foreground by a simple method.

An image processing apparatus according to one aspect of the present invention comprises:
- an estimation unit that estimates the blur state in each of a plurality of image data of the same object;
- a dividing unit that divides each of the image data into a background region and a foreground region based on the estimation result of the estimation unit; and
- an extraction unit that extracts the foreground in each of the image data in accordance with the division result of the dividing unit.

Each of the image data is an image in which the foreground is more in focus than the background.

An image processing method according to one aspect of the present invention comprises the following steps:
- receiving a plurality of first image data;
- estimating, by a plurality of processors, the blur state of each of a plurality of regions contained in each of the received first image data;
- dividing, by the processors, each of the first image data into a background region and a foreground region based on the blur state;
- extracting, by the processors, the foreground from each of the first image data; and
- combining, by the processors, the foreground with second image data different from the first image data.

Each of the first image data is an image in which the foreground is more in focus than the background.

According to the present invention, an image processing apparatus and an image processing method capable of extracting a foreground by a simple method can be provided.

Embodiments of the present invention are described below with reference to the drawings. In the description, common parts are denoted by common reference numerals throughout the drawings.

[First Embodiment]
An image processing apparatus and an image processing method according to the first embodiment of the present invention are described with reference to FIG. 1. FIG. 1 is a block diagram of the image processing system according to this embodiment.

As illustrated, the image processing system 1 includes a plurality of cameras 2 and an image processing apparatus 3. Each camera 2 photographs the target object and outputs the resulting image data to the image processing apparatus 3.

The cameras 2 photograph the same target object from different angles. They focus on the target object, which forms the foreground, and leave the background out of focus. The target object is placed so that its distance from the cameras 2 is sufficiently smaller than that of the background.

The cameras 2 also shoot so that the foreground shapes they capture are substantially the same. In other words, one camera 2 does not shoot the target object from the front while another shoots it from behind and yet another from the side. Instead, all cameras 2 photograph the target object from the same direction (or substantially the same direction). The displacement between the viewpoints of the cameras 2 is therefore approximately perpendicular to the depth direction of the scene. Because the foreground and background lie at different depths, shooting from different viewpoints yields multiple images in which the foreground object appears shifted relative to the background.

The background only needs to be more blurred than the foreground; the degree of blur does not matter.
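
The standard disparity relation for two rectified pinhole cameras, d = f·b/Z, shows why laterally displaced viewpoints separate foreground from background. Here f is the focal length in pixels, b is the baseline between the viewpoints, and Z is the depth. The sketch below is only an illustration under that idealized assumption. The focal length, baseline, and depths are hypothetical values, not figures from this specification.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Image-plane shift (pixels) of a point at depth_m between two rectified
    pinhole cameras whose centres are baseline_m apart: d = f * b / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


# Hypothetical setup: 1000 px focal length, viewpoints 10 cm apart.
f_px, baseline = 1000.0, 0.10
d_foreground = disparity_px(f_px, baseline, depth_m=2.0)   # subject at 2 m
d_background = disparity_px(f_px, baseline, depth_m=50.0)  # scenery at 50 m

# The foreground moves much further than the background between the two views,
# so it appears to slide across the background.
print(d_foreground, d_background)  # 50.0 2.0
```

Under these assumed numbers the foreground shifts about 48 px more than the background between the two views. A single shift cannot describe both layers.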

The image processing apparatus 3 includes a foreground extraction unit 10 and an image synthesis unit 20. The foreground extraction unit 10 extracts the foreground of the captured images using the plurality of image data supplied from the cameras 2. The image synthesis unit 20 combines the foreground extracted by the foreground extraction unit 10 with a different background image to generate composite image data.

<Configuration of Image Processing Apparatus 3>
Next, the image processing apparatus 3 is described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the image processing apparatus 3.

First, the configuration of the foreground extraction unit 10 is described. As illustrated, the foreground extraction unit 10 includes:
- a blur estimation unit 11;
- a region division unit 12;
- a background corresponding point calculation unit 13;
- a foreground corresponding point calculation unit 14;
- a foreground/background color estimation unit 15; and
- an α value estimation unit 16.

The blur estimation unit 11 receives the plurality of image data captured by the cameras 2 as input images and estimates the degree of blur in each input image. This estimation yields information on which regions of each input image are blurred and by how much. This information is hereinafter referred to as the blur amount 30.

Based on the blur amount 30 obtained by the blur estimation unit 11, the region division unit 12 creates a trimap 31 for each input image. A trimap is an image in which the input image is divided into three regions:
- a region that is clearly foreground;
- a region that is clearly background; and
- a region for which it is unknown whether it is foreground or background.

The region division unit 12 estimates these three regions from the degree of blur in the input image.

The background corresponding point calculation unit 13 aligns the backgrounds of the plurality of input images. That is, it determines which background point (pixel) in one input image corresponds to which background point (pixel) in another input image, and obtains the resulting displacement as a background warp function 32.

The foreground corresponding point calculation unit 14 performs the same processing as the background corresponding point calculation unit 13, but for the foreground. That is, it aligns the foregrounds of the plurality of input images. It determines which foreground point (pixel) in one input image corresponds to which foreground point (pixel) in another input image, and obtains the resulting displacement as a foreground warp function 33.

The foreground/background color estimation unit 15 estimates the background color 34 and the foreground color 35 of each input image from the following inputs:
- the background warp function 32 obtained by the background corresponding point calculation unit 13;
- the foreground warp function 33 obtained by the foreground corresponding point calculation unit 14;
- the input images; and
- the α values (described later).

The α value estimation unit 16 estimates the α values of each input image to obtain a mask image 36.

The α value is a scalar in the range [0, 1] that gives the mixing ratio of the foreground color and the background color at a given coordinate. For example, α = 0 in a region means that only the background is visible and no foreground is present. Conversely, α = 1 means that only the foreground is visible and the background is completely occluded by it.

The image obtained by computing α over the entire input image is the mask image. It is sufficient to obtain α at each point of the input image; the mask image need not actually be created as an image.

The α value estimation unit 16 first obtains initial α values from the trimap 31 produced by the region division unit 12. Thereafter, it estimates α using the foreground warp function 33 from the foreground corresponding point calculation unit 14, together with the background color 34 and foreground color 35 from the foreground/background color estimation unit 15.

Next, the configuration of the image synthesis unit 20 is described. As illustrated, the image synthesis unit 20 includes a memory 21 and a synthesis unit 22.

The memory 21 holds a new background 40. The new background 40 is image data, different from the input images, that serves as the new background.

The synthesis unit 22 combines the foreground of the input image with the new background 40. It uses the mask image 36 (α values) obtained by the α value estimation unit 16, the foreground color 35, and the new background 40. This yields a composite image 41 in which the background of the input image has been replaced by the new background 40.

<Operation of Image Processing Apparatus 3>
<<Concept of Foreground Extraction>>
First, the general concept of foreground extraction in the image processing apparatus 3 according to this embodiment is described.

In foreground extraction, or matting, the input image I is assumed to be a linear blending of two colors:
- the foreground color F (the foreground color 35 described with reference to FIG. 2); and
- the background color B (the background color 34 described with reference to FIG. 2).

$$I(x) = \alpha(x)\,F(x) + \bigl(1 - \alpha(x)\bigr)\,B(x) \qquad (1)$$

Here x is a two-dimensional vector representing image coordinates (a pixel position). A notation such as I(x) therefore denotes the value of the input image I at point x. I(x) is a scalar for a grayscale image and a three-dimensional vector for an RGB image. The same applies to the foreground color F and the background color B. α is likewise a function of x, written α(x).

As described above, α(x) gives the mixing ratio of the foreground color F(x) and the background color B(x) at coordinate x. A pixel with α(x) = 0 therefore shows only the background, and a pixel with α(x) = 1 shows only the foreground. An intermediate value (0 < α(x) < 1) means that the foreground partially occludes the background at that pixel.

This partial occlusion has two causes:
- The boundary between foreground and background (the outline of the foreground object) passes through the pixel, so both the foreground and background colors contribute to it.
- The foreground object is translucent, so the background color shows through.

α represents both effects jointly, and foreground extraction does not distinguish between them. α(x) is called the foreground mask or matte.
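
Equation (1) can be checked numerically at a single RGB pixel. The sketch below evaluates αF + (1 − α)B channel by channel and reproduces the three cases discussed above. The helper name `composite` and the color values are illustrative choices, not part of the specification.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def composite(alpha: float, fg: RGB, bg: RGB) -> RGB:
    """Equation (1) at one pixel: I = alpha * F + (1 - alpha) * B, per channel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    r, g, b = (alpha * f + (1.0 - alpha) * c for f, c in zip(fg, bg))
    return (r, g, b)


F = (1.0, 0.0, 0.0)  # red foreground colour
B = (0.0, 0.0, 1.0)  # blue background colour

print(composite(0.0, F, B))  # (0.0, 0.0, 1.0): background only
print(composite(1.0, F, B))  # (1.0, 0.0, 0.0): foreground only
print(composite(0.5, F, B))  # (0.5, 0.0, 0.5): partial occlusion (edge or translucency)
```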

The goal of foreground extraction is to obtain the matte α(x), the foreground color F(x), and the background color B(x) from a given input image I(x). Once this is achieved, the foreground can be composited onto a new background B′(x) to give an image I′(x):

$$I'(x) = \alpha(x)\,F(x) + \bigl(1 - \alpha(x)\bigr)\,B'(x) \qquad (2)$$
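
Once α, F, and a replacement background are available, equation (2) is a per-pixel blend in which the original background B no longer appears. The sketch below applies it to one grayscale row. `replace_background` and the sample values are hypothetical; a full image would repeat this over every row and color channel.

```python
from typing import List, Sequence


def replace_background(alpha: Sequence[float],
                       fg: Sequence[float],
                       new_bg: Sequence[float]) -> List[float]:
    """Equation (2) for a grayscale row: I'(x) = alpha(x) F(x) + (1 - alpha(x)) B'(x).
    Only alpha, the foreground colour F and the new background B' are needed."""
    if not (len(alpha) == len(fg) == len(new_bg)):
        raise ValueError("alpha, fg and new_bg must have the same length")
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, new_bg)]


alpha = [0.0, 0.5, 1.0]   # background, mixed edge pixel, solid foreground
fg = [0.8, 0.8, 0.8]      # extracted foreground colour F
new_bg = [0.2, 0.2, 0.2]  # new background B'
row = replace_background(alpha, fg, new_bg)
```

Applying the blend to the whole row in one call makes it explicit that the original background is never read.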

With equation (1) alone, the foreground extraction problem has too few constraints for the number of unknowns, so it has infinitely many solutions. For an RGB image, each pixel x provides three constraints, because equation (1) is a three-dimensional vector equation. There are, however, seven unknowns: α, F, and B, where F and B are each three-dimensional vectors. This embodiment therefore narrows the solution space substantially by using multiple images and the blur in those images.
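
This under-determination is easy to demonstrate. Quite different (α, F, B) triples can reproduce exactly the same observed RGB value, so a single image cannot tell them apart. The triples below are arbitrary illustrative values.

```python
def composite(alpha, fg, bg):
    """Equation (1) at one RGB pixel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


candidates = [
    (0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),  # half-transparent red over blue
    (1.0, (0.5, 0.0, 0.5), (0.3, 0.9, 0.1)),  # opaque purple; B is hidden
    (0.0, (0.7, 0.7, 0.7), (0.5, 0.0, 0.5)),  # no foreground at all
]
observed = {composite(a, f, b) for a, f, b in candidates}
print(observed)  # {(0.5, 0.0, 0.5)}: one colour, three different explanations

equations_per_pixel = 3          # one copy of equation (1) per RGB channel
unknowns_per_pixel = 1 + 3 + 3   # alpha, F (RGB) and B (RGB)
print(equations_per_pixel, unknowns_per_pixel)  # 3 7
```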

<<Flow of the Image Processing Method>>
Next, the flow of the image processing method based on the above concept is described with reference to FIG. 3. FIG. 3 is a flowchart of the processing by which the image processing apparatus 3 according to this embodiment obtains the composite image 41 from the input images.

As shown, the image processing apparatus 3 first receives a plurality of image data as input images (step S10). These are obtained by photographing the target object from different angles with the background blurred.

Next, the blur estimation unit 11 estimates the blur of each input image. This yields the blur amount 30, which indicates how strongly the neighborhood of each pixel of each input image is blurred (step S11).

Next, the region division unit 12 divides each input image into regions using the blur amount 30 and obtains a trimap (step S12). That is, it classifies each pixel of each input image into one of three regions: "definitely background", "definitely foreground", or "unknown".

Next, the α value estimation unit 16 estimates initial α values from the trimap 31 obtained in step S12 (step S13). This yields the mask image 36.

Next, the background corresponding point calculation unit 13 and the foreground corresponding point calculation unit 14 use the mask image and the input images to compute the positional correspondence between the input images (step S14). The input images are taken two at a time to form pairs. For each pair, corresponding points are computed separately for the foreground region and for the background region:
- Background: which pixel in the definitely-background region of one input image corresponds to which pixel in the definitely-background region of the other.
- Foreground: which pixel in the definitely-foreground region of one input image corresponds to which pixel in the definitely-foreground region of the other.

Note that immediately after step S13, a "definitely foreground" region may not yet exist. This embodiment describes that case in detail below. A method of obtaining a "definitely foreground" region in step S13 is described in the third embodiment.

Under the shooting conditions of the input images, the foreground and background move by different amounts from one image to another. Two sets of corresponding points are therefore computed: foreground corresponding points and background corresponding points. From these, the foreground warp function 33 and the background warp function 32 are obtained (step S15). These functions indicate how far a pixel of one input image has moved in the other input image.
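
The specification does not say how corresponding points are computed. The sketch below uses a hypothetical stand-in: it estimates a single integer shift per image pair by minimizing the mean squared difference over the pixels selected by a mask, for example the "definitely background" pixels. A real warp function would generally vary from pixel to pixel. The sketch also enumerates the image pairs that step S14 works on. All names are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def estimate_shift(ref: Sequence[float], other: Sequence[float],
                   mask: Sequence[bool], max_shift: int) -> int:
    """Return the integer shift d minimising the mean squared difference
    between ref[x] and other[x + d] over pixels x with mask[x] True.
    A crude, global stand-in for a per-pixel warp function."""
    best_d, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        diffs = [(ref[x] - other[x + d]) ** 2
                 for x in range(len(ref))
                 if mask[x] and 0 <= x + d < len(other)]
        if not diffs:
            continue  # no overlap for this shift
        cost = sum(diffs) / len(diffs)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


ref = [0, 0, 1, 2, 3, 0, 0, 0]
other = [0, 0, 0, 0, 1, 2, 3, 0]  # same content moved right by 2 pixels
mask = [True] * len(ref)          # e.g. the region classified as background
shift = estimate_shift(ref, other, mask, max_shift=3)

# Step S14 is run for every pair of the M input images.
M = 3
pairs: List[Tuple[int, int]] = list(combinations(range(M), 2))
```

In this simplified picture, running the search once with a background mask and once with a foreground mask would give the two different displacements. These correspond to the background and foreground warp functions of step S15.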

Next, the foreground/background color estimation unit 15 estimates the foreground color 35 and the background color 34 of each input image (step S16). It uses the current foreground warp function 33, background warp function 32, and mask image.

Next, the α value estimation unit 16 estimates the α values of each input image and obtains the mask image 36 (step S17). It uses the foreground and background colors obtained in step S16 and the foreground warp function obtained in step S15. That is, it updates the α values obtained so far.

The α value estimation unit 16 then determines whether the α values have converged (step S18), that is, whether the change in α before and after the update in step S17 is sufficiently small.
- If the change is large, the α values have not yet converged (step S19, NO), and the process returns to step S14.
- If the change is sufficiently small, the α values have converged (step S19, YES), and the process proceeds to step S20.
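
The control flow of steps S14 to S19 is a fixed-point iteration that stops when α changes by less than a tolerance. The skeleton below bundles steps S14 to S17 into a caller-supplied `update` function. The tolerance, the maximum-change criterion, the iteration cap, and the toy update rule are assumptions, since the specification only says the change must be "sufficiently small".

```python
from typing import Callable, List, Tuple

Alpha = List[float]


def iterate_until_converged(alpha0: Alpha,
                            update: Callable[[Alpha], Alpha],
                            tol: float = 1e-3,
                            max_iter: int = 100) -> Tuple[Alpha, int]:
    """Repeat update (standing in for steps S14-S17) until the largest
    per-pixel change in alpha is below tol (steps S18/S19)."""
    alpha = list(alpha0)
    for iteration in range(1, max_iter + 1):
        new_alpha = update(alpha)
        change = max(abs(a - b) for a, b in zip(new_alpha, alpha))
        alpha = new_alpha
        if change < tol:
            return alpha, iteration  # converged: go on to step S20
    return alpha, max_iter  # safety cap, not part of the flowchart


target = [0.0, 1.0]


def toy_update(a: Alpha) -> Alpha:
    """Toy stand-in: move each alpha halfway toward a fixed target matte."""
    return [x + 0.5 * (t - x) for x, t in zip(a, target)]


alpha, n_iter = iterate_until_converged([0.5, 0.5], toy_update)
```

With the toy rule, the change halves on every pass, so the loop stops after a small, predictable number of iterations.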

In step S20, the synthesis unit 22 combines the foreground of the input image with the new background 40 using the obtained foreground color and mask image, producing the composite image 41.

<<Details of Each Step>>
Next, each step of FIG. 3 is described in detail. Hereinafter, each of the M input images (M is a natural number of 2 or more) is denoted I_i(x), where i is an integer from 0 to M−1.

(Step S11: Blur estimation)
First, the blur estimation of step S11 is described. Each input image I_i(x) is taken with the foreground in focus and the background blurred, so only the background component B_i(x) can be considered blurred. Blur caused by the camera lens can be modeled as convolution with a blur function h(x; r) whose shape is similar to that of the camera aperture. Here r denotes the size of the blur function and corresponds to the blur amount 30. Letting C_i(x) denote the background color without blur, equation (1) can be rewritten as follows:

$$I_i(x) = \alpha_i(x)\,F_i(x) + \bigl(1 - \alpha_i(x)\bigr)\,\bigl(h(x; r) * C_i(x)\bigr) \qquad (3)$$

Here "*" denotes convolution. Equation (3) assumes that the blur amount is constant over the entire background. If the blur amount varies smoothly, however, equation (3) still holds locally. In this embodiment the background is photographed at a sufficiently large distance, so the blur amount can be assumed to vary smoothly.
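
Equation (3) can be simulated in one dimension to see what the blur estimator receives. The sketch uses a normalized box of width 2r + 1 in place of the aperture-shaped kernel h(x; r). That kernel choice, the edge clamping, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def box_blur(signal: Sequence[float], r: int) -> List[float]:
    """Convolve with a normalised box of width 2r + 1, clamping at the edges.
    A 1-D stand-in for the aperture-shaped blur kernel h(x; r)."""
    n = len(signal)
    out = []
    for x in range(n):
        window = [signal[min(max(x + k, 0), n - 1)] for k in range(-r, r + 1)]
        out.append(sum(window) / len(window))
    return out


def observe(alpha: Sequence[float], fg: Sequence[float],
            sharp_bg: Sequence[float], r: int) -> List[float]:
    """Equation (3) in 1-D: I = alpha * F + (1 - alpha) * (h * C).
    Only the background C is blurred; the in-focus foreground F is not."""
    blurred_bg = box_blur(sharp_bg, r)
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, blurred_bg)]


C = [0, 0, 0, 3, 0, 0, 0]        # sharp background with one bright detail
alpha = [0, 0, 0, 1, 0, 0, 0]    # an opaque foreground pixel in the middle
F = [0, 0, 0, 5, 0, 0, 0]
print(box_blur(C, r=1))          # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(observe(alpha, F, C, r=1)) # [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
```

The blurred background detail spreads into the neighboring pixels, while the in-focus foreground value stays sharp.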

An image contains both blurred and unblurred regions, and the blur amount may not be uniform, so non-uniform blur estimation is required. Blur cannot, however, be judged from a single pixel. A small window (usually square) is therefore set around each pixel x, and blur is estimated under the assumption that it is uniform within that window. This is illustrated in FIG. 4.

FIG. 4 is a schematic diagram of an input image showing the windows set for blur estimation. As shown, a window WND of predetermined size is set centered on each pixel of interest. FIG. 4 shows windows WND1, WND2 and WND3 for pixels x = x1, x2 and x3. Blur can be estimated, for example, by a method based on the local Fourier transform.
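
The specification suggests the local Fourier transform but gives no formula. The sketch below deliberately swaps in a simpler and different windowed cue, a re-blur ratio. Re-blurring an already blurred window barely changes its gradient energy, while re-blurring a sharp window reduces it substantially. This cue only scores blur; it does not estimate the kernel size r. It treats textureless windows as undecidable. All names and parameters are hypothetical.

```python
from typing import List, Optional, Sequence


def box_blur(signal: Sequence[float], r: int = 1) -> List[float]:
    """Normalised box blur of width 2r + 1 with clamped edges."""
    n = len(signal)
    return [sum(signal[min(max(x + k, 0), n - 1)] for k in range(-r, r + 1)) / (2 * r + 1)
            for x in range(n)]


def gradient_energy(signal: Sequence[float]) -> float:
    """Sum of squared differences between neighbouring samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


def reblur_ratio(window: Sequence[float], r: int = 1,
                 eps: float = 1e-12) -> Optional[float]:
    """Gradient energy after re-blurring divided by gradient energy before.
    Near 1 for an already blurred window, clearly smaller for a sharp one.
    Returns None for a textureless window, where blur cannot be judged."""
    e0 = gradient_energy(window)
    if e0 < eps:
        return None
    return gradient_energy(box_blur(window, r)) / e0


def blur_scores(row: Sequence[float], half_width: int) -> List[Optional[float]]:
    """Score a window of up to 2*half_width + 1 pixels centred on every pixel
    (windows are truncated at the row ends)."""
    return [reblur_ratio(row[max(0, x - half_width): x + half_width + 1])
            for x in range(len(row))]


sharp = [0, 0, 0, 0, 1, 1, 1, 1]
blurred = box_blur(sharp, 1)
print(reblur_ratio(sharp))      # about 0.33
print(reblur_ratio(blurred))    # about 0.70
print(reblur_ratio([0.5] * 8))  # None
```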

As equation (3) shows, a uniform blur h(x; r) can be assumed within a window only if α = 0 throughout the window, that is, only if the window contains nothing but background (window WND1 in FIG. 4). The blur estimation method extracts the effect that the blur function h(x; r) has on the frequency components of the image in the window. When α varies within the window and the foreground contributes, as in window WND3, the effect of the blur function is masked. The region of window WND3 is therefore estimated to be unblurred, just like the region of window WND2, which lies entirely in the foreground.

(Step S12: Region division)
Next, the region division of step S12 is described.

The region division unit 12 classifies regions estimated to be blurred as "definitely background". As explained for blur estimation, any point whose window overlaps a region with α > 0 is judged unblurred. The "definitely background" region is therefore a conservative estimate, set back from the foreground object by about the window size. This is illustrated in FIG. 5.

FIG. 5 is a schematic diagram of an input image showing the result of region division; the hatched area is the definitely-background region. As shown, the edge of the definitely-background region lies approximately one window size away from the outline of the foreground object.

The region division unit 12 labels every region other than "definitely background" as "unknown". In FIG. 5, therefore, all unhatched areas are "unknown". A trimap such as that in FIG. 5 is obtained in this way.

A trimap is an image containing three regions, but no "definitely foreground" region exists at this point. Only the "definitely background" and "unknown" regions are present. The "definitely foreground" region emerges through the iterative computation in step S14 and the subsequent steps.
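
Step S12 and the set-back of the "definitely background" region can both be sketched in one dimension. `initial_trimap` thresholds per-pixel blur scores; the threshold is an assumed parameter, since the specification does not give one. `detectable_background` uses the true α only for illustration. It shows why the region stops about one window half-width short of the foreground: a pixel can be judged blurred only if its whole window has α = 0.

```python
from typing import List, Optional, Sequence

BACKGROUND, UNKNOWN = "background", "unknown"


def initial_trimap(scores: Sequence[Optional[float]],
                   threshold: float) -> List[str]:
    """Blurred pixels become 'definitely background'; all others, including
    textureless pixels with no score (None), stay 'unknown'. There is no
    'definitely foreground' label at this stage."""
    return [BACKGROUND if s is not None and s >= threshold else UNKNOWN
            for s in scores]


def detectable_background(alpha: Sequence[float], half_width: int) -> List[bool]:
    """True where the window [x - half_width, x + half_width] (clipped to the row)
    contains only alpha == 0, i.e. where a window-based blur test can see pure
    background."""
    n = len(alpha)
    return [all(alpha[k] == 0 for k in range(max(0, x - half_width),
                                             min(n, x + half_width + 1)))
            for x in range(n)]


print(initial_trimap([0.8, 0.8, 0.3, None, 0.75], threshold=0.6))
# ['background', 'background', 'unknown', 'unknown', 'background']

alpha = [0] * 5 + [1] * 3 + [0] * 5  # foreground occupies pixels 5-7
print(detectable_background(alpha, half_width=2))
# pixels 3-9 are False: the unknown band extends 2 pixels beyond the foreground
```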

A specific example of step S12 will be described with reference to FIGS. 6 and 7. FIG. 6 is a schematic diagram of input images I1 (i = 1) and I2 (i = 2), and FIG. 7 is a schematic diagram showing the trimaps of input images I1 and I2.

As shown in FIG. 6, suppose, for example, that a house is photographed from two positions against a mountain backdrop. The house in the foreground is in focus, while the mountains, clouds, and sky in the background are blurred. The trimap obtained from FIG. 6 is then as shown in FIG. 7: the area extending outward from the outline of the house by the window size used for blur estimation is labeled “unknown”, and the remaining area is labeled “definite background” (hatched). FIG. 7 schematically shows the case in which the “definite background” region covers the entire image except around the house. In practice, however, blur cannot be estimated where the background lacks a pattern (texture), so some areas may remain “unknown”.

(Step S13: Estimation of the Initial α Values)
Next, the estimation of the initial α values in step S13 will be described.

The α value estimation unit 16 generates the initial α values from the trimap obtained in step S12: it sets α = 0 in the “definite background” region and α = 0.5 in the “unknown” region. The resulting mask image is shown in FIG. 8, a schematic diagram of the mask image obtained from the trimap in FIG. 7. As shown in the figure, the area extending outward from the outline of the house by the blur-estimation window size is set to α = 0.5, and the remaining “definite background” area (hatched) is set to α = 0.0.
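The initialization in step S13 is a per-pixel lookup from trimap label to α. The sketch below uses hypothetical string labels; mapping a “definite foreground” label to 1.0 is an assumption added for completeness, since that label only appears in later iterations.

```python
def initial_alpha(trimap):
    """Map trimap labels to initial alpha values (step S13).

    'bg' (definite background) -> 0.0, 'unknown' -> 0.5.
    'fg' (definite foreground) -> 1.0 is an assumption; it does not
    occur at this stage in the first embodiment.
    """
    table = {"bg": 0.0, "unknown": 0.5, "fg": 1.0}
    return [[table[label] for label in row] for row in trimap]

print(initial_alpha([["bg", "unknown"], ["unknown", "bg"]]))
```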

(Steps S14 and S15: Corresponding-Point Calculation and Warp Function Calculation)
Next, the corresponding-point calculation in step S14 will be described. Hereinafter, a pair of input images is denoted Ii and Ij. As described above, both are functions of x.

In step S14, for each input image pair Ii, Ij, the coordinates Vij(x) and Wij(x) on input image Ij that correspond to coordinate x on input image Ii are calculated, for the pair of foreground colors Fi, Fj and for the pair of background colors Bi, Bj, respectively. The foreground corresponding-point calculation unit 14 performs this calculation for the foreground, and the background corresponding-point calculation unit 13 performs it for the background. Vij(x) can be regarded as a warp function (foreground warp function 33) that deforms the foreground Fi to match the foreground Fj, and Wij(x) as a warp function (background warp function 32) that deforms the background Bi to match the background Bj. The following relationship therefore holds:

Fi(x) = Fj(Vij(x)),  Bi(x) = Bj(Wij(x))   (5)

The warp functions Vji and Wji from Ij to Ii can be defined in the same way.

This is illustrated with the specific example in FIG. 9, a schematic diagram of input images I1 (i = 1) and I2 (j = 2) showing how the foreground and background correspondences are obtained. As shown in the figure, for a point x1 on the house, which is the foreground in input image I1, the corresponding point in input image I2 is calculated: pixel x1 corresponds to pixel V12(x1) given by the foreground warp function V12. The same applies to the background: for a point x0 on the mountain, which is background in input image I1, the corresponding pixel in input image I2 is W12(x0), given by the background warp function W12.

First, the method of obtaining the background warp function Wij will be described. An optical flow method is used to find the Wij that minimizes the following expression Eo.

The first term of Equation (6) requires that Equation (5) be satisfied as closely as possible. However, color similarity alone leaves many candidate corresponding points, so the second term imposes the constraint that Wij be smooth. Here, N(x) denotes the set of pixels neighboring pixel x, for example the 4-neighborhood (up, down, left, right) or the 8-neighborhood including the diagonals. A small second term means that Wij(x) differs little from its values Wij(y) at the neighbors y of x, that is, Wij is smooth. λo is a parameter that controls the strength of the smoothness constraint. Because the background is sufficiently distant, its warp function is expected to be very smooth, so λo can be set large, which makes the solution more stable.
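Equation (6) itself is not reproduced in this text, so the sketch below assumes the usual squared-error form of an optical-flow cost, reduced to 1-D: a data term for Equation (5) plus a smoothness term over the neighbors N(x) = {x − 1, x + 1}. It penalizes changes in the displacement disp(x) = Wij(x) − x, a common optical-flow convention; the text phrases smoothness in terms of Wij itself. Function and variable names are hypothetical.

```python
def flow_energy(b_i, b_j, disp, lam):
    """1-D analogue of the background flow cost Eo (assumed squared-error form).

    disp[x] is the displacement, so the corresponding point is W(x) = x + disp[x].
      Eo = sum_x (b_i[x] - b_j[W(x)])**2
           + lam * sum_x sum_{y in N(x)} (disp[x] - disp[y])**2,  N(x) = {x-1, x+1}
    Data terms whose corresponding point falls outside b_j are skipped.
    """
    n = len(b_i)
    data = 0.0
    for x in range(n):
        t = x + disp[x]
        if 0 <= t < len(b_j):
            data += (b_i[x] - b_j[t]) ** 2
    smooth = 0.0
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                smooth += (disp[x] - disp[y]) ** 2
    return data + lam * smooth

b_i = [0, 1, 4, 9, 16, 25]
b_j = [7, 7, 0, 1, 4, 9, 16, 25]  # same profile shifted right by 2 pixels
best = min(range(3), key=lambda s: flow_energy(b_i, b_j, [s] * len(b_i), lam=10.0))
print(best)
```

Searching only constant shifts is a simplification for illustration; an actual optical-flow solver optimizes the whole displacement field, and a large λo pushes that field toward the near-constant motion expected for a distant background.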

Equation (6) assumes that the backgrounds Bi and Bj are known, but in practice only the input images Ii and Ij are given. Equation (6) is therefore modified as follows so that it applies directly to the input image pair Ii, Ij, with weights that reflect the proportion of background in the input images.

The newly introduced S(α) is a weighting function. It equals 1 for α = 0, i.e., pure background, and equals 0 when α exceeds a certain threshold, because the foreground proportion is then too large for the pixel to be used in computing background correspondences.

FIG. 10 shows an example of the weighting function S(α). As shown, S(α) takes its maximum value of 1 at α = 0 and decreases as α increases. As a result, Equation (5) is taken into account only where the proportion of background is high both at the point x on input image Ii and at its corresponding point Wij(x) on Ij. However, because the unknown Wij also appears inside the weight as S(αj(Wij(x))), minimizing Equation (7) in a single step can be difficult. In such cases, S(αj(Wij(x))) can be fixed using the current estimate of Wij(x), so that, as with Equation (6), the problem can be solved by a conventional optical flow method. Once a solution has been found, S(αj(Wij(x))) is updated and the process is repeated until convergence.
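The weighting and the fixing step can be sketched as follows. The exact shape of S(α) in FIG. 10 is not given numerically, so a linear ramp down to zero at an assumed threshold of 0.3 is used; the product S(αi(x))·S(αj(W(x))) is likewise an assumed way of requiring that both points be background. Names are hypothetical.

```python
def s_weight(alpha, threshold=0.3):
    """Hypothetical background weight S(alpha): 1 at alpha = 0, falling
    linearly to 0 at `threshold` and staying 0 above it (cf. FIG. 10)."""
    if alpha >= threshold:
        return 0.0
    return 1.0 - alpha / threshold

def weighted_data_term(img_i, img_j, alpha_i, alpha_j, disp):
    """Data term of an Equation (7)-style cost in 1-D.

    The weight S(alpha_j(W(x))) is evaluated at the *current* displacement
    estimate and treated as a constant (the fixing step described above);
    the caller re-evaluates it after each flow update until convergence.
    """
    total = 0.0
    for x, u in enumerate(disp):
        t = x + u
        if not 0 <= t < len(img_j):
            continue  # correspondence falls outside image j
        w = s_weight(alpha_i[x]) * s_weight(alpha_j[t])
        total += w * (img_i[x] - img_j[t]) ** 2
    return total

print(weighted_data_term([1, 2, 3], [1, 2, 5], [0, 0, 1], [0, 0, 0], [0, 0, 0]))
```

Here the mismatch at the last pixel is ignored because αi = 1 there, so only clearly-background pixels drive the background warp.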

Next, the method of obtaining the foreground warp function Vij(x) will be described. If a “definite foreground” region is already available at the region division of step S12 (as noted above, this situation arises in the third embodiment; in the present embodiment no “definite foreground” region exists at this stage), or if regions with large α values have been obtained by running the processing up to step S19 one or more times, Vij can be obtained in the same way as the background warp function. In that case, however, the weighting function S(α) is replaced by a function T(α) that gives larger weights to more foreground-like pixels. FIG. 11 shows an example of T(α): it increases with α and takes its maximum value of 1 at α = 1.

If no “definite foreground” region is available at step S12, the following approach is used. As explained with reference to FIG. 5, the “unknown” region surrounds the foreground object with a margin of roughly the blur-estimation window size. A rough estimate of the foreground warp function can therefore be obtained by computing the displacement from the “unknown” region in Ii to the “unknown” region in Ij.
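The text does not say how the displacement between the two “unknown” regions is measured. One simple choice, assumed here for illustration, is the difference of the region centroids, which gives a single translation and is reasonable only when the foreground moves roughly rigidly between views. The function names are hypothetical.

```python
def centroid(mask):
    """Mean (row, col) of the True pixels in a 2-D boolean mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask has no pixels")
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rough_foreground_shift(unknown_i, unknown_j):
    """Translation moving the 'unknown' region of image i onto that of image j.
    Used as a constant initial guess for Vij(x) = x + shift."""
    ri, ci = centroid(unknown_i)
    rj, cj = centroid(unknown_j)
    return (rj - ri, cj - ci)

u_i = [[False, False, False], [False, True, True]]
u_j = [[False, False, False], [False, False, False], [True, True, False]]
print(rough_foreground_shift(u_i, u_j))
```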

In addition, a high-pass filter that removes low-frequency components may be applied to the input images Ii and Ij to reduce the influence of the blurred background. In that case, for example, the weighting function T′(α) shown in FIG. 12 may be used. As shown, unlike T(α), T′(α) does not drop to zero even where α is small.
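The following sketch contrasts T(α) and T′(α) and shows one possible high-pass filter. The curve shapes in FIGS. 11 and 12 are only described qualitatively, so the threshold of 0.7 and the floor of 0.2 are assumptions, as is the choice of subtracting a local box-filter mean as the high-pass filter.

```python
def t_weight(alpha, threshold=0.7):
    """Hypothetical T(alpha): 0 up to `threshold`, rising linearly to 1 at alpha = 1."""
    if alpha <= threshold:
        return 0.0
    return (alpha - threshold) / (1.0 - threshold)

def t_prime_weight(alpha, floor=0.2):
    """Hypothetical T'(alpha): follows T but never drops below `floor`, so pixels
    with small alpha still contribute once the background has been filtered out."""
    return max(floor, t_weight(alpha))

def high_pass(signal, radius=1):
    """Remove low frequencies by subtracting a local box-filter mean (1-D)."""
    n = len(signal)
    out = []
    for x in range(n):
        lo, hi = max(0, x - radius), min(n, x + radius + 1)
        out.append(signal[x] - sum(signal[lo:hi]) / (hi - lo))
    return out

print(high_pass([0, 0, 3, 0, 0]))
```

A smooth, blurred background leaves little energy after `high_pass`, which is why T′(α) can afford to trust low-α pixels.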

A high-pass filter can also be applied when regions with large α values are available. In that case, however, the low-frequency components of the foreground are discarded, which may reduce the accuracy of the corresponding-point calculation. The following two-stage process is therefore used. First, Vij is obtained from the unfiltered input images Ii and Ij using the weighting function T(α). Next, Vij(x) is fixed in the regions with a high foreground proportion, i.e., where T(α(x)) > 0. Finally, Vij is updated in the remaining regions from the high-pass-filtered Ii and Ij using the weighting function T′(α).

(Step S16: Foreground and Background Color Estimation)
Next, the foreground and background color estimation in step S16 will be described. This processing is performed by the foreground/background color estimation unit 15.

The foreground/background color estimation unit 15 estimates the foreground color Fi and background color Bi of each image from the foreground and background correspondence information (warp functions) Vij and Wij and the current estimate αi(x) of each image's α values. Specifically, it finds the Fi and Bi (i = 0, 1, ..., M − 1) that minimize the following expression Ec.

Equation (8) states that the cost function Ec to be minimized is the sum of the per-image cost functions Ec,i over all images i, so the optimization is carried out jointly over all images. The first term of Equation (9) requires that the matting equation (1) be satisfied for the i-th image; it is weighted by the parameter λm. The second and third terms require Fi and Bi to be smooth. The overall smoothness is controlled by the parameters λf and λb, and the local smoothness by Uf,i and Ub,i, for which the following expression can be used, for example.

Equation (10) is based on the observation that in regions where α changes greatly, i.e., in transitions from foreground to background, both the foreground color and the background color are expected to change little.

The fourth and fifth terms of Equation (9) require that the foreground colors and the background colors, respectively, be similar at corresponding points. For the i-th image under consideration, every other image j (j ≠ i) has corresponding points, so these terms are summed over j. Their influence is controlled by the parameters κf and κb.

In Equation (9), the unknowns are Fi(x) and Bi(x) in the “unknown” region of the trimap. In the “definite foreground” region, Fi(x) = Ii(x) is fixed and Bi(x) is undefined; in the “definite background” region, Bi(x) = Ii(x) is fixed and Fi(x) is undefined. In the fourth and fifth terms, any term whose corresponding point Vij(x) or Wij(x) falls outside the image, or that refers to an undefined foreground or background color, is omitted from the expression.

While the α values are still the initial values from step S13, their estimates are not yet reliable. In this case, λm is made small to weaken the terms that contain α, and the local weights of the smoothness constraints are set by expressions in which α does not appear, for example Uf,i(x, y) = Ub,i(x, y) = 1. More simply, Equation (8) need not be used at all: the foreground color can be interpolated into the other regions from the colors of the “definite foreground” region, and the background color from the colors of the “definite background” region. If there is no “definite foreground” region, the foreground color is initialized with the input image itself.

Because Equation (8) is quadratic in the unknowns Fi(x) and Bi(x), it can be solved by the least-squares method. To narrow the solution space further, Equation (8) can also be minimized under the constraints 0 ≤ Fi(x) ≤ 1 and 0 ≤ Bi(x) ≤ 1, in which case quadratic programming can be used.
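To see why least squares applies, consider a single grayscale pixel. As a hypothetical simplification, the smoothness and correspondence terms of Equation (9) are collapsed into fixed target colors Fp and Bp; the cost is then quadratic in (F, B), and its minimizer solves a 2×2 linear system, shown below with Cramer's rule. The full problem couples all pixels into one large sparse system of the same kind. The bounded (quadratic programming) variant is not shown.

```python
def solve_fb(intensity, alpha, f_prior, b_prior, lam_m=1.0, lam_f=1.0, lam_b=1.0):
    """Minimise, for one grayscale pixel, the quadratic
        lam_m*(I - a*F - (1-a)*B)**2 + lam_f*(F - Fp)**2 + lam_b*(B - Bp)**2
    by solving its 2x2 normal equations with Cramer's rule.
    Fp and Bp are hypothetical stand-ins for the smoothness and
    correspondence terms of Equation (9), collapsed into target colors.
    """
    a, c = alpha, 1.0 - alpha
    m11 = lam_m * a * a + lam_f
    m12 = lam_m * a * c
    m22 = lam_m * c * c + lam_b
    r1 = lam_m * a * intensity + lam_f * f_prior
    r2 = lam_m * c * intensity + lam_b * b_prior
    det = m11 * m22 - m12 * m12  # strictly positive whenever lam_f, lam_b > 0
    f = (r1 * m22 - m12 * r2) / det
    b = (m11 * r2 - m12 * r1) / det
    return f, b

print(solve_fb(0.5, 0.5, 1.0, 0.0))  # priors already satisfy the matting equation
```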

To narrow the solution space still further, Equation (8) can be minimized under the assumption that, at each pixel x, the pair (Fi(x), Bi(x)) can take only one of K discrete color pairs {Fi,k(x), Bi,k(x)} (k = 0, 1, ..., K − 1, where K is a natural number of 1 or more) sampled from around x. The regions sampled for the foreground color of pixel x in the i-th image are the “definite foreground” region around coordinate x in the i-th image and the “definite foreground” region around coordinate Vij(x) in each j-th image (j ≠ i). The regions sampled for the background color are the “definite background” region around coordinate x in the i-th image and the “definite background” region around coordinate Wij(x) in each j-th image (j ≠ i). This is shown in FIG. 13, a schematic diagram of the trimaps of input images I1 and I2.

As shown in the figure, the regions sampled for the foreground color of pixel x1 in input image I1 are the “definite foreground” region around coordinate x1 in I1 and the “definite foreground” region around the foreground corresponding point V12(x1) in input image I2. Likewise, the regions sampled for the background color of x1 are the “definite background” region around x1 in I1 and the “definite background” region around W12(x1) in I2.

After a suitable number of foreground and background color samples has been collected, the foreground and background colors are combined into pairs, and the K pairs with the highest importance are kept. Equation (8) is then minimized with the possible values of {Fi(x), Bi(x)} restricted to these K pairs; that is, each pixel x can take only K states. Belief propagation can be used for optimization under such conditions.

The importance of a pair may be based on either of the following, or on a combination of them:
・Distance to the sampling points, Gd
・Fitness to the linear mixing model, Gb

The importance based on distance to the sampling points treats the samples as more important the closer the foreground sample point xf and the background sample point xb are to the point of interest x, and can be expressed by the following equation.

where
Df = |x − xf|  (if xf is a point on image i)
Df = |Vij(x) − xf|  (if xf is a point on image j)
Db = |x − xb|  (if xb is a point on image i)
Db = |Wij(x) − xb|  (if xb is a point on image j)

The importance based on fitness to the linear mixing model treats the samples as more important the better the α value estimated from the foreground sample color F, the background sample color B, and the color Ii(x) of the pixel of interest (denoted α*) satisfies the matting equation (1). It can be expressed by the following equation.

where α* = (Ii(x) − B)·(F − B) / |F − B|², and “·” denotes the inner product of vectors.
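The pair-ranking step can be sketched as below. Only α* is taken from the text; the equations for Gd and Gb are not reproduced here, so Gaussian-style falloffs with assumed scales stand in for them, and their product is used as the combined importance. All function names and parameters are hypothetical.

```python
import math

def alpha_star(pixel, f, b):
    """alpha* = (I - B)·(F - B) / |F - B|^2 for RGB tuples (formula from the text)."""
    fb = [fc - bc for fc, bc in zip(f, b)]
    denom = sum(v * v for v in fb)
    if denom == 0:
        return 0.0  # F == B: alpha is unidentifiable; arbitrary choice
    return sum((ic - bc) * v for ic, bc, v in zip(pixel, b, fb)) / denom

def fitness(pixel, f, b, sigma=0.1):
    """Hypothetical G_b: close to 1 when alpha* F + (1 - alpha*) B reproduces the pixel."""
    a = alpha_star(pixel, f, b)
    err = sum((ic - (a * fc + (1 - a) * bc)) ** 2 for ic, fc, bc in zip(pixel, f, b))
    return math.exp(-err / sigma ** 2)

def proximity(d_f, d_b, scale=5.0):
    """Hypothetical G_d: decays with the sample distances Df and Db."""
    return math.exp(-(d_f ** 2 + d_b ** 2) / scale ** 2)

def top_k_pairs(pixel, fg_samples, bg_samples, k):
    """fg_samples, bg_samples: lists of (color, distance). Returns the k best (F, B) pairs."""
    scored = []
    for f, d_f in fg_samples:
        for b, d_b in bg_samples:
            scored.append((fitness(pixel, f, b) * proximity(d_f, d_b), f, b))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(f, b) for _, f, b in scored[:k]]

fg = [((1, 1, 1), 1.0), ((1, 0, 0), 1.0)]
bg = [((0, 0, 0), 1.0)]
print(top_k_pairs((0.5, 0.5, 0.5), fg, bg, k=1))
```

The gray pixel is explained exactly by a white/black pair at α* = 0.5, so that pair outranks the red/black pair.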

(Step S17: α Value Estimation)
Next, the α value estimation in step S17 will be described. This processing is performed by the α value estimation unit 16.

The α value estimation unit 16 estimates the matte αi of each image from the foreground correspondence information (warp function) Vij and the foreground and background color estimates Fi and Bi of each image. Specifically, it finds the αi (i = 0, 1, ..., M − 1) that minimize the following expression Ea.

Equation (13) states that the cost function Ea to be minimized is the sum of the per-image cost functions Ea,i over all images i, so the optimization is carried out jointly over all images. The first term of Equation (14) requires that the matting equation (1) be satisfied for the i-th image. The second term requires αi to be smooth. The overall smoothness is controlled by the parameter λa, and the local smoothness by Ua,i, for which the following expression can be used, for example.

Equation (15) is based on the observation that in regions where the input image changes little, the α value is also expected to change little.

Alternatively, exploiting the fact that the background is blurred, Equation (15) can be modified as follows so that the weight is reduced wherever the change between adjacent pixels, even if small, differs from the change in the surrounding area.

Here, sigmoid(t) = 1 / (1 + exp{−σt}) is the sigmoid function, z = y − x, ω is an offset parameter, and σ is a scale parameter. Because the image varies smoothly over a blurred background, the difference Ii(x − z) − Ii(y + z) between the more distant pixels x − z and y + z is expected to be at least a certain multiple ω (about 2) of the difference Ii(x) − Ii(y) between the adjacent pixels x and y. Where this does not hold, the pixels may lie on the boundary between foreground and background, so the sigmoid function relaxes the smoothness constraint there.
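The ratio test described above can be sketched as a multiplier on the α-smoothness weight between neighbors x and y. The precise form of the modified equation is not reproduced in this text, so sigmoid(ratio − ω) is an assumed reading of it, and returning 1 where the neighbors are identical is an added assumption (a zero local change gives no evidence of a boundary). Parameter defaults are illustrative.

```python
import math

def sigmoid(t, sigma):
    """sigmoid(t) = 1 / (1 + exp(-sigma * t)), as defined in the text."""
    return 1.0 / (1.0 + math.exp(-sigma * t))

def boundary_factor(img, x, y, omega=2.0, sigma=5.0, eps=1e-6):
    """Hypothetical multiplier for the alpha-smoothness weight between
    neighbouring pixels x and y of a 1-D signal, with z = y - x."""
    z = y - x
    if not (0 <= x - z < len(img) and 0 <= y + z < len(img)):
        raise IndexError("x - z and y + z must lie inside the image")
    near = abs(img[x] - img[y])
    if near < eps:
        return 1.0  # assumption: no local change, so no evidence of a boundary
    far = abs(img[x - z] - img[y + z])
    return sigmoid(far / near - omega, sigma)

print(boundary_factor([0, 1, 2, 3, 4, 5], 2, 3))  # smooth ramp: far/near = 3
print(boundary_factor([0, 0, 0, 1, 1, 1], 2, 3))  # sharp edge: far/near = 1
```

On a blurred ramp the factor stays near 1, keeping α smooth; across a sharp step, typical of an in-focus foreground edge, it drops near 0 and lets α change.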

The third term of Equation (14) requires that the α values be close at corresponding foreground points. For the i-th image under consideration, every other image j (j ≠ i) has corresponding points, so the term is summed over j. Its influence is controlled by the parameter κa.

The fourth term of Equation (14) is useful when some prior information indicates that pixel x is likely to be foreground: setting the weight Rf,i(x) large biases the estimate of α toward 1. The overall influence of the fourth term is controlled by the parameter γf. Conversely, the fifth term biases the estimate of α toward 0 by setting the weight Rb,i(x) large where pixel x is known to be likely background. The overall influence of the fifth term is controlled by the parameter γb.

For example, if the foreground object is known in advance to be brighter than the background, Rf,i(x) can be increased where pixel x has high luminance and Rb,i(x) increased where it has low luminance, which may improve the estimation accuracy. Also, if several input images are similar within a certain range around the foreground corresponding points, that location is likely to be foreground; the same reasoning applies to the background. Rf,i(x) and Rb,i(x) can therefore be given by the following equations.

Here, L is the set of points contained in a circle or square of suitable size centered on the origin.
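A patch-agreement weight of this kind can be sketched in 1-D. Because the equations for Rf,i(x) and Rb,i(x) are not reproduced here, the form exp(−SSD / σ²) over the offsets l in L = {−radius, ..., radius} is an assumption; evaluating it at the foreground correspondence Vij(x) gives an Rf-style weight, and at the background correspondence Wij(x) an Rb-style weight. Names are hypothetical.

```python
import math

def patch_agreement(img_i, img_j, x, corr, radius=1, sigma=0.2):
    """Hypothetical R weight: exp(-SSD / sigma^2) between the patch around x in
    image i and the patch around its corresponding point `corr` in image j.
    The offsets l range over L = {-radius, ..., radius}; offsets that leave
    either image are skipped."""
    ssd = 0.0
    for l in range(-radius, radius + 1):
        a, b = x + l, corr + l
        if 0 <= a < len(img_i) and 0 <= b < len(img_j):
            ssd += (img_i[a] - img_j[b]) ** 2
    return math.exp(-ssd / sigma ** 2)

print(patch_agreement([0, 1, 0], [5, 0, 1, 0], x=1, corr=2))  # identical patches
```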

In Equation (14), the unknowns are αi(x) in the “unknown” region of the trimap. αi(x) = 1 is fixed in the “definite foreground” region, and αi(x) = 0 is fixed in the “definite background” region. In the third term, any term whose corresponding point Vij(x) falls outside the image is omitted from the expression.

Because Equation (13) is quadratic in the unknowns αi(x), it can be solved by the least-squares method. To narrow the solution space further, Equation (13) can also be minimized under the constraint 0 ≤ αi(x) ≤ 1, in which case quadratic programming can be used.
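For a single pixel the constrained problem has a closed form. As a hypothetical reduction of Equation (14), the sketch keeps only the matting term and the correspondence term (dropping smoothness and the R priors). The result is a convex 1-D quadratic in α, so clipping its unconstrained minimizer to [0, 1] is the exact constrained solution; with the coupling terms included, a general least-squares or QP solver is needed instead.

```python
def solve_alpha(pixel, f, b, corr_alphas=(), kappa=1.0):
    """Closed-form alpha for one pixel of a reduced Equation (14)-style cost:
        |I - B - a (F - B)|^2 + kappa * sum_j (a - a_j)^2,   a_j = alpha_j(V_ij(x))
    Convex in a, so clipping the unconstrained minimiser to [0, 1] is exact.
    """
    fb = [fc - bc for fc, bc in zip(f, b)]
    num = sum((ic - bc) * v for ic, bc, v in zip(pixel, b, fb)) + kappa * sum(corr_alphas)
    den = sum(v * v for v in fb) + kappa * len(corr_alphas)
    if den == 0:
        raise ValueError("alpha is undetermined: F == B and no correspondences")
    return min(1.0, max(0.0, num / den))

print(solve_alpha((0.5, 0.5, 0.5), (1, 1, 1), (0, 0, 0), corr_alphas=(1.0,)))
```

The matting term alone gives 0.5; the correspondence pulling toward 1.0 raises the estimate to 0.625.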

Together with the foreground and background color estimation, the α value estimation can also be viewed as minimizing the combined expression Ec + Ea obtained by merging Equations (8) and (13). Optimizing this simultaneously over Fi, Bi, and αi yields an objective that is no longer quadratic, so least squares and quadratic programming cannot be used; it can, however, be optimized with gradient descent or a similar method.

(Steps S18 and S19: Convergence Check for the α Values)
Next, the convergence check for the α values in step S18 will be described. This processing is performed by the α value estimation unit 16.

The α value estimation unit 16 judges that the α values have converged when the amount by which the α value estimation updates them is sufficiently small. Letting αi(x) (i = 0, 1, ..., M − 1) denote the α values before the update and α⁺i(x) those after it, convergence is declared when the following expression holds for a threshold θ.

Equation (19) means that, at every point of every image, α changes only slightly between before and after the update.

If the α values have not converged, the process returns to the corresponding-point calculation of step S14, and the trimap is updated as follows. Points whose updated α value is estimated to be sufficiently close to 1, i.e., points x with α⁺i(x) > 1 − ε for a small ε, are added to the “definite foreground” region. Points whose updated α value is estimated to be sufficiently close to 0, i.e., points x with α⁺i(x) < ε, are added to the “definite background” region.
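Steps S18 and S19 can be sketched as two small functions. Equation (19) is read here, as an assumption consistent with the explanation above, as requiring |α⁺i(x) − αi(x)| < θ at every pixel of every image; each image is flattened to a list of pixels, and the string labels are hypothetical.

```python
def has_converged(alpha_old, alpha_new, theta):
    """Assumed reading of Equation (19): |new - old| < theta at every pixel of every image."""
    return all(abs(n - o) < theta
               for img_old, img_new in zip(alpha_old, alpha_new)
               for o, n in zip(img_old, img_new))

def update_trimap(trimap, alpha_new, eps):
    """Promote confidently estimated pixels (labels 'fg'/'bg'/'unknown' are hypothetical).
    Pixels whose alpha is still intermediate keep their current label."""
    out = []
    for labels, alphas in zip(trimap, alpha_new):
        row = []
        for label, a in zip(labels, alphas):
            if a > 1.0 - eps:
                row.append("fg")
            elif a < eps:
                row.append("bg")
            else:
                row.append(label)
        out.append(row)
    return out

print(update_trimap([["unknown"] * 3], [[0.995, 0.5, 0.002]], eps=0.01))
```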

In this way, α values are obtained for the multiple input images, and a mask image 36 is obtained for each of them. FIG. 14 shows the mask images for the input images I1 and I2 of FIG. 6. As shown, the house, which is the foreground, has α = 1.0, and all other regions have α = 0.0. FIG. 14 depicts a case in which the outline of the house is sharp, so there is no region with 0 < α < 1; this is solely to simplify the explanation.

(Step S20: Compositing with a New Background)
Next, the compositing process in step S20 will be described.

The compositing in step S20 is performed by the compositing unit 22. The compositing unit 22 reads the new background 40 from the memory 21 to obtain the new background color B′(x). It then substitutes the foreground color F(x) and matte α(x) obtained by Equation (22), together with the new background color B′(x), into Equation (2) to obtain the composite image I′(x). This is shown in FIG. 15.
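Assuming Equation (2) is the standard compositing form I′(x) = α(x)F(x) + (1 − α(x))B′(x), i.e., the matting equation with the new background substituted, step S20 is a per-pixel blend. The sketch below works on flat lists of RGB tuples; the function name is hypothetical.

```python
def composite(fg, alpha, new_bg):
    """Per-pixel I'(x) = alpha(x) F(x) + (1 - alpha(x)) B'(x) for RGB tuples."""
    return [
        tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b))
        for f, a, b in zip(fg, alpha, new_bg)
    ]

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(composite([red] * 3, [1.0, 0.5, 0.0], [blue] * 3))
```

Pixels with fractional α, such as the sheep's wool in FIG. 16, blend the extracted foreground color with the new background instead of being cut hard.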

FIG. 15 is a schematic diagram showing the new background 40 being composited with the foreground of input image I1. Suppose, for example, that the new background 40 is an image of the Earth as seen from the Moon. Compositing the foreground of input image I1 onto it yields a composite image in which the house from I1 appears to stand on the Moon, as shown. The foreground of input image I2 may of course be used instead.

<Effect>
As described above, the image processing apparatus and image processing method according to the first embodiment of the present invention provide the following effect (1).

(1) The foreground can be extracted from image data by a simple technique.
The method according to the present embodiment uses multiple pieces of image data for foreground extraction. Each image is captured with the same target object in focus and the background behind it out of focus. The initial α values are then estimated by estimating the degree of blur in each image.

In other words, the foreground is extracted by estimating the difference in blur between the foreground target object and the background. The camera used for capture therefore requires no calibration, and highly accurate foreground extraction is possible in a very simple capture environment. The method is especially effective for extracting objects with complex silhouettes, such as hair or fur, and objects with translucent parts. Examples of such cases will be described with reference to FIGS. 16(a), 16(b), 17(a), 17(b), and 18.

FIGS. 16(a) and 16(b) are photographs of the same foreground taken from the left and from the right. FIG. 17(a) is the final mask image corresponding to FIG. 16(a), and FIG. 17(b) is the foreground color image extracted using the mask image of FIG. 17(a). FIG. 18 is a composite image obtained by compositing the foreground color of FIG. 17(b) onto a new background using the mask image of FIG. 17(a). As shown in FIGS. 16(a) and 16(b), suppose the foreground object is, for example, a stuffed toy sheep. The contour of the foreground is then a region in which the sheep's wool mixes foreground and background. Extracting objects with such complex contours has been very difficult with conventional methods. With the method of the present embodiment, however, the α values are estimated from the blur, so the foreground can be extracted with high accuracy, as shown in FIGS. 17(a) and 17(b).

As described above, the first embodiment of the present invention can extract even a complex object with high accuracy from a plurality of images taken by an uncalibrated camera. It does so without the manual information input described in the background art. An object can therefore be extracted by a very simple technique.

[Second Embodiment]
Next, an image processing apparatus and an image processing method according to a second embodiment of the present invention are described. In this embodiment, the foregrounds obtained for the plurality of input images in the first embodiment are interpolated to obtain the foreground as seen from an intermediate viewpoint. Only the differences from the first embodiment are described below.

<Configuration of Image Processing Device 3>
FIG. 19 is a block diagram of the image processing apparatus 3 according to the present embodiment. As shown in the figure, the image processing apparatus 3 adds a foreground interpolation unit 23 to the configuration of FIG. 2 described in the first embodiment.

The foreground interpolation unit 23 uses the foreground warp function 33, the foreground color 35, and the mask image 36 to obtain a foreground color interpolation image 42 and an α value interpolation image 43. These are the foreground color and the α value of the foreground object as seen from a position intermediate between the plural viewpoints from which it was photographed (this view is hereinafter sometimes called the intermediate foreground).

Instead of using the mask image 36 and the foreground color 35 directly, the synthesizing unit 22 uses the foreground color interpolation image 42 and the α value interpolation image 43 obtained by the foreground interpolation unit 23. With these it composites the intermediate foreground onto the new background 40.

<Operation of Image Processing Device 3>
Next, the flow of the image processing method based on the above concept is described with reference to FIG. 20. FIG. 20 is a flowchart of the processing by which the image processing apparatus 3 according to the present embodiment obtains the composite image 41 from the input images.

As shown in the figure, steps S10 to S18 described in the first embodiment are performed first. Once the α value has converged (step S19, YES), step S21 is performed. In step S21, the foreground interpolation unit 23 calculates the foreground color interpolation image 42 and the α value interpolation image 43. The synthesizing unit 22 then composites the intermediate foreground onto the new background 40 (step S22).

(Step S21: Foreground interpolation)
Next, the foreground interpolation of step S21 is described in detail. This processing is performed by the foreground interpolation unit 23.

The foreground interpolation unit 23 interpolates the plural foreground colors F_i(x) and mattes α_i(x) obtained in steps S10 to S18 to generate intermediate foreground information F(x) and α(x). Let β_i be the contribution rate of the i-th image (Σ_i β_i = 1). The foreground interpolation unit 23 first interpolates the warp functions 33 of each image, weighted by these contribution rates.

Here, the warp of an image onto itself is the identity transformation V_ii(x) = x. Next, F_i(x) and α_i(x) are warped using the interpolated warp function V_i(x). As shown below, this step uses the inverse function V_i⁻¹(x) of V_i(x).

Finally, the warped foreground information is blended according to the contribution rates to obtain the interpolated foreground information.

In the image processing system 1 shown in FIG. 1, the input images correspond to different viewpoints. Let X_i be the three-dimensional coordinates of the viewpoint of input image i. The foreground information given by Equation (22) then corresponds to the foreground as seen from the viewpoint X below.

The interpolated foreground color F(x) and the interpolated α value α(x) are thus obtained.
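The interpolation steps can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not part of the patent. First, every warp V_ij is a pure integer translation, V_ij(x) = x + t_ij, so its weighted average and inverse are trivial. Second, the final blending is a plain β-weighted sum of the warped images. The function name `interpolate_foreground` and the translation table `t` are hypothetical.

```python
import numpy as np

def interpolate_foreground(F, alpha, t, beta):
    """Blend per-view foregrounds into one intermediate view (toy sketch).

    F     : list of (H, W, 3) foreground colour images F_i
    alpha : list of (H, W) mattes alpha_i
    t     : t[i][j] = integer (dy, dx) offset; assumption: V_ij(x) = x + t[i][j]
    beta  : contribution rates beta_i, summing to 1
    """
    beta = np.asarray(beta, dtype=float)
    assert np.isclose(beta.sum(), 1.0)
    F_out = np.zeros(F[0].shape, dtype=float)
    a_out = np.zeros(alpha[0].shape, dtype=float)
    for i in range(len(F)):
        # Interpolated warp V_i(x) = sum_j beta_j V_ij(x) = x + s_i (V_ii = identity)
        s_i = sum(beta[j] * np.asarray(t[i][j], dtype=float) for j in range(len(F)))
        s_i = tuple(int(v) for v in np.round(s_i))
        # Backward warp with V_i^-1(x) = x - s_i, i.e. F'_i(x) = F_i(x - s_i).
        # np.roll wraps around at the border, acceptable only for a toy example.
        F_out += beta[i] * np.roll(F[i], s_i, axis=(0, 1))
        a_out += beta[i] * np.roll(alpha[i], s_i, axis=(0, 1))
    return F_out, a_out
```

Take two views in which a feature sits 4 pixels further right in the second image, with β = (0.5, 0.5). Both warped copies then place the feature 2 pixels to the right of its position in the first view, so they coincide before blending.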

<Effect>
As described above, the image processing apparatus and image processing method according to the second embodiment of the present invention provide the following effect (2), in addition to the effect (1) described in the first embodiment.

(2) A new foreground as seen from an arbitrary viewpoint can be obtained.
In the configuration according to the present embodiment, the image processing apparatus 3 includes the foreground interpolation unit 23. The foreground / background color estimation unit 15 obtains a foreground for each of the plurality of input images. From these foregrounds, the foreground interpolation unit 23 calculates the foreground color as seen from an arbitrary viewpoint (foreground color interpolation image 42) and the corresponding α value (α value interpolation image 43). A foreground seen from an arbitrary viewpoint can therefore be obtained. That is, a new foreground that appears in none of the input images can be extracted.

FIG. 21 illustrates this. It is a schematic diagram showing how a new foreground is calculated from the foregrounds extracted from two input images. As shown, the two input images are the same as in FIG. 6 described earlier. Input image I1 shows the house forming the foreground photographed from the left side of the page, and input image I2 shows it photographed from the right side. From these two images of the house, the foreground interpolation unit 23 can obtain, for example, an image of the house seen from the front.

[Third Embodiment]
Next, an image processing apparatus and an image processing method according to a third embodiment of the present invention are described. In this embodiment, the trimap of the first and second embodiments is obtained with help from the user, who supplies information about the region that is definitely foreground. Only the differences from the first and second embodiments are described below.

In the configuration according to the present embodiment, the region dividing unit 12 holds an upper limit d1 on the distance of the mixed region from the background region. This value is, for example, given in advance by the user. The upper limit distance d1 is described with reference to FIG. 22, which is a trimap for an input image of a foreground object.

As shown in the figure, in some input images every region at least a certain distance away from the edge of the definite-background region is definitely foreground. In other words, only the region between the edge of the definite-background region and that distance satisfies 0 < α < 1, and this is the mixed region. The user may know that such a definite-foreground region exists. In that case, the user gives this distance d1 to the image processing apparatus 3, and the region dividing unit 12 holds it.

The region dividing unit 12 first obtains the definite-background region by the method described in the first embodiment. It then sets as the definite-foreground region the part of the "unknown region" lying more than the distance d1 inward from the definite-background region. The region between the definite-background region and the definite-foreground region becomes the unknown region. Then, in step S13 of FIG. 13, the region dividing unit 12 sets α = 0 for the "definite-background region", α = 1 for the "definite-foreground region", and α = 0.5 for the "unknown region".
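The trimap rule above can be illustrated with a minimal sketch. The text does not say how distance is measured, so this sketch assumes 4-neighbour (city-block) distance, computed by growing the definite-background mask d1 times. The function name is hypothetical.

```python
import numpy as np

def trimap_from_background(bg, d1):
    """Initial alpha: 0 = definite background, 1 = definite foreground, 0.5 = unknown.

    bg : boolean (H, W) mask of the definite-background region
    d1 : user-supplied upper limit on the width of the mixed region, in pixels
    Assumption: distance is counted in 4-neighbour (city-block) steps.
    """
    near = bg.copy()
    for _ in range(d1):                 # grow the background by one pixel per pass
        grown = near.copy()
        grown[1:, :] |= near[:-1, :]    # downward
        grown[:-1, :] |= near[1:, :]    # upward
        grown[:, 1:] |= near[:, :-1]    # rightward
        grown[:, :-1] |= near[:, 1:]    # leftward
        near = grown
    alpha = np.ones(bg.shape, dtype=float)   # farther than d1: definite foreground
    alpha[near & ~bg] = 0.5                   # within d1 of background: unknown
    alpha[bg] = 0.0                           # definite background
    return alpha
```

For a single row whose first pixel is background and d1 = 3, the next three pixels become unknown and the rest become definite foreground.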

With this technique, moreover, a "definite-foreground region" is already available in the first pass of steps S14 and S15, which are repeated several times. The foreground warp function can therefore be obtained in the same way as the background warp function, as described in the first embodiment. In addition, the function T(α) can be used as the weighting function.

<Effect>
The technique according to this embodiment provides the following effect (3), in addition to the effects (1) and (2) described in the first and second embodiments.

(3) The processing speed of foreground extraction can be improved.
In this embodiment, the region dividing unit 12 can identify the definite-foreground region immediately after blur estimation. The "unknown region" therefore occupies only a very small proportion of the image even at the stage of initial α values. This reduces the load on the foreground / background color estimation unit 15 and the α value estimation unit 16 and speeds up foreground extraction.

[Fourth Embodiment]
Next, an image processing apparatus and an image processing method according to a fourth embodiment of the present invention are described. This embodiment details the blur estimation method used in the first to third embodiments. Only the differences from the first to third embodiments are described below.

In this embodiment, the blur estimation unit 11 applies several different deblurring operations to the input image. It also divides the input image into regions according to the image content and estimates the blur locally in each region. It then combines, region by region, the appropriately deblurred image data. FIG. 23 is a block diagram of the blur estimation unit 11 according to this embodiment.

As illustrated, the blur estimation unit 11 includes the following components:
- a PSF model storage device 103,
- a deconvolution unit 104,
- a deblurred image storage device 105,
- a local blur parameter estimation unit 106,
- a local estimated blur parameter storage device 107,
- an image integration unit 108, and
- a restored image storage device 109.

The blur parameter in this embodiment corresponds to the blur amount 30 described in the embodiments above.

The PSF model storage device 103 stores a PSF (blur function) 103a that models the state of blur. It also stores blur parameters 103b at a plurality of levels, to be applied to the PSF 103a. The PSF 103a corresponds to the blur function h(x; r) described in the first embodiment.

For each of the plurality of levels, the deconvolution unit 104 applies the blur parameter 103b of that level to the PSF 103a. It removes the blur represented by that PSF from the input image, producing deblurred image data 151 to 15n. It stores the deblurred image data 151 to 15n for each level in the deblurred image storage device 105.

The local blur parameter estimation unit 106 reads the input image. Its region dividing unit 161 divides the input image into a plurality of regions based on the image content, such as changes in color.

The local blur parameter estimation unit 106 also reads the deblurred image data 151 to 15n for each level. For each region, its local blur estimation unit 162 selects from the deblurred image data 151 to 15n the data that satisfies a predetermined appropriateness condition within that region. It then estimates a local estimated blur parameter 107a that identifies the data selected for the region.

The local blur parameter estimation unit 106 then stores the local estimated blur parameter 107a of each region in the local estimated blur parameter storage device 107. This local estimated blur parameter corresponds to the blur amount 30 in each window described in the embodiments above.

The image integration unit 108 reads the deblurred image data 151 to 15n for every level and the local estimated blur parameter 107a of each region. For each region, it extracts the corresponding region data from the deblurred image data 151 to 15n that matches the local estimated blur parameter 107a estimated for that region. It combines the extracted region data of all regions into restored image data 109a and stores the result in the restored image storage device 109.
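The integration step amounts to a per-pixel lookup into the stack of deblurred images. The following minimal sketch assumes three things: the deblurred image data 151 to 15n are stacked into one array, regions are given as an integer label map, and the per-region choice is stored as a level index. All names are hypothetical.

```python
import numpy as np

def integrate_regions(deblurred, labels, chosen):
    """Compose restored image data from per-region level choices.

    deblurred : (n_levels, H, W) stack standing in for deblurred data 151..15n
    labels    : (H, W) integer region index of every pixel
    chosen    : chosen[k] = index of the blur level selected for region k
    """
    level_per_pixel = np.asarray(chosen)[labels]   # (H, W) selected level per pixel
    rows, cols = np.indices(labels.shape)
    # For every pixel, take the value from the level chosen for its region
    return deblurred[level_per_pixel, rows, cols]
```

Plain NumPy fancy indexing keeps this step free of explicit loops over regions.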

Details of the configuration of the blur estimation unit 11 are described below.
The deconvolution unit 104 includes a parameter changing unit 141 and a deconvolution processing unit 142. The deconvolution processing unit 142 in turn includes a differentiation unit 143, a differential domain deconvolution processing unit 144, and an integration unit 145. The deconvolution unit 104 deconvolves the input image on the assumption that the blur is uniform over the image. The PSF 103a is parameterized by the size of the blur. The parameter changing unit 141 of the deconvolution unit 104 changes the parameter r of the PSF 103a step by step according to the blur parameters 103b, for example r = 0.5, 1.0, 1.5, 2.0, and so on.

The deconvolution unit 104 thus generates deblurred image data 151 to 15n, one for each blur parameter. For example, one set results from deconvolution with the PSF 103a at r = 0.5, the next from deconvolution with the PSF 103a at r = 1.0, and so on.
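The stepwise scan over r can be illustrated with a minimal sketch. To keep it short, a Wiener-type filter conj(H)·G / (|H|² + k) stands in for the EM-based gradient-domain deconvolution the patent actually uses. A Gaussian OTF is assumed for the PSF 103a, and the constant `k` and the function names are hypothetical.

```python
import numpy as np

def gaussian_otf(shape, r):
    """Assumed OTF of a Gaussian PSF with standard deviation r (pixels)."""
    eta = np.fft.fftfreq(shape[0])[:, None]   # vertical spatial frequency
    xi = np.fft.fftfreq(shape[1])[None, :]    # horizontal spatial frequency
    return np.exp(-2.0 * np.pi**2 * r**2 * (xi**2 + eta**2))

def deblur_stack(g, radii, k=1e-3):
    """One deblurred image per blur level r, assuming spatially uniform blur.

    Stand-in technique: a Wiener-type filter replaces the patent's
    EM-based deconvolution; k is a hypothetical regularisation constant.
    """
    G = np.fft.fft2(g)
    out = []
    for r in radii:
        H = gaussian_otf(g.shape, r)
        out.append(np.real(np.fft.ifft2(np.conj(H) * G / (np.abs(H) ** 2 + k))))
    return np.stack(out)   # shape (len(radii), H, W): deblurred data 151..15n
```

For example, `deblur_stack(g, [0.5, 1.0, 1.5, 2.0])` mirrors the parameter sequence given in the text.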

Various functions can be used as the PSF 103a parameterized by the size of the blur. A Gaussian distribution whose standard deviation r is the blur parameter can serve as a PSF model in many fields, such as MRI (Magnetic Resonance Imaging) and CT (Computed Tomography).

Equation (24) is a Gaussian distribution whose standard deviation r is the blur parameter. In what follows, the pixel coordinate x (a two-dimensional vector) used in the embodiments above is written as (x, y).

Generalizing Equation (24) gives the frequency-domain expression of Equation (25).

Here, F[h] is the Fourier transform of the PSF h and is called the optical transfer function (OTF).

ξ and η are spatial frequencies. With b = 1, Equation (25) is equivalent to Equation (24). With b = 5/6 it models the PSF due to atmospheric turbulence, and with b = 1/2 the PSF/OTF due to X-ray scattering.

For defocus blur caused by a lens, such as a camera lens, the PSF can be expressed as Equation (26), with the blur radius as the blur parameter.

Moreover, any PSF can be defined by taking r as a scaling parameter that enlarges or shrinks a basic PSF q(x, y).

The basic PSF is obtained, for example, by calibration.
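Equations (24), (26), and (27) are not reproduced here. The following sketch therefore assumes the standard forms of these models: a Gaussian with standard deviation r, a uniform disc ("pillbox") of radius r for lens defocus, and a basic PSF q rescaled as q(x/r, y/r). The kernel size and all function names are illustrative assumptions. Each kernel is normalized to sum to 1, so blurring preserves overall brightness.

```python
import numpy as np

def _grid(size):
    """Centred integer pixel coordinates for an odd size x size kernel."""
    c = size // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    return x, y

def gaussian_psf(r, size=15):
    """Gaussian PSF with standard deviation r (assumed form of Eq. (24))."""
    x, y = _grid(size)
    h = np.exp(-(x**2 + y**2) / (2.0 * r**2))
    return h / h.sum()

def pillbox_psf(r, size=15):
    """Uniform disc of radius r, a common defocus model (assumed form of Eq. (26))."""
    x, y = _grid(size)
    h = (x**2 + y**2 <= r**2).astype(float)
    return h / h.sum()

def scaled_psf(q, r, size=15):
    """Basic PSF q(x, y), e.g. from calibration, rescaled by r (assumed Eq. (27))."""
    x, y = _grid(size)
    h = q(x / r, y / r)
    return h / h.sum()
```

Rescaling a unit Gaussian with `scaled_psf` reproduces `gaussian_psf` for the same r, which shows how the scaling parameter plays the role of the blur size.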

The deconvolution performed by the deconvolution processing unit 142 uses the image degradation model of Equation (28), g = Hf + n.

Here, g, f, and n are the input image, the original sharp image data, and the noise, respectively. Each is written as a vector of its discretized pixel values arranged in lexicographic order.

H is the matrix representing two-dimensional convolution with the PSF h(x, y; r). Deconvolution amounts to estimating the original image data f from the input image g and the blur matrix H, and it is known to be an ill-posed problem. If it is solved simply as the least-squares problem of minimizing |g - Hf|², the noise in the resulting image data is amplified even when the exact blur matrix H is used.
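The noise amplification can be demonstrated numerically. The sketch below makes four assumptions: H is a circular convolution applied via the FFT, the PSF is Gaussian with r = 2, the noise is white with σ = 0.01, and a random image stands in for f. Unregularized least squares then reduces to dividing by the OTF. Because the OTF is almost zero at high frequencies, the noise at those frequencies is magnified enormously.

```python
import numpy as np

def gaussian_otf(shape, r):
    """Assumed Gaussian OTF exp(-2 pi^2 r^2 (xi^2 + eta^2)) on the FFT grid."""
    eta = np.fft.fftfreq(shape[0])[:, None]
    xi = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * np.pi**2 * r**2 * (xi**2 + eta**2))

def apply_H(f, otf):
    """H f: circular 2-D convolution with the PSF, computed via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(f) * otf))

def rms(e):
    return float(np.sqrt(np.mean(e ** 2)))

rng = np.random.default_rng(0)
f = rng.random((64, 64))                    # stand-in for the sharp image f
otf = gaussian_otf(f.shape, r=2.0)
sigma = 0.01
g = apply_H(f, otf) + sigma * rng.standard_normal(f.shape)   # g = H f + n

# Minimising |g - Hf|^2 with invertible H gives f = H^-1 g: divide by the OTF
f_ls = np.real(np.fft.ifft2(np.fft.fft2(g) / otf))
print(rms(g - f), rms(f_ls - f))   # the least-squares error is far larger
```
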

Suppressing this noise amplification requires prior knowledge about the original image data f. The prior knowledge used here is that the luminance gradients of natural image data generally follow the distribution shown by the solid line in FIG. 24. Compared with the commonly used Gaussian distribution (broken line), this distribution has a narrower peak and heavier tails. It can be modeled, for example, as the generalized Laplacian distribution of Equation (29).

The distribution of Equation (29) is called the differential coefficient prior distribution. This prior knowledge applies to luminance gradients, that is, to differential coefficients. The image degradation model of Equation (28) is therefore differentiated and rewritten as Equations (30) and (31).

Here, the subscripts x and y denote differentiation in the respective directions. n_x and n_y denote the noise in the differential domain and generally differ from the noise n in the original image domain.

g_x and g_y are obtained by differentiating the degraded image data g.

Equations (30) and (31) yield the estimate f̂_x of the x derivative and the estimate f̂_y of the y derivative of the original image data, respectively. Integrating f̂_x and f̂_y gives the estimate f̂ of the original image data. The integration can be carried out by solving the Poisson equation of Equation (32).

The deconvolution processing unit 142 executes the processing shown in FIG. 25.

That is, the differentiation unit 143 of the deconvolution processing unit 142 executes a process 110a that differentiates the input image in the x direction and a process 110b that differentiates it in the y direction.

Next, the differential domain deconvolution processing unit 144 of the deconvolution processing unit 142 executes a process 110c that deconvolves the x derivative of the input image and a process 110d that deconvolves its y derivative. Processes 110c and 110d are executed for each level. They use the stepwise blur parameters 103b supplied by the parameter changing unit 141, the PSF 103a, and the differential coefficient prior distribution 111a stored in the storage device 111. The differential coefficient prior distribution 111a is the statistical distribution of image differential coefficients. The differential domain deconvolution processing unit 144 encourages the deconvolved image data to follow this distribution, that is, it imposes a constraint that the result follow it as closely as possible.

The integration unit 145 of the deconvolution processing unit 142 then executes a process 110e. This process integrates the deconvolved x derivative and the deconvolved y derivative of the input image, yielding the deblurred image data 151 to 15n for each level of the blur parameter 103b.

This deconvolution processing is the deconvolution of the degradation models (30) and (31), which describe the image in its differentiated state.

Because Equations (30) and (31) can be treated in the same way, only the deconvolution of Equation (30) is described below.

Assume that the noise n_x is Gaussian noise with variance σ². The conditional probability of the x derivative f_x of the original image data, given the x derivative g_x of the degraded image data, is then expressed by Equation (33).

Here, p(·) is the probability density function of its argument, and f_x,i is the i-th pixel value of the x derivative f_x of the original image data. It is also assumed that the differential coefficient prior distribution can be applied pixel by pixel.

Consider MAP (Maximum A Posteriori) estimation, which takes as the estimate the f_x that maximizes Equation (33). Taking the logarithm (ln) does not change the position of the maximum, so Equation (34) is obtained.

If the differential coefficient prior distribution p(f_x,i) were the commonly used Gaussian distribution, Equation (34) would reduce to a linear equation. However, a Gaussian prior smooths the image data everywhere, which can make it unsuitable for removing blur and restoring sharp image data. The present embodiment therefore uses the distribution shown by the solid line in FIG. 24.

With the solid-line distribution of FIG. 24, Equation (34) becomes a nonlinear optimization problem. In the present embodiment, the EM (Expectation Maximization) method converts this problem into a sequence of linear equations, which speeds up the processing.

The application of the EM method to differential image data is described below. First, the differential coefficient prior distribution p(f_x,i) is expressed as a superposition of zero-mean Gaussians with different variances. That is, in Equation (35), p(f_x,i | z_i) is a zero-mean Gaussian with variance z_i, and p(z_i) is the probability distribution of the variance z_i.

Treating {z_i} as missing (unobservable) data and applying the EM method turns Equation (34) into the iterative update of the estimate given by Equation (36). Here, f̂_x^(n) denotes the estimate of f_x at the n-th iteration.

E_i^(n)[·] denotes the expected value under the probability distribution p(z_i | f̂_x,i^(n)) of the variance z_i for the i-th differential coefficient. Because p(f_x,i | z_i) is Gaussian, the second term of Equation (36) becomes Equation (37).

In Equation (37), E_i^(n)[1/z_i] is given by Equation (38). For example, when the differential coefficient prior model of Equation (29) is used, Equation (38) takes the specific form of Equation (39).

Equation (36) is thereby quadratic in f_x. It can be maximized by solving the linear equation (40), (σ² D^(n) + HᵀH) f_x = Hᵀ g_x, for f_x.

Here, D^(n) is the diagonal matrix whose diagonal elements are the values E_i^(n)[1/z_i] appearing in Equation (37). Hᵀ denotes the transpose of the blur matrix H.

FIG. 26 shows an example of the deconvolution processes 110c and 110d that FIG. 25 applies to the differential image data of the input image. In the n-th iteration, the differential domain deconvolution processing unit 144 first computes the matrix D^(n). Its diagonal elements are calculated by Equation (38) from the current estimated deblurred differential image data f̂_x^(n). The unit then computes the coefficient matrix (σ² D^(n) + HᵀH) of the linear equation (40) and the right-hand side Hᵀ g_x (step S1).

Next, the differential domain deconvolution processing unit 144 solves the linear equation (40) (step S2).

Steps S1 and S2 are then repeated in the same way with the new estimated deblurred differential image data f̂_x^(n+1).

The differential domain deconvolution processing unit 144 starts from the initial estimated deblurred differential image data f̂_x^(0). It repeats the above processing until the change in the estimate falls below a predetermined value. It then outputs the final estimate as the deblurred differential image data (step S3).

The linear equation can be solved by, for example, the conjugate gradient method or a second-order stationary iterative method. The main computation in these solvers is multiplying a vector by the blur matrix H or by its transpose Hᵀ. These products can be computed quickly using the fast Fourier transform.

The simplest choice for the initial estimate f̂_x^(0) is the derivative g_x of the input image.

The noise variance σ² can be estimated from the derivative g_x of the degraded image data. Alternatively, σ² may be regarded as the weighting coefficient of the prior distribution term in Equation (34) and set to an arbitrary value. A larger σ² reduces noise in the restored image data, but too large a value makes the restored image blurry. An effective way to achieve both noise reduction and sharpness is to start σ² at a large value and decrease it as the deconvolution iterations proceed.
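The iteration of steps S1 to S3 can be put together in a minimal sketch. It assumes three things: H is a circular convolution applied via the FFT with an OTF array `otf`, the E-step weight has the α|f|^(α-2)/s form derived from the assumed prior, and σ² decreases geometrically over a fixed number of outer iterations. The fixed loop count is a simplification of the patent's convergence test. The linear equation (40) is solved with a plain conjugate gradient loop. All names and default values are hypothetical.

```python
import numpy as np

def cg(apply_A, b, x0, iters=200, tol=1e-10):
    """Conjugate gradient for A x = b with A symmetric positive definite."""
    x = x0.copy()
    r = b - apply_A(x)
    p = r.copy()
    rs = np.vdot(r, r)
    b_norm = np.sqrt(np.vdot(b, b))
    for _ in range(iters):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = apply_A(p)
        step = rs / np.vdot(p, Ap)
        x += step * p
        r -= step * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def deconv_gradient(gx, otf, alpha=0.8, s=1.0, sigma2_start=1e-1,
                    sigma2_end=1e-3, n_outer=10, eps=1e-4):
    """EM-style reweighted deconvolution of one derivative image (Eq. (30) side)."""
    def H(v):    # blur matrix H applied via the FFT (circular boundary assumed)
        return np.real(np.fft.ifft2(np.fft.fft2(v) * otf))

    def Ht(v):   # transpose H^T: multiply by the conjugate OTF
        return np.real(np.fft.ifft2(np.fft.fft2(v) * np.conj(otf)))

    fx = gx.astype(float).copy()          # initial estimate f_x^(0) = g_x
    rhs = Ht(gx)                          # right-hand side H^T g_x
    # sigma^2 starts large and shrinks as the iterations proceed
    for sigma2 in np.geomspace(sigma2_start, sigma2_end, n_outer):
        # Step S1: diagonal of D^(n), assuming p(f) ~ exp(-|f|^alpha / s)
        d = alpha * np.maximum(np.abs(fx), eps) ** (alpha - 2) / s

        def A(v, d=d, sigma2=sigma2):     # (sigma^2 D^(n) + H^T H) v
            return sigma2 * d * v + Ht(H(v))

        fx = cg(A, rhs, fx)               # Step S2: solve Eq. (40)
    return fx
```

The same routine is applied to g_y for Equation (31). The two results are then integrated, for example with a Poisson solver.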

The deconvolution by the differential domain deconvolution processing unit 144 described above assumes a grayscale image with a single component per pixel. For a color image, the same process is repeated for each color component.

In the image processing apparatus 1, in parallel with the processing by the deconvolution unit 104, the local blur parameter estimation unit 106 obtains r(x, y). This is the local estimated blur parameter 107a at each coordinate (x, y).

FIG. 27 shows an example of the processing of the local blur parameter estimation unit 106. In step T1, the local blur parameter estimation unit 106 divides the input image into a plurality of regions according to the image content. These regions serve as the windows.

In step T2, the local blur parameter estimation unit 106 computes, for each region, a value called the divergence, based on the plurality of deblurred image data 151 to 15n. The divergence indicates how good the deblurring result is. As described later, divergence refers to the appearance of noise or periodic patterns. It represents the degree to which the deblurring result deviates from a typical natural image. In other words, in this embodiment, the divergence of each region is used to judge whether the deblurring of that region satisfies the appropriate condition.

In step T3, the local blur parameter estimation unit 106 estimates (selects), for each region, the blur parameter that gives a good deblurring result, based on the divergence.

In step T4, the local blur parameter estimation unit 106 performs post-processing based on the blur parameters estimated for the regions. This includes interpolating blur parameters for regions where none was estimated and smoothing blur parameters across regions.

In this processing by the local blur parameter estimation unit 106, the input image may be divided into regions by, for example, grouping pixels of similar color. FIG. 28 shows an example of region division according to the content of the image data. Any method may be used to divide the input image, including simple rectangular division. Dividing according to image content, however, improves the accuracy of blur parameter estimation. The shape of the regions is not restricted.

A region that is too small has too few pixels to provide sufficient samples. It may therefore be merged with a neighboring region or excluded from estimation. A region that is too large reduces the locality of the estimate, so it may be subdivided as appropriate. Estimation reliability may also be improved by excluding unreliable regions, such as elongated regions, regions with very little pixel-value variation, and regions with saturated pixel values.
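A minimal sketch of the simplest option above follows: rectangular division with the exclusion rules for small, flat, and saturated regions. The function name, thresholds, and the "mostly saturated" rule (more than half the pixels at the range limits) are assumptions for illustration. The sketch does not implement merging small regions or excluding elongated ones.

```python
import statistics


def usable_tiles(img, tile, min_pixels=4, min_std=1.0, lo=0, hi=255, max_sat_frac=0.5):
    """Split img (list of rows) into tile x tile blocks and keep the usable ones.

    Returns the (y0, x0, y1, x1) bounds of blocks that have enough samples,
    are not flat, and are not mostly saturated.
    """
    h, w = len(img), len(img[0])
    kept = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            pix = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            if len(pix) < min_pixels:
                continue                                  # too few samples
            if statistics.pstdev(pix) < min_std:
                continue                                  # almost no variation
            saturated = sum(1 for p in pix if p <= lo or p >= hi)
            if saturated / len(pix) > max_sat_frac:
                continue                                  # mostly clipped
            kept.append((y0, x0, y1, x1))
    return kept
```

In a 4 x 4 image split into 2 x 2 tiles, for example, a constant tile or an all-255 tile is dropped. A tile with values such as 10, 50, 90 and 20 is kept.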

Let A_j denote the divided regions and I the region of the entire input image. Then Expressions (41) and (42) hold.
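Expressions (41) and (42) are not reproduced here. Because the A_j are described as a division of I, they presumably state that the regions cover the whole image and do not overlap. The following reconstruction rests on that assumption:

```latex
\bigcup_{j} A_j = I \qquad (41)
\qquad\qquad
A_j \cap A_k = \emptyset \quad (j \neq k) \qquad (42)
```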

In each region A_j, blur is estimated on the assumption that the blur parameter is constant within the region. In this embodiment, the unit looks among the plurality of deblurred image data 151 to 15n for the one that gives the best restoration in the region. The blur parameter of that data is taken as the estimate.

For example, suppose the deblurred image data 151 to 15n were deconvolved with blur parameters r = 0.5, 1.0, 1.5, 2.0, .... Suppose also that the true blur parameter in the region of interest is r = 1.6. Among these data, the one deconvolved with r = 1.5 is then expected to give the best restoration in that region. If the quality of a restoration result can be judged, the blur parameter in the region of interest can thus be estimated as r = 1.5 in this example. The estimate is always one of the stepwise blur parameters prepared in advance. The set of blur parameters must therefore be prepared according to the required accuracy.

The divergence of the image data is used as the criterion for the quality of a restoration result. Divergence means that periodic patterns or random noise absent from the original image data appear in the deblurred image data. The degree of divergence represents the magnitude of such periodic patterns or random noise. With f^ denoting the deblurred image data, it can be evaluated with indices such as Expressions (43) to (46) below.

Here, f^_xx and f^_yy are the second-order derivatives of the deblurred image data with respect to x and y, respectively. δ(·) is a function that returns 1 when its argument is true and 0 when it is false. [θmin, θmax] is the range that pixel values can take.
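Expressions (43) to (46) themselves are not reproduced here. The sketch below therefore only illustrates two plausible indices built from the quantities just defined; it is not the patent's formulas. The first is the energy of the second derivatives f^_xx and f^_yy, which grows with periodic ringing and noise. The second counts the pixels that leave the valid range [θmin, θmax], using the role of δ(·). Function names are hypothetical, and borders are handled by clamping.

```python
def _px(f, y, x):
    """Pixel lookup with edge clamping."""
    y = min(max(y, 0), len(f) - 1)
    x = min(max(x, 0), len(f[0]) - 1)
    return f[y][x]


def second_derivative_energy(f, region):
    """Sum of f_xx^2 + f_yy^2 (3-point finite differences) over the (y, x) pixels in region."""
    total = 0.0
    for y, x in region:
        c = _px(f, y, x)
        fxx = _px(f, y, x - 1) - 2 * c + _px(f, y, x + 1)
        fyy = _px(f, y - 1, x) - 2 * c + _px(f, y + 1, x)
        total += fxx * fxx + fyy * fyy
    return total


def out_of_range_count(f, region, theta_min, theta_max):
    """Number of pixels in region whose value lies outside [theta_min, theta_max]."""
    return sum(1 for y, x in region if not (theta_min <= f[y][x] <= theta_max))
```

A flat patch scores 0 on the first index, while a checkerboard, the extreme case of a spurious periodic pattern, scores highly. Either index, or a weighted sum, could serve as v_j(r) in the selection step described below.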

Let rt be the true blur parameter, and suppose deconvolution is performed with blur parameter r. While r ≤ rt, the deblurred image data diverges relatively little. Once r > rt, however, the divergence increases sharply as r increases. The reason is explained below.

The larger the blur parameter r, the more strongly the PSF for that r attenuates the high-frequency components of the image data it blurs.

For example, the frequency-domain graph of the PSF of Expression (24) is obtained by setting b = 1 in Expression (25). FIG. 29 shows this graph in one dimension for simplicity. As shown, the attenuation of high-frequency components grows as the blur parameter r increases. Deconvolution corresponds to amplifying these attenuated frequency components. Hence, the larger the blur parameter, the more strongly even small frequency components are amplified.

Therefore, when r < rt, the amplification by deconvolution is weaker than the attenuation by the true blur parameter rt. The high-frequency components of the deblurred image data thus remain smaller than those of the original image data.

Conversely, when r > rt, the amplification by deconvolution is stronger than the attenuation by the true blur parameter rt. The deblurred image data then has larger high-frequency components than the original image data and diverges.

When r = rt, the overall filter is ideally 1 everywhere and the original image data is restored. In practice, noise is amplified together with the image even when r ≤ rt, so some divergence occurs. When the noise is sufficiently small compared with the image content, however, this divergence is much smaller than the divergence for r > rt.

To simplify the explanation, consider original image data f blurred with a Gaussian of blur parameter rt, and then deconvolved with a Gaussian of blur parameter r. Ignoring noise and using a simple inverse filter gives the relationship of Expression (47).

Regard exp{−(rt² − r²)(ξ² + η²)} in Expression (47) as a frequency-domain filter applied to the original image data f. Its graph, again shown in one dimension by ignoring η, takes the forms in FIG. 30, depending on whether r is smaller or larger than rt. The graph shows that when r > rt, the high-frequency components of the deblurred image data are excessively amplified relative to the original image data. This discussion assumes b = 1 in Expression (25), but the same holds when b is not 1.
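The filter quoted above can be reached in a few steps. The derivation assumes that the Gaussian PSF for b = 1 has frequency response H(ξ, η; r) = exp{−r²(ξ² + η²)}, as the quoted filter implies. F, G and F^ denote the spectra of the original, blurred and deblurred image data:

```latex
\begin{aligned}
G(\xi,\eta) &= H(\xi,\eta;r_t)\,F(\xi,\eta),\\
\hat F(\xi,\eta) &= \frac{G(\xi,\eta)}{H(\xi,\eta;r)}
 = \frac{e^{-r_t^2(\xi^2+\eta^2)}}{e^{-r^2(\xi^2+\eta^2)}}\,F(\xi,\eta)
 = e^{-(r_t^2-r^2)(\xi^2+\eta^2)}\,F(\xi,\eta).
\end{aligned}
```

For r < rt the exponent is negative and the gain decays with frequency. For r = rt the gain is exactly 1. For r > rt the exponent is positive and the gain grows without bound as ξ² + η² increases, which is the excess amplification seen in FIG. 30.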

The case of the defocus PSF of Expression (26) is described next. Assuming a one-dimensional case for simplicity gives Expression (48).

The Fourier transform of h(x) is given by Expression (49).

Expression (49) is graphed in FIG. 31. It is a filter that not only attenuates high-frequency components but also periodically becomes 0 at ξ = mπ/r (m = ±1, ±2, ...). For this reason, unlike with the PSFs of Expressions (24) and (25), deconvolution causes excessive amplification even in the low-frequency range.

As above, consider original image data f blurred by defocus with blur parameter rt, and then deconvolved using a defocus PSF with blur parameter r. The denominator of a simple inverse filter becomes 0 in this case. A pseudo-inverse filter with a small value ε added to the denominator is therefore used, which gives the relationship of Expression (50).

At the points ξ = mπ/r, where Expression (49) becomes 0, the value of Expression (50) is given by Expression (51).

Because of the division by the small value ε, these frequency components are strongly amplified. Image data usually contains many low-frequency components. The components at ξ = ±π/r (m = ±1) therefore diverge particularly strongly.

Plotting the value of Expression (51) for m = 1 as a function of the blur parameter r gives FIG. 32. The plot shows that when r > rt, the corresponding frequency component of the deblurred image data is excessively amplified relative to the original image data.
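The FIG. 32 behaviour can be checked numerically with a short sketch. Expressions (48) to (51) are not reproduced here, so two forms are assumed. The 1-D defocus PSF is taken to be a box h(x) = 1/(2r) on |x| ≤ r, whose transform sin(rξ)/(rξ) has exactly the zeros ξ = mπ/r described above. The pseudo-inverse is taken to be 1/(H + ε). The function names are hypothetical.

```python
import math


def box_otf(xi, r):
    """Fourier transform of the assumed 1-D box PSF h(x) = 1/(2r) on |x| <= r."""
    if xi == 0:
        return 1.0
    return math.sin(r * xi) / (r * xi)


def gain_at_first_zero(r, r_true, eps=1e-3):
    """Overall filter H(xi; r_true) / (H(xi; r) + eps) evaluated at xi = pi / r (m = 1)."""
    xi = math.pi / r
    return box_otf(xi, r_true) / (box_otf(xi, r) + eps)
```

With r_true = 2, the magnitude of the gain stays below 1/(πε) for every r < 2 and is essentially 0 at r = 2. For r = 4 it rises above that bound, to about 637 with ε = 10⁻³, and it keeps growing toward 1/ε as r increases further.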

As shown above, the PSFs of Expressions (24) and (25) and the PSF of Expression (26) make the image data diverge in different ways under deconvolution. In both cases, however, the divergence increases when deconvolution is performed with a blur parameter larger than the true blur parameter rt. The same holds in many cases for general PSFs. Therefore, as in FIG. 27, the divergence v_j(r) is first calculated in each region A_j for each of the deblurred image data deconvolved with blur parameter r.

In this embodiment, the deblurred image data 151 to 15n obtained by the deconvolution unit 104 are used. They may instead be generated separately by a simple deconvolution method, such as a pseudo-inverse filter. The deblurred image data used for blur parameter estimation need not be of high quality. Even the data judged appropriate may diverge to some extent.

Next, in each region, the deblurred image data just before the image begins to diverge strongly is selected as the best restoration result. That is, the value of r just before v_j(r) begins to increase sharply is taken as the estimated blur parameter of the region.

FIG. 33 shows an example of the relationship between the divergence and the blur parameter. One way to locate the point of sharp increase is to set a threshold relative to v_j(0), the divergence of the input image itself. The largest r whose v_j(r) does not exceed the threshold is then chosen. Normalizing the divergence by the region area before thresholding is also effective. Alternatively, the r at which the second derivative of v_j(r) with respect to r is largest may be selected.
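Both selection rules can be sketched over a sampled curve v_j(r). The sketch assumes that the candidate values `rs` are uniformly spaced and start at r = 0. It also assumes that the threshold is a multiple `factor` ≥ 1 of v_j(0). The function names and the `factor` parameter are hypothetical.

```python
def estimate_by_threshold(rs, v, factor):
    """Largest r whose divergence does not exceed factor * v(0).

    rs must start at r = 0 so that v[0] is the input image's own divergence;
    with factor >= 1, r = 0 always qualifies, so max() never sees an empty set.
    """
    limit = factor * v[0]
    return max(r for r, vr in zip(rs, v) if vr <= limit)


def estimate_by_curvature(rs, v):
    """r at which the discrete second difference of v(r) is largest (uniform rs)."""
    i = max(range(1, len(v) - 1), key=lambda k: v[k - 1] - 2 * v[k] + v[k + 1])
    return rs[i]
```

Take a curve that stays near 1 up to r = 1.5 and then jumps, such as `v = [1.0, 1.05, 1.1, 1.2, 4.0, 7.0]` at `rs = [0.0, 0.5, ..., 2.5]`. On this curve both rules pick r = 1.5. The threshold rule depends on choosing `factor` well. The curvature rule has no tuning parameter but is more sensitive to noise in v_j(r).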

Smoothing v_j(r) before estimating the parameter suppresses the effect of noise and can improve the accuracy of the decision. Another method takes each candidate r = r0 and fits one straight line to the values of v_j(r) for r < r0 near r0 and another to the values for r > r0. The r0 that maximizes the angle between the two lines is then chosen.
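Both refinements are sketched below under stated assumptions. The smoothing is a centred moving average. The two-line fit uses ordinary least-squares slopes on either side of each candidate r0, with r0 shared by both sides and at least two points per side. The "angle" is taken as atan(right slope) − atan(left slope). All names and the choice of moving average are hypothetical.

```python
import math


def moving_average(v, width=3):
    """Centred moving average; windows are truncated at the ends."""
    half = width // 2
    out = []
    for i in range(len(v)):
        window = v[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def knee_by_two_lines(rs, v, min_pts=2):
    """r0 maximizing the angle between lines fitted left and right of r0."""
    best_r, best_angle = None, -math.inf
    for k in range(min_pts - 1, len(rs) - min_pts + 1):
        left = _slope(rs[:k + 1], v[:k + 1])
        right = _slope(rs[k:], v[k:])
        angle = math.atan(right) - math.atan(left)
        if angle > best_angle:
            best_r, best_angle = rs[k], angle
    return best_r
```

For a divergence curve that is flat up to r = 1.5 and then rises linearly, `knee_by_two_lines` returns 1.5. Applying `moving_average` to v first makes the fit less sensitive to isolated noisy samples.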

Through the above processing, a blur parameter r_j is obtained for each region A_j. Piecing these together gives the local estimated blur parameter r(x, y).

The post-processing in FIG. 27 consists of additional steps such as filling in missing estimates and smoothing between regions. First, missing estimates are filled in from the surrounding regions. This applies to two kinds of region. The first kind was excluded from estimation during region division, for example because it was too small. In the second kind, the pixel values varied so little that the divergence hardly changed, so no reliable estimate could be made. For example, the Laplace equation of Expression (52) is solved over the regions A_j that have no estimate.

Next, the local estimated blur parameter r(x, y) is smoothed. Large differences in the blur parameter at region boundaries could otherwise produce visible discontinuities in the final restored image data.

Alternatively, hole filling and smoothing may be performed simultaneously by solving Expression (53) for r(x, y) over the entire image region I.

Here, for (x, y) ∈ A_j, λ(x, y) = λc and e(x, y) = r_j if region A_j has an estimated blur parameter. If it has none, λ(x, y) = 0 and e(x, y) = 0. The constant λc is a positive number that sets how closely the blur parameter r(x, y) is made to follow the per-region estimate e(x, y). The smaller λc is, the stronger the Laplacian smoothing is relative to that fit.
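Expression (53) is not reproduced here. The sketch below therefore assumes it is the screened-Poisson equation ∇²r = λ(x, y)(r − e(x, y)). This is the Euler-Lagrange equation of minimizing Σ λ(r − e)² + |∇r|², and it matches the roles of λ and e described above. Where λ = 0 it reduces to the Laplace equation of hole filling. The equation is solved with Gauss-Seidel sweeps on a 4-neighbour grid with Neumann boundaries. The function name and sweep count are hypothetical.

```python
def fill_and_smooth(e, lam, sweeps=500):
    """Gauss-Seidel solve of the assumed discrete equation  lap(r) = lam * (r - e).

    Where lam == 0 each update is the neighbour average (Laplace hole filling);
    where lam > 0 the value is also pulled toward the regional estimate e.
    Only in-image neighbours are counted (Neumann boundary).
    """
    h, w = len(e), len(e[0])
    r = [row[:] for row in e]
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                s, deg = 0.0, 0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += r[yy][xx]
                        deg += 1
                # s - deg * r = lam * (r - e)  =>  r = (s + lam * e) / (deg + lam)
                r[y][x] = (s + lam[y][x] * e[y][x]) / (deg + lam[y][x])
    return r
```

On a 1 x 5 row with estimates 1 and 3 at the ends (λc = 100) and no estimates in between, the result rises smoothly through about 2.0 in the middle. A larger λc pins the ends closer to 1 and 3, while a smaller λc lets the Laplacian smoothing pull them toward each other.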

The image integration unit 108 integrates the plurality of deblurred image data 151 to 15n generated by the deconvolution unit 104. The integration is based on r(x, y), the blur parameter 107a estimated by the local blur parameter estimation unit 106. The neighborhood of each coordinate (x, y) is blurred with blur parameter r(x, y). The unit therefore selects the two stepwise blur parameters r1 and r2 closest to r(x, y), with r1 ≤ r(x, y) ≤ r2. It then interpolates the integrated image data f^ from the deblurred image data f^1 and f^2, deconvolved with r1 and r2, as in Expression (54).
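Expression (54) is not reproduced here. The sketch below assumes it is a linear interpolation in r between the two bracketing results, f^ = (1 − w)·f^1 + w·f^2 with w = (r − r1)/(r2 − r1). Outside the prepared range it clamps to the nearest stage. The function names are hypothetical.

```python
import bisect


def integrate_pixel(r_xy, rs, values):
    """Blend the deblurred values of the two stage parameters bracketing r_xy.

    rs: ascending stage blur parameters; values[k]: pixel value deblurred with rs[k].
    """
    if r_xy <= rs[0]:
        return values[0]
    if r_xy >= rs[-1]:
        return values[-1]
    k = bisect.bisect_right(rs, r_xy)        # rs[k-1] <= r_xy < rs[k]
    r1, r2 = rs[k - 1], rs[k]
    w = (r_xy - r1) / (r2 - r1)              # assumed linear weight
    return (1.0 - w) * values[k - 1] + w * values[k]


def integrate_image(r_map, rs, images):
    """Apply integrate_pixel at every (y, x) of the local blur map r_map."""
    return [[integrate_pixel(r_map[y][x], rs, [img[y][x] for img in images])
             for x in range(len(r_map[0]))]
            for y in range(len(r_map))]
```

For example, with stages `rs = [0.5, 1.0, 1.5, 2.0]` and a pixel whose deblurred values are 10, 20, 30 and 40, a local blur of 1.25 yields 25.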

Applying the above method to each input image yields a blur amount 30 (blur parameter) for each input image.

<Effect>
The method according to this embodiment provides the following effect (4), in addition to effects (1) to (3) described in the first to third embodiments.

(4) Foreground extraction accuracy can be improved (part 1).
The blur estimation method according to this embodiment improves blur estimation accuracy, which in turn improves foreground extraction accuracy. The gain in blur estimation accuracy is explained below.

In the method according to this embodiment, blur is estimated locally, so the blur size need not be known in advance. The blur size therefore need not be uniform across the input image. Blur can be removed, or its state converted, even when the blur varies considerably across the input image.

Also, in this embodiment, the shape of the regions used for local blur estimation is not restricted. Blur can thus be estimated along the content of the image data, which improves the locality and accuracy of the estimate.

Furthermore, in this embodiment, the input image is deblurred at a plurality of stages using PSFs. The restored image data 109a is then created by integrating the appropriate data for each region from the resulting deblurred image data 151 to 15n. Non-uniform blur can thus be removed by several deconvolutions of the image data, each with a uniform PSF. This makes it possible to use an efficient deconvolution method that is unlikely to amplify noise.

In this embodiment, the PSF need only be parameterized by blur size, so various PSF models can be used.

Also, in this embodiment, deconvolution is performed on the luminance gradients obtained by differentiating the input image, and a prior distribution on these luminance gradients is applied. This makes it possible to obtain high-quality deblurred image data quickly.

In this embodiment, the differential domain deconvolution processing unit 144 of the deconvolution unit 104 applies deconvolution to the differential image data of the input image. In doing so, it removes blur according to the derivative prior distribution and generates first deconvolution data. The deconvolution unit 104 may additionally generate second deconvolution data by removing blur from the input image without relying on the derivative prior, for example with a simple pseudo-inverse filter.

In this case, the integration unit 145 integrates the first deconvolution data to generate first deblurred image data. The second deconvolution data was never differentiated, so it serves directly as second deblurred image data without integration. The local blur parameter estimation unit 106 then generates the local estimated blur parameter 107a based on the second deblurred image data. Finally, the image integration unit 108 integrates the first deblurred image data for each region based on the local estimated blur parameter 107a to generate the restored image data 109a.

[Fifth Embodiment]
Next, an image processing apparatus and an image processing method according to a fifth embodiment of the present invention will be described. This embodiment differs from the fourth embodiment in that the restored image data 109a obtained by the blur estimation unit 11 is used for aligning the foreground and background (computing corresponding points). Only the differences from the above embodiments are described below.

FIG. 34 is a block diagram of the foreground extraction unit 10 according to this embodiment. As shown, the background corresponding point calculation unit 13 and the foreground corresponding point calculation unit 14 perform alignment using a blur-free input image 37 instead of the input image. The blur-free input image 37 is the restored image data 109a obtained by the image integration unit 108 of the blur estimation unit. The other configurations and operations are the same as in the first to fourth embodiments, and their description is omitted.

<Effect>
The image processing apparatus and image processing method according to this embodiment provide the following effect (5), in addition to effects (1) to (4) described in the first to fourth embodiments.

(5) Foreground extraction accuracy can be improved (part 2).
In the configuration according to this embodiment, the background and foreground are aligned using the blur-free input image 37. This improves alignment accuracy and, as a result, foreground extraction accuracy.

[Sixth Embodiment]
Next, an image processing apparatus and an image processing method according to a sixth embodiment of the present invention will be described. This embodiment estimates blur by a method different from that of the fourth embodiment, namely the method disclosed in Japanese Patent Application Laid-Open No. 2007-201533.

FIG. 35 is a block diagram of the blur estimation unit 11 according to this embodiment. As shown, the blur estimation unit 11 includes an edge detection unit 200, a storage device 210, and an estimation unit 220.

The edge detection unit 200 reads the input image I(x, y) and performs edge detection on it. For each pixel judged to be an edge (an edge point), it stores the coordinates (a_i, b_i) in the storage device 210, together with the edge direction or normal vector n_i at (a_i, b_i). Here, i is the edge point index. Any existing edge detection method can be used.

The estimation unit 220 reads the input image and the edge positions and directions stored in the storage device 210. For each edge point, it estimates the blur from the input image data values around that point and outputs the resulting blur parameter. This blur parameter is the blur amount 30 described in the above embodiments.

Next, the blur estimation method of the estimation unit 220 will be described. The blur of the input image is modeled by the blur function h(x, y; r) described above, where r is the blur parameter, i.e., the blur amount 30.

Consider a step edge, as given by Expressions (55) and (56) below, blurred with this blur function h. In Expression (55), g0 is the luminance on one side of the edge. In Expression (56), g1 is the luminance on the other side.

For simplicity, the case where the step edge lies along the y-axis is described below. The same method applies to the general case by choosing the axes appropriately. The edge point of interest becomes the origin, the y-axis runs along the edge, and the x-axis runs perpendicular to it.

The luminance distribution of the blurred step edge is expressed by Expression (57) below.

Here, "*" denotes two-dimensional convolution. For example, modeling the blur with the Gaussian of Expression (24) gives Expression (58) below. This Gaussian has a single parameter, the blur radius r.

Here, erf(x) is the error function. Because the edge lies along the y-axis as described above, f is independent of y, as Expression (58) shows.

FIGS. 36 to 38 show example cross-sections at y = 0. FIG. 36 shows the step edge, g(x, 0). FIG. 37 shows the blur function, h(x, 0; r). FIG. 38 shows the blurred edge, f(x, 0; g0, g1, r).

The blur is estimated by finding g0, g1, and r such that Expression (57) approximates the luminance distribution I(x + a_i, y + b_i) of the input image data around the edge point (a_i, b_i). This estimation uses least-squares fitting. The fit can be performed in one dimension along the edge normal n_i. For the edge along the y-axis above, y is set to 0 and f(x, 0; g0, g1, r) is fitted to I(x + a_i, b_i) along the x direction.
In this way, the estimation unit calculates the blur parameter r at each edge point.
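The 1-D fit lends itself to a compact sketch. It assumes the blurred edge profile is g0 + (g1 − g0)·½[1 + erf(x / (√2·r))], i.e. a Gaussian whose standard deviation is r. If Expression (24) normalizes r differently, only the erf argument changes. The model is linear in g0 and g1, so for each candidate r on a grid the two luminances are solved exactly from the 2 x 2 normal equations. The r with the smallest residual is kept. The grid search and all names are illustrative choices, not the method of the cited publication.

```python
import math


def edge_profile(x, g0, g1, r):
    """Step from g0 to g1 blurred by a 1-D Gaussian with (assumed) standard deviation r."""
    return g0 + (g1 - g0) * 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * r)))


def fit_edge(xs, samples, r_grid):
    """Least-squares (g0, g1, r): exact linear solve for g0, g1 at each candidate r."""
    best = None
    for r in r_grid:
        s = [0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * r))) for x in xs]
        a = [1.0 - si for si in s]                 # coefficient of g0; s is that of g1
        saa = sum(ai * ai for ai in a)
        sss = sum(si * si for si in s)
        sas = sum(ai * si for ai, si in zip(a, s))
        say = sum(ai * yi for ai, yi in zip(a, samples))
        ssy = sum(si * yi for si, yi in zip(s, samples))
        det = saa * sss - sas * sas
        if abs(det) < 1e-12:
            continue                               # degenerate basis for this r
        g0 = (say * sss - ssy * sas) / det         # Cramer's rule on the normal equations
        g1 = (saa * ssy - sas * say) / det
        err = sum((g0 * ai + g1 * si - yi) ** 2 for ai, si, yi in zip(a, s, samples))
        if best is None or err < best[0]:
            best = (err, g0, g1, r)
    return best[1], best[2], best[3]
```

The samples would be I(x + a_i, b_i) taken along the edge normal at offsets xs around the edge point. On noise-free synthetic data generated with g0 = 20, g1 = 200 and r = 1.5, a grid that contains 1.5 recovers all three values.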

<Effect>
The method according to this embodiment provides the following effect (6), in addition to effects (1) to (3) described in the first to third embodiments.

(6) Foreground extraction accuracy can be improved (part 3).
In the method according to this embodiment, the estimation unit 220 fits a step edge blurred by the blur function to the input image and estimates the blur from this fit. This reduces blur estimation to the simpler problem of fitting a function to data. It therefore lightens the load on the blur estimation unit 11 and improves foreground extraction accuracy.

[Seventh Embodiment]
Next, an image processing apparatus and an image processing method according to a seventh embodiment of the present invention will be described. This embodiment uses a moving image as the input in any of the first to sixth embodiments. All other points are the same as in the first to sixth embodiments, and their description is omitted.

FIG. 39 is a block diagram of the image processing system 1 according to this embodiment. As shown, it differs from the configuration of FIG. 1 described in the first embodiment in that the camera 2 is replaced with a video camera 4. The video camera 4 outputs the captured moving image data to the image processing apparatus 3.

As in the first embodiment, the video camera 4 shoots with the target object in focus and the background out of focus. The video camera 4 need not be calibrated. As long as the background remains sufficiently far from the foreground, the foreground object may move, and the background may change relative to the foreground. That is, the present embodiment differs from the first embodiment in that the first embodiment obtains a plurality of image data from a plurality of viewpoints at the same time, whereas the present embodiment obtains a plurality of image data from the same viewpoint at a plurality of times.

Moving image data is a set of still images (called frames) arranged along the time axis. This is shown in FIG. 40, a schematic diagram of a moving image that shows how frames are obtained over time.

When the image processing apparatus 3 receives moving image data from the video camera 4, a moving image receiving unit (not shown) provided inside it, for example, determines which frames are used as input images. For example, when the first two frames of the moving image data are used as input images, these frames are output to the foreground extraction unit 10 as input images F1(x) and F2(x), as shown in FIG. 40.

Note that, as described above, frames selected as input images must satisfy the following conditions, as in the first to sixth embodiments.
(A) The frames form an image sequence in which the foreground object moves relative to the background while maintaining substantially the same shape.
(B) The foreground object is in focus and the background is blurred.
If the scene is stationary, condition (A) is satisfied by shooting while moving the video camera 4 in a plane substantially perpendicular to the depth direction of the scene. This is equivalent to using a single camera 2 in the first embodiment and shooting while moving it. When the foreground object moves, condition (A) is satisfied even if the video camera 4 is fixed. However, since the foreground must maintain substantially the same shape, if, for example, the foreground object turns 180 degrees to face backward, the moving image data must be divided into segments within which the foreground can be regarded as maintaining substantially the same shape.
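One way to perform the division just mentioned is sketched below under stated assumptions: each frame is assumed to come with a rough foreground mask (for instance from a preliminary pass), masks are compared after removing translation so that pure motion does not count as a shape change, and a new segment starts when the overlap with the segment's reference frame drops below a threshold. The functions and the 0.8 threshold are hypothetical; the embodiment does not specify a criterion.

```python
def normalize(mask):
    """Shift a mask (set of (row, col) pixels) so its bounding box starts at (0, 0).

    This removes pure translation, so an object that moves without changing shape
    keeps the same normalized mask.
    """
    if not mask:
        return set()
    r0 = min(r for r, _ in mask)
    c0 = min(c for _, c in mask)
    return {(r - r0, c - c0) for r, c in mask}


def shape_iou(mask_a, mask_b):
    """Intersection over union of two masks after removing translation."""
    a, b = normalize(mask_a), normalize(mask_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def split_by_shape(masks, min_iou=0.8):
    """Split frame indices into segments in which the foreground keeps roughly the same shape."""
    segments, current, ref = [], [], None
    for idx, mask in enumerate(masks):
        if ref is None:
            ref = mask  # first frame of a segment is its reference shape
        elif shape_iou(ref, mask) < min_iou:
            segments.append(current)  # shape changed too much: close the segment
            current, ref = [], mask
        current.append(idx)
    if current:
        segments.append(current)
    return segments
```

Any two frames from the same segment could then serve as a pair of input images.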

In the present embodiment, the plurality of images correspond to a plurality of times. Therefore, if the time of image i is ti, the foreground information given by equation (22) described in the second embodiment corresponds to the foreground observed at the time given by equation (59).

As described above, the video camera 4 may be used instead of the camera 2, and a plurality of frames contained in a moving image may be used as the input images. Moreover, when the third embodiment, in which the user supplies the foreground information, is applied to extracting the foreground from a moving image, the only information the user needs to supply is the distance d1, as described in the third embodiment. Therefore, even for moving images, the foreground can be extracted with high accuracy by a simple method.

[Eighth Embodiment]
Next, an image processing apparatus and an image processing method according to the eighth embodiment of the present invention will be described. In the present embodiment, the image processing method described in the first to seventh embodiments is realized by a processor system including a plurality of processors.

FIG. 41 is a block diagram of the processor system according to the present embodiment. As illustrated, the processor system 300 includes a first processor 310, a plurality of second processors 320, a first decoder 330, a second decoder 340, a DMA (direct memory access) controller 350, a memory 360, a main memory 370, an interface 380, and a memory controller 390.

The first processor 310 performs the main control of the system 300. The second processors 320 perform computations under the control of the first processor 310. Each second processor 320 includes a control unit 321, a calculation unit 322, and a memory 323.

The first decoder 330 decodes moving image data in MPEG (Moving Picture Experts Group) format input from the outside. The second decoder 340 decodes moving image data in H.264 format input from the outside. The DMA controller 350 manages data transfer between the memories included in the system 300.

The memory 360 is a non-volatile semiconductor memory such as a flash memory, and holds a control program 361, a blur estimation program 362, a trimap creation program 363, an α value estimation program 364, an alignment program 365, a color estimation program 366, a composite image creation program 367, and a new background 40. The programs 361 to 367 are described later.

The main memory 370 is a semiconductor memory such as a DRAM (Dynamic Random Access Memory) and serves as the work area of the first processor 310. The interface 380 manages the connection between the system 300 and the outside; for example, user instructions are given to the system 300 via the interface 380. The memory controller 390 manages data exchange between the system 300 and the outside; for example, data supplied from the camera 2 or the video camera 4 is written into the memory 360 under the control of the memory controller 390.

Next, the foreground extraction and image composition method performed by the system 300 will be described with reference to the flowchart of FIG. 42. The basic foreground extraction and image composition methods are the same as in the first to seventh embodiments; the following describes how each processing step described in those embodiments is executed within the system 300.

First, the memory controller 390 receives the input images from the outside and stores them in the memory 360 (step S30). In the first to sixth embodiments, the input images are still images supplied from the camera 2; in the seventh embodiment, the input is a moving image supplied from the video camera 4. The input images may also already be encoded in MPEG or H.264 format, in which case the first decoder 330 or the second decoder 340, respectively, decodes them.

Next, the first processor 310 reads the input images and the control program 361 from the memory 360 into the main memory 370 and executes the control program 361 (step S31). The control program 361 governs the main processing flow for foreground extraction and image composition; executing it starts the foreground extraction and image composition processing.

In accordance with the control program 361, the first processor 310 first starts the blur estimation processing (step S32). For this processing, the first processor 310 sets, for example, the windows WND described with reference to FIG. 4. The first processor 310 then instructs the plurality of second processors 320 to estimate the blur for each window WND (step S33).

The control unit 321 of each second processor 320 then reads the blur estimation program 362 from the memory 360 into the memory 323, executes it, and causes the calculation unit 322 to perform the computation. As a result, each second processor 320 obtains the blur amount 30 for the window WND assigned to it (step S34). This processing corresponds to step S11 described in the first embodiment, and its specific method is described in the fourth and sixth embodiments. The blur estimation program 362 is a program for executing this method.
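The division of labor in steps S32 to S34 can be illustrated with a thread pool standing in for the second processors 320. This is a sketch only: the per-window measure (mean squared horizontal intensity difference, where lower means blurrier) is a hypothetical placeholder for the estimators of the fourth and sixth embodiments, and the image is assumed to be a 2-D list of grayscale values.

```python
from concurrent.futures import ThreadPoolExecutor


def make_windows(height, width, size):
    """Role of the first processor: tile the image into size x size windows WND (top-left corners)."""
    return [(r, c) for r in range(0, height, size) for c in range(0, width, size)]


def sharpness(image, origin, size):
    """Role of a second processor: placeholder blur measure for one window.

    Mean squared horizontal difference inside the window; lower means blurrier.
    This is NOT the estimator of the fourth or sixth embodiment.
    """
    r0, c0 = origin
    rows = image[r0:r0 + size]
    diffs = [(row[c + 1] - row[c]) ** 2
             for row in rows
             for c in range(c0, min(c0 + size, len(row)) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def estimate_blur_parallel(image, size=8, workers=4):
    """Farm the windows out to a pool of workers and gather one measure per window."""
    windows = make_windows(len(image), len(image[0]), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda origin: sharpness(image, origin, size), windows)
        return dict(zip(windows, results))
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock; the pool here only shows how windows are distributed and the results collected, as the first processor 310 does.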

Next, the first processor 310 reads the trimap creation program 363 into the main memory 370 and executes it based on the blur amounts 30 obtained by the second processors 320 in step S34, thereby obtaining the trimap 31 (step S35). The trimap creation program 363 is a program for executing step S12 described in the first embodiment. Part of this processing may of course be performed by the second processors 320 as well as by the first processor 310.
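Step S12, executed here by the trimap creation program 363, can be sketched as a two-threshold rule on the blur amounts 30: clearly sharp areas become foreground, clearly blurred areas become background, and everything in between is left unknown. The thresholds `t_sharp` and `t_blurred` and the 255/0/128 label encoding are assumptions for illustration; the first embodiment's actual criterion may differ.

```python
# Common trimap encoding, assumed here for illustration.
FOREGROUND, BACKGROUND, UNKNOWN = 255, 0, 128


def make_trimap(blur, t_sharp, t_blurred):
    """Label each entry of a 2-D blur-amount map.

    Small blur -> foreground (in focus), large blur -> background (out of focus),
    anything in between -> unknown.
    """
    if t_sharp >= t_blurred:
        raise ValueError("t_sharp must be smaller than t_blurred")
    trimap = []
    for row in blur:
        out = []
        for b in row:
            if b <= t_sharp:
                out.append(FOREGROUND)
            elif b >= t_blurred:
                out.append(BACKGROUND)
            else:
                out.append(UNKNOWN)
        trimap.append(out)
    return trimap
```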

The first processor 310 then reads the α value estimation program 364 into the main memory 370 and executes it using the trimap 31 obtained in step S35, thereby obtaining initial α values (step S36). The α value estimation program 364 is a program for executing steps S13, S17, and S18 described in the first embodiment. Part of this processing may of course be performed by the second processors 320 as well as by the first processor 310.

Next, in accordance with the control program 361, the first processor 310 starts the corresponding point calculation and warp function calculation (step S37). For this processing, the first processor 310 assigns the pixels x for which correspondences are to be found to the plurality of second processors 320, as described with reference to FIG. 9, for example (step S38).

The control unit 321 of each second processor 320 then reads the alignment program 365 from the memory 360 into the memory 323, executes it, and causes the calculation unit 322 to perform the computation. As a result, each second processor 320 obtains the warp functions 32 and 33 for the pixels x assigned to it (step S39). This processing corresponds to steps S14 and S15 described in the first embodiment, and the alignment program 365 is a program for executing it. For example, in FIG. 9, if a second processor 320 is in charge of the pixel x0, it calculates which pixel of the input image I2(x) corresponds to the pixel x0 of the input image I1(x), and thereby obtains the warp function value W12(x0).
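As a stand-in for the per-pixel work of the alignment program 365, the sketch below finds W12(x0) by exhaustive block matching with a sum-of-squared-differences cost. This generic technique is chosen for illustration only; the embodiment matches background regions and foreground regions separately and may use a different cost. Images are assumed to be 2-D lists of grayscale values, and `half` (patch radius) and `search` (search radius) are hypothetical parameters.

```python
def ssd(img1, img2, p1, p2, half):
    """Sum of squared differences between (2*half+1)^2 patches centred at p1 in img1 and p2 in img2."""
    (r1, c1), (r2, c2) = p1, p2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            d = img1[r1 + dr][c1 + dc] - img2[r2 + dr][c2 + dc]
            total += d * d
    return total


def warp_at(img1, img2, x0, half=1, search=3):
    """Estimate W12(x0): the pixel of img2 whose neighbourhood best matches that of x0 in img1."""
    r0, c0 = x0
    if not (half <= r0 < len(img1) - half and half <= c0 < len(img1[0]) - half):
        raise ValueError("x0 is too close to the border of img1 for the patch size")
    h, w = len(img2), len(img2[0])
    best, best_cost = None, float("inf")
    for r in range(r0 - search, r0 + search + 1):
        for c in range(c0 - search, c0 + search + 1):
            # Only consider candidates whose whole patch lies inside img2.
            if half <= r < h - half and half <= c < w - half:
                cost = ssd(img1, img2, x0, (r, c), half)
                if cost < best_cost:
                    best, best_cost = (r, c), cost
    return best
```

Each second processor would run `warp_at` only for the pixels assigned to it in step S38, so the work splits without any communication between workers.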

Next, in accordance with the control program 361, the first processor 310 starts the foreground/background color estimation (step S40). For this processing, the first processor 310 assigns the pixels whose foreground and background colors are to be calculated to the second processors 320 (step S41).

The control unit 321 of each second processor 320 then reads the color estimation program 366 from the memory 360 into the memory 323, executes it, and causes the calculation unit 322 to perform the computation. As a result, each second processor 320 obtains the foreground color 35 (Fi) and the background color 34 (Bi) for the pixels x assigned to it (step S42). This processing corresponds to step S16 described in the first embodiment, and the color estimation program 366 is a program for executing it.

Next, the first processor 310 reads the α value estimation program 364 into the main memory 370 and executes it using the foreground color 35 and the background color 34 obtained in step S42, thereby updating the α values and judging whether they have converged (step S43). This processing corresponds to steps S17 to S19 described in the first embodiment. Part of this processing may of course be performed by the second processors 320 as well as by the first processor 310.
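The α update in step S43 can be sketched with the compositing model I = αF + (1 - α)B applied to all input images at once: given the foreground and background colors from step S42 and the pixel correspondences from the warp functions, the least-squares α for a pixel has a closed form. This closed form and the max-change convergence test are generic illustrations assumed here, not the first embodiment's exact update rule; the function names and the tolerance are hypothetical.

```python
def update_alpha(observations, fallback, eps=1e-8):
    """Least-squares alpha for one pixel seen in several input images.

    observations: list of (I, F, B) colour tuples, one per input image, with the
    pixel correspondences already resolved. Minimises
    sum_i |I_i - (alpha * F_i + (1 - alpha) * B_i)|^2 over alpha, then clamps to [0, 1].
    """
    num = den = 0.0
    for I, F, B in observations:
        for i, f, b in zip(I, F, B):
            num += (i - b) * (f - b)
            den += (f - b) ** 2
    if den < eps:
        return fallback  # F and B indistinguishable here: keep the previous estimate
    return min(1.0, max(0.0, num / den))


def converged(old, new, tol=1e-3):
    """Convergence test: the largest per-pixel change in alpha is below tol."""
    return max(abs(a - b) for a, b in zip(old, new)) < tol
```

Using several images helps because a pixel whose foreground and background colors happen to coincide in one view may still be separable in another.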

When the first processor 310 determines that the α values have converged, it reads the composite image creation program 367 and the new background 40 from the memory 360 into the main memory 370. Using the foreground color 35 obtained in step S42 and the α values obtained in step S43, it then composites the foreground of the input image onto the new background 40 (step S44). This processing corresponds to step S20 described in the first embodiment. Part of this processing may of course be performed by the second processors 320 as well as by the first processor 310.
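The compositing in step S44 can be sketched with the usual alpha-blending model C = αF + (1 - α)B_new, where F is the extracted foreground color 35 and B_new is the new background 40. A minimal sketch, assuming images stored as 2-D lists of RGB tuples and α as a 2-D list of values in [0, 1]:

```python
def composite(alpha, fg, new_bg):
    """Blend extracted foreground colours onto a new background: C = alpha*F + (1 - alpha)*B_new.

    alpha: 2-D list of values in [0, 1]; fg, new_bg: 2-D lists of RGB tuples of the same size.
    """
    return [
        [tuple(a * f + (1.0 - a) * b for f, b in zip(F, B))
         for a, F, B in zip(arow, frow, brow)]
        for arow, frow, brow in zip(alpha, fg, new_bg)
    ]
```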

As described above, the image processing method described in the first to sixth embodiments may be realized by the processor system 300 including a plurality of processors 310 and 320. In this case, the first processor 310 and the second processors 320 function as the foreground extraction unit 10 and the image composition unit 20 described with reference to FIGS. 2, 19, and 34.

As described above, the image processing apparatuses and image processing methods according to the first to eighth embodiments of the present invention use a plurality of images captured with the foreground in focus and the background blurred, and estimate the degree of blur in each image. The α values are then estimated according to the degree of blur. Therefore, highly accurate foreground extraction is possible without a special shooting environment.

In addition, corresponding points among the input images are calculated between background regions and between foreground regions. Correspondences in the unknown region are then estimated from those obtained in the background and foreground regions. That is, matching is first performed in the regions where correspondences can be obtained reliably, and correspondences in the regions where they are unclear are estimated afterward. From this result, it can be determined which pixel of the other input image each pixel in the unknown region of one input image corresponds to. The α values are then estimated based on this result.
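The idea of matching reliable regions first and then estimating correspondences in the unknown region can be illustrated by interpolating displacement vectors. The sketch below uses inverse-distance weighting, a generic scheme assumed here for illustration; the embodiments do not necessarily use it. The dictionary representation of displacements and the `power` parameter are hypothetical.

```python
def interpolate_displacements(known, unknown_pixels, power=2.0):
    """Fill in displacements for unknown-region pixels from reliably matched pixels.

    known: dict mapping (row, col) -> (drow, dcol) for pixels matched within the
    background or foreground regions. Each unknown pixel receives an
    inverse-distance-weighted average of the known displacements.
    """
    if not known:
        raise ValueError("need at least one matched pixel")
    result = {}
    for p in unknown_pixels:
        wsum = dr = dc = 0.0
        for q, (qr, qc) in known.items():
            dist2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
            if dist2 == 0:
                # The pixel itself is already matched: use its displacement directly.
                wsum, dr, dc = 1.0, qr, qc
                break
            w = dist2 ** (-power / 2)
            wsum += w
            dr += w * qr
            dc += w * qc
        result[p] = (dr / wsum, dc / wsum)
    return result
```

Adding the interpolated displacement to an unknown pixel's position gives its estimated counterpart in the other input image.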

Therefore, correspondences can be obtained more accurately than with conventional stereo matching and similar methods, and as a result the foreground extraction accuracy improves.

In the first embodiment, the image processing system 1 with a plurality of cameras 2 was described as an example. However, a single camera 2 may be used. That is, when the target object is stationary, images from a plurality of viewpoints can be obtained by the photographer moving. When the target object is moving, images from a plurality of viewpoints can be obtained without the photographer moving.

Further, it suffices that at least part of the foreground is the same among the plurality of input images. To improve the accuracy of the corresponding point calculation, the foreground is preferably substantially the same across the input images; however, the above embodiments can be applied even if large parts of it do not match.

Further, the system 1 itself need not include the camera 2 or the video camera 4. For example, the image data serving as the input images may be supplied to the image processing apparatus 3 via a network or the like.

In the first embodiment, depending on the input images, the mask image obtained in step S13 may be used as the final mask image, in which case steps S14 to S20 are omitted. In a system that omits steps S14 to S20, it suffices for the foreground extraction unit 10 to include only the blur estimation unit 11, the region division unit 12, and the α value estimation unit 16.

Note that the present invention is not limited to the above embodiments, and various modifications can be made at the implementation stage without departing from the gist of the invention. Furthermore, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are removed from all those shown in an embodiment, as long as the problem described in the section on the problem to be solved by the invention can be solved and the effects described in the section on the effects of the invention can be obtained, the configuration with those elements removed can be extracted as an invention.

FIG. 1 is a block diagram of an image processing system according to the first embodiment of the present invention.
FIG. 2 is a block diagram of an image processing apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart of an image processing method according to the first embodiment of the present invention.
FIG. 4 is a schematic diagram of an input image, showing blur estimation in the image processing method according to the first embodiment of the present invention.
FIG. 5 is a schematic diagram of an input image, showing blur estimation in the image processing method according to the first embodiment of the present invention.
FIG. 6 is a schematic diagram of an input image.
FIG. 7 is a schematic diagram of a trimap obtained by the image processing method according to the first embodiment of the present invention.
FIG. 8 is a schematic diagram of a mask image obtained by the image processing method according to the first embodiment of the present invention.
FIG. 9 is a schematic diagram of input images, showing alignment in the image processing method according to the first embodiment of the present invention.
FIG. 10 is a graph of a weighting function used in the image processing method according to the first embodiment of the present invention.
FIG. 11 is a graph of a weighting function used in the image processing method according to the first embodiment of the present invention.
FIG. 12 is a graph of a weighting function used in the image processing method according to the first embodiment of the present invention.
FIG. 13 is a schematic diagram of a mask image obtained by the image processing method according to the first embodiment of the present invention, showing color estimation.
FIG. 14 is a schematic diagram of a mask image obtained by the image processing method according to the first embodiment of the present invention.
FIG. 15 is a schematic diagram showing image composition in the image processing method according to the first embodiment of the present invention.
FIG. 16 is a photograph in place of a drawing, showing input images used in the image processing method according to the first embodiment of the present invention: (a) taken from the left side and (b) taken from the right side.
FIG. 17 is a photograph in place of a drawing: (a) is a mask image obtained by the image processing method according to the first embodiment of the present invention, and (b) is a foreground color image.
FIG. 18 is a photograph in place of a drawing, showing a composite image obtained by the image processing method according to the first embodiment of the present invention.
FIG. 19 is a block diagram of an image processing apparatus according to the second embodiment of the present invention.
FIG. 20 is a flowchart of an image processing method according to the second embodiment of the present invention.
FIG. 21 is a schematic diagram showing foreground interpolation in the image processing method according to the second embodiment of the present invention.
FIG. 22 is a schematic diagram of a trimap obtained by the image processing method according to the third embodiment of the present invention.
FIG. 23 is a block diagram of a blur estimation unit according to the fourth embodiment of the present invention.
FIG. 24 is a graph showing an example distribution of derivatives of natural image data, together with a Gaussian distribution for comparison.
FIG. 25 is a block diagram showing an example of processing by the deconvolution unit according to the fourth embodiment of the present invention.
FIG. 26 is a flowchart showing an example of deconvolution processing executed by the deconvolution unit according to the fourth embodiment of the present invention.
FIG. 27 is a flowchart showing an example of processing by the local blur parameter estimation unit according to the fourth embodiment of the present invention.
FIG. 28 is a diagram showing an example of a region division result in the fourth embodiment of the present invention.
FIG. 29 is a graph showing filter response values used in the image processing method according to the fourth embodiment of the present invention.
FIG. 30 is a graph showing filter response values used in the image processing method according to the fourth embodiment of the present invention.
FIG. 31 is a graph showing filter response values used in the image processing method according to the fourth embodiment of the present invention.
FIG. 32 is a graph showing restored image frequency components obtained by the image processing method according to the fourth embodiment of the present invention.
FIG. 33 is a graph showing the divergence obtained by the image processing method according to the fourth embodiment of the present invention.
FIG. 34 is a block diagram of a foreground extraction unit according to the fifth embodiment of the present invention.
FIG. 35 is a block diagram of a blur estimation unit according to the sixth embodiment of the present invention.
FIG. 36 is a graph showing an example cross section g(x, 0) of a step edge at y = 0.
FIG. 37 is a graph showing an example cross section h(x, 0; r) of the blur function at y = 0 in the sixth embodiment of the present invention.
FIG. 38 is a graph showing an example cross section f(x, 0; g0, g1, r) of a blurred edge at y = 0.
FIG. 39 is a block diagram of an image processing system according to the seventh embodiment of the present invention.
FIG. 40 is a conceptual diagram showing the structure of a moving image.
FIG. 41 is a block diagram of a processor system according to the eighth embodiment of the present invention.
FIG. 42 is a flowchart of an image processing method performed by the processor system according to the eighth embodiment of the present invention.

Explanation of symbols

1 ... image processing system, 2 ... camera, 3 ... image processing apparatus, 4 ... video camera, 10 ... foreground extraction unit, 11 ... blur estimation unit, 12 ... region division unit, 13 ... background corresponding point calculation unit, 14 ... foreground corresponding point calculation unit, 15 ... foreground/background color estimation unit, 16 ... α value estimation unit, 20 ... image composition unit, 21, 360 ... memory, 22 ... composition unit, 23 ... foreground interpolation unit, 30 ... blur amount, 31 ... trimap, 32 ... background warp function, 33 ... foreground warp function, 34 ... background color, 35 ... foreground color, 36 ... mask image, 40 ... new background, 41 ... composite image, 42 ... foreground color interpolation image, 43 ... α value interpolation image, 103 ... PSF model storage device, 104 ... deconvolution unit, 105 ... deblurred image storage device, 106 ... local blur parameter estimation unit, 107 ... local estimated blur parameter storage device, 108 ... image integration unit, 109 ... restored image storage device, 141 ... parameter changing unit, 142 ... deconvolution processing unit, 143 ... differentiation unit, 144 ... derivative-domain deconvolution unit, 145 ... integration unit, 161 ... region division unit, 162 ... local blur estimation unit, 200 ... edge detection unit, 210 ... storage device, 220 ... estimation unit, 300 ... processor system, 310 ... first processor, 320 ... second processor, 330 ... first decoder, 340 ... second decoder, 350 ... DMA controller, 361 ... control program, 362 ... blur estimation program, 363 ... trimap creation program, 364 ... α value estimation program, 365 ... alignment program, 366 ... color estimation program, 367 ... composite image creation program, 370 ... main memory, 380 ... interface, 390 ... memory controller

Claims

An image processing apparatus comprising:
an estimation unit that estimates a blur state in each of a plurality of image data relating to the same object;
a dividing unit that divides each of the image data into a background region and a foreground region based on the estimation result of the estimation unit; and
an extraction unit that extracts the foreground from each of the image data according to the division result of the dividing unit,
wherein each of the image data is an image in which the foreground is more in focus than the background.
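The division step above can be illustrated with a small sketch. It assumes a per-pixel blur map has already been estimated (the reference sign list calls this the blur amount 30, and the division result the trimap 31) and uses two hypothetical thresholds, `sharp_max` and `blurred_min`, which are not specified in the claims. Pixels between the two thresholds are left in an unknown band rather than forced into either region.

```python
# Hypothetical trimap labels; the patent text here does not fix these values.
FOREGROUND, BACKGROUND, UNKNOWN = 255, 0, 128


def make_trimap(blur_map, sharp_max=1.0, blurred_min=3.0):
    """Label each pixel of a blur map (list of rows of floats) as foreground,
    background, or unknown.

    blur_map[y][x] is an estimated blur amount, e.g. a PSF radius in pixels.
    Pixels at or below sharp_max are treated as in-focus foreground, pixels at
    or above blurred_min as defocused background, and the rest stay unknown.
    The threshold values are illustrative assumptions.
    """
    if sharp_max >= blurred_min:
        raise ValueError("sharp_max must be smaller than blurred_min")
    trimap = []
    for row in blur_map:
        out = []
        for b in row:
            if b <= sharp_max:
                out.append(FOREGROUND)   # sharp: in-focus foreground
            elif b >= blurred_min:
                out.append(BACKGROUND)   # strongly blurred: background
            else:
                out.append(UNKNOWN)      # ambiguous: left for alpha estimation
        trimap.append(out)
    return trimap


blur = [[0.5, 2.0, 4.5],
        [0.8, 3.0, 6.0]]
print(make_trimap(blur))  # [[255, 128, 0], [255, 0, 0]]
```

Keeping an explicit unknown band is a design choice: boundary pixels, where foreground and background colors mix, are the ones a per-pixel ratio estimate has to resolve.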

An image processing apparatus comprising:
an estimation unit that estimates a blur state in each of a plurality of image data relating to the same object;
a dividing unit that divides each of the image data, based on the estimation result of the estimation unit, into a first region regarded as the background and a second region consisting of the remainder;
an alignment unit that determines the positional relationship between mutually corresponding pixels of the image data; and
an extraction unit that extracts the foreground from each of the image data according to the division result of the dividing unit and the positional relationship determined by the alignment unit,
wherein each of the image data is an image in which the foreground is more in focus than the background,
the dividing unit regards a region estimated in the estimation result to be blurred as the background, and
the alignment unit aligns the first regions with one another and the second regions with one another.
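Aligning the first regions with one another and the second regions with one another lets background and foreground move independently between images; the reference sign list reflects this with separate background and foreground warp functions (32, 33). The sketch below is only a stand-in under strong assumptions: each warp is restricted to an integer translation found by brute-force search of the mean squared difference over a mask, and occlusion is ignored. The function name `estimate_shift` and the `max_shift` parameter are hypothetical.

```python
def estimate_shift(ref, img, mask, max_shift=2):
    """Return the integer shift (dy, dx) that best aligns masked pixels of ref with img.

    The score is the mean squared difference between ref[y][x] and
    img[y + dy][x + dx], over pixels where mask[y][x] is True and the shifted
    position falls inside img. Hypothetical helper for illustration only.
    """
    h, w = len(ref), len(ref[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if mask[y][x] and 0 <= yy < h and 0 <= xx < w:
                        d = ref[y][x] - img[yy][xx]
                        err += d * d
                        n += 1
            # Keep the shift with the lowest mean error; skip shifts with no overlap.
            if n > 0 and err / n < best_err:
                best, best_err = (dy, dx), err / n
    return best


size = 8
# Background layer: unique values, moved one pixel to the right in the second image.
bg_ref = [[10 * y + x for x in range(size)] for y in range(size)]
bg_img = [[bg_ref[y][x - 1] if x > 0 else -100 for x in range(size)] for y in range(size)]
bg_mask = [[True] * size for _ in range(size)]

# Foreground layer: a 2x2 patch moved one pixel down in the second image.
fg_ref = [[0] * size for _ in range(size)]
fg_img = [[0] * size for _ in range(size)]
fg_mask = [[False] * size for _ in range(size)]
for (y, x), v in {(2, 2): 200, (2, 3): 201, (3, 2): 210, (3, 3): 211}.items():
    fg_ref[y][x] = v
    fg_img[y + 1][x] = v
    fg_mask[y][x] = True

print(estimate_shift(bg_ref, bg_img, bg_mask))  # (0, 1)
print(estimate_shift(fg_ref, fg_img, fg_mask))  # (1, 0)
```

The two calls recover different motions for the two layers, which a single global alignment of the whole image could not express.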

The image processing apparatus according to claim 1, wherein the extraction unit calculates, for each pixel included in the image data, the ratio of the foreground color and the ratio of the background color.
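The per-pixel ratios above correspond to the usual compositing model I = αF + (1 − α)B, where α is the foreground ratio and 1 − α the background ratio. Assuming the foreground color F and background color B of a pixel have already been estimated (items 35 and 34 in the reference sign list), a least-squares α can be sketched as follows; the function name and the clamping to [0, 1] are illustrative choices, not taken from the patent.

```python
def estimate_alpha(pixel, fg, bg):
    """Return (alpha, 1 - alpha) for one pixel under pixel = alpha*fg + (1 - alpha)*bg.

    alpha is the least-squares solution, i.e. the projection of (pixel - bg)
    onto (fg - bg), clamped to [0, 1]. fg and bg are estimated colors.
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        # Identical foreground and background colors: alpha is not observable.
        raise ValueError("foreground and background colors must differ")
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    alpha = min(1.0, max(0.0, num / denom))
    return alpha, 1.0 - alpha


# A pixel that is 25% red foreground and 75% blue background.
print(estimate_alpha((80, 40, 160), fg=(200, 40, 40), bg=(40, 40, 200)))  # (0.25, 0.75)
```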

The image processing apparatus according to claim 2, wherein
the estimation unit generates, for each of the image data, deblurred image data from which the blur has been removed, and
the alignment unit determines the positional relationship between the deblurred image data.
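Generating deblurred image data before alignment is the role of a deconvolution step; the reference sign list mentions a PSF model storage device (103) and a deconvolution unit (104). This section does not say which deconvolution is used, so the sketch below swaps in plain Wiener deconvolution with a known point spread function, in one dimension and with a naive DFT so that it needs only the standard library. The PSF values and the `noise_to_signal` constant are assumptions chosen for the example.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(n^2), kept dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_blur(signal, psf):
    """Circular convolution with a PSF of the same length (psf[0] is the centre tap)."""
    n = len(signal)
    return [sum(psf[j] * signal[(t - j) % n] for j in range(n)) for t in range(n)]


def wiener_deblur(blurred, psf, noise_to_signal=1e-6):
    """Wiener deconvolution: X = conj(H) * Y / (|H|^2 + K), K a noise-to-signal constant."""
    H = dft(psf)
    Y = dft(blurred)
    X = [h.conjugate() * y / (abs(h) ** 2 + noise_to_signal) for h, y in zip(H, Y)]
    return [v.real for v in idft(X)]


sharp = [0, 0, 0, 0, 1, 1, 1, 1]        # a step edge
psf = [0.6, 0.2, 0, 0, 0, 0, 0, 0.2]    # symmetric 3-tap blur, wrapped around
blurred = circular_blur(sharp, psf)
restored = wiener_deblur(blurred, psf)
print(max(abs(a - b) for a, b in zip(restored, sharp)))
```

Because this PSF has no zero in its frequency response, the restored step agrees with the original to within a small error; a PSF with spectral zeros, or real sensor noise, would call for a larger `noise_to_signal` value and give only approximate recovery.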

An image processing method comprising:
receiving a plurality of first image data;
estimating, by a processor, a blur state for each of a plurality of regions included in each of the received first image data;
dividing, by the processor, each of the first image data into a background region and a foreground region based on the blur state;
extracting, by the processor, the foreground from each of the first image data; and
combining, by the processor, the foreground with second image data different from the first image data,
wherein each of the first image data is an image in which the foreground is more in focus than the background.
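The final combining step can be sketched with the same compositing model applied per pixel, C = αF + (1 − α)B_new, where B_new is the second image data. This assumes the foreground color and α value of each pixel are already known (the reference sign list names a new background 40 and a composite image 41); images are plain nested lists of RGB tuples, and the function name is hypothetical.

```python
def composite(fg, new_bg, alpha):
    """Per-pixel compositing C = alpha * F + (1 - alpha) * B_new.

    fg and new_bg are lists of rows of RGB tuples; alpha is a list of rows of
    floats in [0, 1] giving the foreground ratio of each pixel.
    """
    out = []
    for f_row, b_row, a_row in zip(fg, new_bg, alpha):
        out.append([tuple(a * f + (1.0 - a) * b for f, b in zip(fp, bp))
                    for fp, bp, a in zip(f_row, b_row, a_row)])
    return out


fg = [[(200, 40, 40), (200, 40, 40)]]
new_bg = [[(0, 0, 255), (0, 0, 255)]]
alpha = [[1.0, 0.25]]   # an opaque pixel and a mixed boundary pixel
print(composite(fg, new_bg, alpha))  # [[(200.0, 40.0, 40.0), (50.0, 10.0, 201.25)]]
```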