WO2013087880A1 - Method and system for interpolating a virtual image from a first and a second input images - Google Patents

Method and system for interpolating a virtual image from a first and a second input images

Info

Publication number
WO2013087880A1
Authority
WO
WIPO (PCT)
Prior art keywords
luminance
color
input images
pixels
input
Prior art date
Application number
PCT/EP2012/075634
Other languages
French (fr)
Inventor
Philippe Robert
Matthieu Fradet
Tomas CRIVELLI
Cedric Thebault
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2013087880A1 publication Critical patent/WO2013087880A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H04N13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Television Systems (AREA)

Abstract

The invention concerns a method of interpolating a virtual image (Ti) from a first and a second input images (T1, T0), the luminance (L) or color (CR, CG, CB) of each interpolated pixel of the virtual image being calculated from corresponding pixels of the first and the second input images, comprising the steps of, if a first input image (T1) is partially occluded, for the pixels belonging to an occluded area: estimating a luminance and/or color gain variation factor (β) between the first and second input images (T1, T0) from spatio-temporal neighboring information from pixels that do not belong to an occluded area; calculating a weighting coefficient (α) taking into account the distance of the virtual image to the second input image (Ti - T0) relative to the distance between the first and the second input images (T1 - T0); and computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor (β) and the calculated weighting coefficient (α).

Description

Method and system for interpolating a virtual image from a first and a second input images
The invention concerns the interpolation of images from at least one pair of images. It particularly concerns the interpolation of the luminance of the interpolated image.
Usually, interpolation is carried out from a dense correspondence map assigned to the virtual view. This map is either the motion or the disparity map. This map has been either directly computed for the view to be interpolated or derived from such a map first computed for one of the two input views.
Then, during the interpolation step, the assigned correspondence vector of each pixel in the virtual view allows linking it to the two input images, getting the corresponding points in these input images and then interpolating the pixel.
US 7,697,769 discloses an interpolation image generating method that includes dividing each of the first reference image and the second reference image into reference regions; correlation values between the regions of the first and second destination images are indicated by motion vectors. An interpolation image is generated using the reference region determined as the high-correlation region, and by mixing the interpolation image candidates using the motion vectors of that reference region.
WO 2011/105337 (CA 2 790 268) from Nippon Telegraph and Telephone Corporation discloses a method for encoding an object frame from a previously encoded reference frame. A correction parameter for correcting mismatches in terms of local brightness and color is estimated from the viewpoint-synthesized image and the reference frame, and is used to correct the viewpoint-synthesized image. Interpolation of disparity or depth values, as well as interpolation of luminance or color, is described, but it happens that some areas are occluded in one view and visible in only the other view. Thus, if a pixel is interpolated from one frame only, the value of the luminance and/or color in the interpolated view cannot be directly determined.
Luminance or color of the pixel to interpolate is obtained via the following equation:
L(x, Ti) = (1 - α) × L(x - α × dx, T0) + α × L(x + (1 - α) × dx, T1)    Eq. (1)
"T," is the image to be interpolated, "T0" and "Τ are input images. "dx" corresponds to motion or disparity information linking "T0" and "Τ and is assigned to point (x,Ti). In the following, it is assumed that Ti corresponds to a value: for example, in the case of interpolation along the temporal axis, Ti corresponds to a time value, the various frames of the sequence are referred by this value. In the case of view interpolation from multiple views, the input and output views are assumed to have their focal points aligned, and Ti corresponds to a position index along this line.
The (temporal or spatial) distance between the input frames can be set to 1 (T1 - T0 = 1) and Ti then refers to the relative position of image Ti with respect to T0. L can be a luminance map or a color channel map. In the case of color, the α value can be different for each color channel: the red one CR, the green one CG and the blue one CB.
In a simple interpolation case the coefficient α depends on the positions of the images T0, T1 and Ti as follows:
α = (Ti - T0) / (T1 - T0)
This equation combines the luminance L or color C component of the two input images T0 and T1. In particular, in case of a color or luminance difference between the two images, the result takes into consideration a weighting coefficient α between the input values, taking into account the distance of the virtual frame to the two input views.
Let us assume that there is a luminance gain factor β between the two input views "T0" and "T1":
L(x + (1 - α) × dx, T1) = β × L(x - α × dx, T0)    Eq. (3)
In this case, interpolation equation (1) becomes:
L(x, Ti) = (1 + α × (β - 1)) × L(x - α × dx, T0)
If the luminance gain factor β is equal to 1, there is no variation of luminance between the two views and the interpolated view:
L(x, Ti) = L(x - α × dx, T0) = L(x + (1 - α) × dx, T1)
But if the luminance gain factor β is different from 1 (β ≠ 1), the luminance or color component in the interpolated view Ti resulting from the interpolation from both views T0 and T1 becomes (introducing Eq. (3) in Eq. (1)):
L(x, Ti) = L(x - α × dx, T0) + α × (β - 1) × L(x - α × dx, T0)    Eq. (4)
But it happens that some areas are visible in only one view and occluded in the other view. This case requires special processing to locate such areas in the virtual view. Once this is done, the luminance in the virtual view can be computed from just one frame via the following equation if the pixel is occluded in T1:
L(x, Ti) = L(x - α × dx, T0)    Eq. (2)
Thus the luminance or color of the interpolated pixel is the luminance or color of the visible pixel.
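By way of illustration, Eqs. (1) and (2) for a single pixel can be sketched in Python as follows; the 1-D luminance arrays, the nearest-neighbor rounding and all names (L0, L1, dx, alpha) are illustrative assumptions, not part of the patent:

    def interpolate_pixel(L0, L1, x, dx, alpha, visible_in_T0=True, visible_in_T1=True):
        """Interpolate the luminance of pixel x in the virtual image Ti.

        L0, L1 : 1-D luminance arrays for the input images T0 and T1.
        dx     : motion/disparity value assigned to point (x, Ti).
        alpha  : relative position of Ti between T0 and T1 (0 <= alpha <= 1).
        """
        x0 = int(round(x - alpha * dx))        # corresponding point in T0
        x1 = int(round(x + (1 - alpha) * dx))  # corresponding point in T1
        if visible_in_T0 and visible_in_T1:
            # Eq. (1): distance-weighted blend of the two corresponding points
            return (1 - alpha) * L0[x0] + alpha * L1[x1]
        if visible_in_T0:
            return L0[x0]                      # Eq. (2): pixel occluded in T1
        return L1[x1]                          # symmetric case: occluded in T0

A real implementation would sample sub-pixel positions with bilinear interpolation; rounding keeps the sketch short.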
If a pixel is interpolated from one frame only, due to occlusion, for example from T0 only (Eq. (2)), the luminance gain factor β between the two input views "T0" and "T1" is not known, and the error on the pixel reconstruction is:
E(x, Ti) = α × (β - 1) × L(x - α × dx, T0)
The problem is that these partially occluded pixels are not compensated in a similar way as the pixels visible in both views. This may lead to annoying defects.
The invention remedies these disadvantages. It consists in a method of interpolating a virtual image from a first and a second input image, the luminance or color of each interpolated pixel of the virtual image being calculated from corresponding pixels of the first and the second input images,
Characterized in that the method comprises further the steps of:
If a first input image is partially occluded, for the pixels belonging to an occluded area,
estimating a luminance and/or color gain variation factor (β) between the first and second input images from spatio-temporal neighboring information from pixels that do not belong to an occluded area;
calculating a weighting coefficient taking into account the distance of the virtual image to the second input image relative to the distance between the first and the second input images;
computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor and the calculated weighting coefficient. Thus luminance and/or color are compensated in a similar way as for the pixels visible in both views.
According to an aspect of the invention, pixels belonging to an occluded area are determined by an occlusion map relative to the interpolated image.
According to another aspect of the invention, the luminance/color gain variation factor is a value estimated by correlation during matching between the first and second input images, assuming that it varies linearly between the first and second input images.
According to another aspect of the invention, if the luminance and/or color gain factor variations follow an affine model, the luminance and/or color gain variation factor takes account of the offset value defined by the affine model.
According to another aspect of the invention, if the luminance and/or color gain factor variations follow a parabolic model, the luminance and/or color gain variation factor takes account of the offset value defined by the parabolic model.
According to another aspect of the invention, for pixels occluded in both input images, the interpolated pixel luminance and/or color is filled from spatio-temporal neighboring pixels, without taking into account the luminance/color gain.
According to another aspect of the invention, for pixels belonging to occluded areas in the first and second input images, the luminance and/or color variation is filled from an estimation of the spatial luminance and/or color variation of pixels in the visible parts.
The present invention furthermore concerns a method of extrapolating a virtual image corresponding to the preceding method of interpolating a virtual image.
The present invention furthermore concerns a system for interpolating a virtual image from a first and a second input images, the luminance or color of each interpolated pixel of the virtual image being calculated from corresponding pixels of the first and the second input images. The system further comprises, if a first input image is partially occluded, for the pixels belonging to an occluded area: means for estimating a luminance and/or color gain variation factor between the first and second input images from spatio-temporal neighboring information from pixels that do not belong to an occluded area; means for calculating a weighting coefficient taking into account the distance of the virtual image to the second input image relative to the distance between the first and the second input images; and means for computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor and the calculated weighting coefficient.
The above and other aspects of the invention will become more apparent by the following detailed description of exemplary embodiments thereof with reference to the attached drawings, in which:
Figure 1 is an illustration of a simple interpolation case;
Figure 2 illustrates the situation of an orphan pixel which gets at least a region label and disparity/motion information;
Figure 3 represents a flowchart of a method of the invention.
Hereinafter, the present invention will be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. As described previously, the luminance or color of the pixel to interpolate is obtained via the following equation:
L(x, Ti) = (1 - α) × L(x - α × dx, T0) + α × L(x + (1 - α) × dx, T1)    Eq. (1)
As represented in Figure 1, "Ti" is the image to be interpolated, "T0" and "T1" are input images. "dx" corresponds to motion or disparity information linking "T0" and "T1".
This equation (1) combines the luminance L or color C component of the two input images T0 and T1. In particular, in case of a color or luminance difference between the two images, the result takes into consideration a weighting coefficient α between the input values, taking into account the distance of the virtual frame to one of the two input views (Ti - T0) and the distance between the two input views (T1 - T0).
In a simple interpolation case the coefficient α depends on the positions of the images T0, T1 and Ti:
α = (Ti - T0) / (T1 - T0)
If the pixel is visible in the two images "T0" and "T1", equation (1) is used for interpolation. This means that the luminance/color gain is not explicitly considered in the interpolation but is integrated in equation (1).
If a pixel is interpolated from one frame only, due to occlusion, for example from T0 only (Eq. (2)), the luminance gain factor β between the two input views "T0" and "T1" is not known, and the error on the pixel reconstruction is:
E(x, Ti) = α × (β - 1) × L(x - α × dx, T0)
In order to avoid the problem described above, it is proposed to estimate the luminance or color variation (β) between both views and to compensate this variation during interpolation. The presentation below assumes two input views, but it can be applied with more than two views.
During motion/disparity estimation, an additional luminance and/or color gain is estimated for each pixel of one input image with respect to the other. The result from correspondence between two frames is, for each frame:
• A disparity/motion map
• An occlusion map
• A luminance and/or color gain map
The gain (Eq. (3)) can be estimated for example via correlation during matching between views. Considering two blocks in two images (e.g. T0 and T1) that are candidates for matching, if X and Y represent the luminance or color component respectively in each block, then the luminance gain or color gain β can be given by:
β = E[XY] / E[X²], assuming Yi = β × Xi,
where E[XY] is the covariance between X and Y, E[X] is the average of X, E[X²] is the variance of X, and the index i runs over the pixels of the blocks X and Y. Thus, a luminance gain β(x) or a color gain β(x) can be estimated for each pixel in the image. The color gain can be obtained by applying this formula to each color channel CR, CG, CB, leading to a gain factor for each of these channels.
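By way of illustration, the block-wise least-squares estimate of the gain under the model Yi = β × Xi can be sketched as follows (names are illustrative; applying the same function per channel yields the per-channel color gains):

    import numpy as np

    def estimate_gain(X, Y):
        """Gain beta between two candidate matching blocks X (in T0) and Y (in T1),
        under the model Y_i = beta * X_i, i.e. beta = E[XY] / E[X^2]."""
        X = np.asarray(X, dtype=float).ravel()
        Y = np.asarray(Y, dtype=float).ravel()
        return float(np.mean(X * Y) / np.mean(X * X))

    # Per-channel color gains, assuming H x W x 3 blocks:
    # gains = [estimate_gain(block0[:, :, c], block1[:, :, c]) for c in range(3)]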
If the occlusion map indicates that the pixel is visible in one view only, then, as matching is not possible in these areas, the motion or disparity value and the luminance and/or color gain must be extrapolated from spatio-temporal neighboring information of the background, or from pixels that do not belong to the occluding object. For example, a parametric model can be estimated from the neighborhood and used for extrapolation of the disparity, motion or luminance/color gain.
Then, the interpolation can be the following:
L(x, Ti) = (1 + α × (β - 1)) × L(x - α × dx, T0)
if x is visible in T0 only (Eq. (4));
L(x, Ti) = (α + (1 - α) / β) × L(x + (1 - α) × dx, T1)
if x is visible in T1 only.
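A sketch of this gain-compensated one-sided interpolation, under the same illustrative conventions as the earlier snippet (the T1-only expression follows from substituting Eq. (3) into Eq. (1)):

    def interpolate_one_sided(L0, L1, x, dx, alpha, beta, visible_in_T0):
        """Interpolate a pixel visible in only one input view, compensating the
        (extrapolated) luminance gain beta between T0 and T1."""
        if visible_in_T0:
            # Eq. (4): x occluded in T1; the missing T1 value is predicted as
            # beta times the T0 value, collapsing Eq. (1) to a single term.
            x0 = int(round(x - alpha * dx))
            return (1 + alpha * (beta - 1)) * L0[x0]
        # x occluded in T0: the T0 value is predicted as the T1 value over beta.
        x1 = int(round(x + (1 - alpha) * dx))
        return (alpha + (1 - alpha) / beta) * L1[x1]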
The solution above concerns the interpolation of pixels visible in one image in the presence of a linear luminance and/or color gain.
Nothing changes for the interpolation of a pixel visible in both input views. But some variants with more complex luminance/color variation models are proposed:
• For example, one can even consider an affine model instead of a linear one:
Yi = β × Xi + γ
In this case, the gain can be computed as before, and the offset γ can be given by:
γ = E[Y] - β × E[X]
The relation between T1 and T0 is then supposed to be (Equation (3) modified):
L(x + (1 - α) × dx, T1) = β × L(x - α × dx, T0) + γ    Eq. (6)
Then, in case of occlusion in one frame, the interpolation of the luminance can be given (introducing Eq. (6) in Eq. (1)) by:
L(x, Ti) = (1 + α × (β - 1)) × L(x - α × dx, T0) + α × γ    Eq. (7a)
if x is visible in T0 only;
L(x, Ti) = (α + (1 - α) / β) × L(x + (1 - α) × dx, T1) - ((1 - α) / β) × γ    Eq. (7b)
if x is visible in T1 only.
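A sketch of the affine variant, with the gain computed as before and the offset taken as the residual mean (one plausible reading of the formulas above; names are illustrative):

    import numpy as np

    def estimate_affine_gain(X, Y):
        """Gain and offset for the affine model Y ~ beta * X + gamma."""
        X = np.asarray(X, dtype=float).ravel()
        Y = np.asarray(Y, dtype=float).ravel()
        beta = np.mean(X * Y) / np.mean(X * X)  # gain, computed as before
        gamma = np.mean(Y) - beta * np.mean(X)  # offset: E[Y] - beta * E[X]
        return float(beta), float(gamma)

    def eq_7a(L0_value, alpha, beta, gamma):
        """Eq. (7a): pixel visible in T0 only, affine compensation."""
        return (1 + alpha * (beta - 1)) * L0_value + alpha * gamma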
• Up to now, the gain β is supposed to vary linearly between the two views. However, a more complex model may be more appropriate. Actually, considering more than two views, more complex models can be considered that may better fit the variation: in motion estimation, 3 or 4 successive images can be used; in disparity estimation, 3 or 4 adjacent views. For example, with 3 images T0, T1, T2, one can identify a parabolic model: a first gain is estimated between T0 and T1, then a second one between T1 and T2:
L(x + (1 - α) × dx, T1) = β1 × L(x - α × dx, T0)
and, at the corresponding points of T1 and T2, L(·, T2) = β2 × L(x + (1 - α) × dx, T1).
From the triplet (1, β1, β2), the parabolic model β(α) can be estimated:
β(α) = a × α² + b × α + c
In the same way, the offset γ(α) can be estimated. Then interpolation is carried out via the same equations as previously (Eqs. (7)), except that the parameters β and γ depend on the α value.
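One way to realize this fit is to pass a parabola through three (α, gain) samples; taking the sample at α = 2 as the cumulative gain β1 × β2 is an assumption here, since the text only names the triplet (1, β1, β2):

    import numpy as np

    def fit_parabolic_gain(beta1, beta2):
        """Fit beta(alpha) = a*alpha^2 + b*alpha + c through three gain samples:
        beta(0) = 1 by definition, beta(1) = beta1 (gain T0 -> T1) and
        beta(2) = beta1 * beta2 (assumed cumulative gain T0 -> T2)."""
        alphas = np.array([0.0, 1.0, 2.0])
        gains = np.array([1.0, beta1, beta1 * beta2])
        a, b, c = np.polyfit(alphas, gains, 2)  # exact degree-2 fit to 3 points
        return a, b, c

    a, b, c = fit_parabolic_gain(1.10, 1.05)
    beta_at = lambda alpha: a * alpha ** 2 + b * alpha + c  # gain at the virtual position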
Furthermore, in this case, even the interpolation of the pixels visible in two images must be reconsidered, as the gain β depends on the relative position α. Equation (1) is no longer valid and is replaced by:
L(x, Ti) = β(α) × L(x - α × dx, T0) + (1 - β(α)) × L(x + (1 - α) × dx, T1)
with the following conditions: β(0) = 1 and β(1) = 0, which give
β(α) = a × α² - (a + 1) × α + 1
All these methods can be used for image extrapolation instead of image interpolation: in this case, Ti is no longer in the range [T0, T1] and α is out of the range [0, 1].
If the occlusion map indicates that the pixel is visible in neither input view, it is an "orphan" pixel: there is no correspondence vector that links this pixel to a point in another view. The pixel luminance/color is filled from spatio-temporal neighboring pixels without taking into account the luminance/color gain. A solution is to apply disparity/motion based segmentation to the input images to distinguish the various depth layers in these images and in particular to identify occluded and occluding regions.
So, these different occluded regions get a label and a relative occlusion order. Moreover, the most probable shape of the occluded regions in the hidden parts is determined.
Then, inpainting is applied to disparity/motion and possibly luminance/color and luminance/color change information in these hidden areas.
For a given region, color change data in the hidden area may be filled from the color change data in the visible parts:
• Via exemplar-based inpainting techniques
• from estimation of the spatial variation of the color change information inside the region via a color change (e.g. affine parametric) model. Similarly, disparity/motion and color can be filled in the same way.
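For instance, the affine parametric option can be sketched as a least-squares plane fitted to the gain values of the visible pixels of a region and evaluated over its hidden part (the plane model and all names are illustrative assumptions):

    import numpy as np

    def fill_gain_in_hidden_area(gain_map, visible_mask):
        """Fill a region's gain map in its hidden part from its visible part,
        using a spatial affine model gain(x, y) ~ p0 + p1*x + p2*y."""
        ys, xs = np.nonzero(visible_mask)             # visible pixels of the region
        A = np.column_stack([np.ones(xs.size), xs, ys])
        p, *_ = np.linalg.lstsq(A, gain_map[ys, xs], rcond=None)
        filled = gain_map.astype(float).copy()
        hy, hx = np.nonzero(~visible_mask)            # hidden pixels to fill
        filled[hy, hx] = p[0] + p[1] * hx + p[2] * hy
        return filled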
Figure 2 illustrates this situation. In the first line, A and B represent two input images, in which part O1 corresponds to the background and objects O2 and O3 are represented. Object O4 is partially occluded by objects O2 and O3; C represents the image to be interpolated. In the second line, the first image represents the estimated disparity map of the first input image A. Different parts are recognized, corresponding to different correspondence values: the disparity/motion of O1 is represented in black, the disparity/motion of O2 in white, O3 in light gray and O4 in dark gray. O4 is partially occluded by O2 and O3.
The middle image in the second line depicts the disparity/motion map of O1 and O4 from which objects O2 and O3 have been removed. The hidden parts of O1 and O4 have been filled as described above. Similarly, luminance/color variation maps have been filled (gain map and possibly offset map). This processing can also be carried out in the second image (B). The next step is the projection of the regions from the input images onto the virtual view via disparity/motion compensation. With the depth layer representation, each orphan pixel can now get at least a region label and disparity/motion information. Similarly, a luminance/color variation map can be built from the input luminance/color variation map(s).
The lower right image in Figure 2 shows the complete interpolated disparity/motion map, filled in the areas occluded in the input views according to the description above. A luminance/color variation map is obtained in the same way.
The next step is the interpolation of the virtual image. The pixels that are visible in both input images can then be interpolated via a known method or via the method described previously that computes a non-linear luminance/color gain. The pixels that are visible in only one image are interpolated according to what is described above.
Then, color filling of the orphan pixels is carried out via inpainting. In this context, three cases are considered:
• Either the pixel is filled from pixels belonging to the same object region in the current image, i.e. from spatial neighboring pixels. This process is called intra-frame filling, spatial filling or image inpainting.
• If the regions in the input views have been filled, i.e. if the color and the color change vector in their hidden parts have been filled, then the orphan pixels can be directly interpolated via disparity/motion compensation, as pixels visible in two images.
• If the color data in the hidden areas of the regions in the input views have not been filled, then the orphan pixels are filled from the visible parts of the regions they belong to in the input views. For example, exemplar-based inpainting is applied to the color and to the color change vector. The filling color value is then modified from the color change vector, as in the case of pixels visible in one view.
More generally, considering a pixel in an image to be interpolated: from any model that allows identifying the position of this pixel in a set of input images distributed in 3D space and time, and from any model that allows predicting the luminance or color appearance of this pixel from the luminance or color of the corresponding points in this set of input images, the predicted appearance value is assigned to the pixel.
It can be applied in the same way to extrapolation. In this case, Ti is no longer in the range [T0, T1] and α is out of the range [0, 1].
A flowchart of a method of the invention is represented in Figure 3. For pixels belonging to an occluded area, the method comprises a first step of estimating a luminance and/or color gain variation factor β between the first and second input images T1, T0 from spatio-temporal neighboring information from pixels that do not belong to an occluded area.
The method further comprises the step of calculating a weighting coefficient (α) taking into account the distance of the virtual image to the second input image (Ti - T0) relative to the distance between the first and the second input images (T1 - T0). Knowing the estimated luminance and/or color gain variation factor (β) and the calculated weighting coefficient (α), the method comprises the step of computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor (β) and the calculated weighting coefficient (α).
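Tying the three steps together for a single pixel occluded in the first input image T1, under the same illustrative conventions as the earlier snippets:

    def interpolate_occluded_pixel(L0, x, dx, Ti, T0, T1, beta):
        """Step 1 has produced beta from non-occluded neighbors; step 2 computes
        the weighting coefficient alpha; step 3 computes the pixel from the
        second input image T0 via Eq. (4)."""
        alpha = (Ti - T0) / (T1 - T0)              # relative position of the virtual image
        x0 = int(round(x - alpha * dx))            # corresponding point in T0
        return (1 + alpha * (beta - 1)) * L0[x0]   # gain-compensated luminance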

Claims

1. Method of interpolating a virtual image (Ti) from a first and a second input images (T1, T0), the luminance (L) or color (CR, CG, CB) of each interpolated pixel of the virtual image being calculated from corresponding pixels of the first and the second input images,
Characterized in that the method comprises further the steps of:
If a first input image (T1) is partially occluded, for the pixels belonging to an occluded area,
estimating a luminance and/or color gain variation factor (β) between the first and second input images (T1, T0) from spatio-temporal neighboring information from pixels that do not belong to an occluded area;
calculating a weighting coefficient (α) taking into account the distance of the virtual image to the second input image (Ti - T0) relative to the distance between the first and the second input images (T1 - T0);
computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor (β) and the calculated weighting coefficient (α).
2. Method as claimed in claim 1, wherein pixels belonging to an occluded area are determined by an occlusion map relative to the interpolated image.
3. Method as claimed in claim 1, wherein the luminance/color gain variation factor is a value estimated by correlation during matching between the first and second input images, assuming that it varies linearly between the first and second input images.
4. Method as claimed in claim 1, wherein, if the luminance and/or color gain factor variations follow an affine model, the luminance and/or color gain variation factor (β) takes account of the offset value defined by the affine model.
5. Method as claimed in claim 1, wherein, if the luminance and/or color gain factor variations follow a parabolic model, the luminance and/or color gain variation factor (β) takes account of the offset value defined by the parabolic model.
6. Method as claimed in claim 1, wherein, for pixels belonging to occluded areas in the first and second input images, the interpolated pixel luminance and/or color is filled from the luminance and/or color of spatio-temporal neighboring pixels, without taking into account the luminance/color gain factor.
7. Method as claimed in claim 1, wherein, for pixels belonging to occluded areas in the first and second input images, the luminance and/or color variation (β) is filled from an estimation of the spatial luminance and/or color variation of pixels in the visible parts.
8. System for interpolating a virtual image (Ti) from a first and a second input images (T1, T0), the luminance (L) or color (CR, CG, CB) of each interpolated pixel of the virtual image being calculated from corresponding pixels of the first and the second input images,
Characterized in that the system comprises further:
If a first input image (T1) is partially occluded, for the pixels belonging to an occluded area,
Means for estimating a luminance and/or color gain variation factor (β) between the first and second input images (T1, T0) from spatio-temporal neighboring information from pixels that do not belong to an occluded area;
Means for calculating a weighting coefficient (α) taking into account the distance of the virtual image to the second input image (Ti - T0) relative to the distance between the first and the second input images (T1 - T0); and means for computing the luminance and/or color of each interpolated pixel from the luminance and/or color of the corresponding pixel belonging to the second input image, taking account of the estimated luminance and/or color gain variation factor (β) and the calculated weighting coefficient (α).
PCT/EP2012/075634 2011-12-14 2012-12-14 Method and system for interpolating a virtual image from a first and a second input images WO2013087880A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP11306661 2011-12-14
EP11306661.7 2011-12-14

Publications (1)

Publication Number Publication Date
WO2013087880A1

Family

ID=47603536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/075634 WO2013087880A1 (en) 2011-12-14 2012-12-14 Method and system for interpolating a virtual image from a first and a second input images

Country Status (1)

Country Link
WO (1) WO2013087880A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7257272B2 (en) * 2004-04-16 2007-08-14 Microsoft Corporation Virtual image generation
WO2008134829A2 (en) * 2007-05-04 2008-11-13 Interuniversitair Microelektronica Centrum A method and apparatus for real-time/on-line performing of multi view multimedia applications
US7697769B2 (en) 2004-01-15 2010-04-13 Kabushiki Kaisha Toshiba Interpolation image generating method and apparatus
US20100194858A1 (en) * 2009-02-02 2010-08-05 Samsung Electronics Co., Ltd. Intermediate image generation apparatus and method
US20110080463A1 (en) * 2009-10-07 2011-04-07 Fujifilm Corporation Image processing apparatus, method, and recording medium
WO2011105337A1 (en) 2010-02-24 2011-09-01 Nippon Telegraph And Telephone Corporation Multiview video coding method, multiview video decoding method, multiview video coding device, multiview video decoding device, and program

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7697769B2 (en) 2004-01-15 2010-04-13 Kabushiki Kaisha Toshiba Interpolation image generating method and apparatus
US7257272B2 (en) * 2004-04-16 2007-08-14 Microsoft Corporation Virtual image generation
WO2008134829A2 (en) * 2007-05-04 2008-11-13 Interuniversitair Microelektronica Centrum A method and apparatus for real-time/on-line performing of multi view multimedia applications
US20100194858A1 (en) * 2009-02-02 2010-08-05 Samsung Electronics Co., Ltd. Intermediate image generation apparatus and method
US20110080463A1 (en) * 2009-10-07 2011-04-07 Fujifilm Corporation Image processing apparatus, method, and recording medium
WO2011105337A1 (en) 2010-02-24 2011-09-01 Nippon Telegraph And Telephone Corporation Multiview video coding method, multiview video decoding method, multiview video coding device, multiview video decoding device, and program
CA2790268A1 (en) 2010-02-24 2011-09-01 Nippon Telegraph And Telephone Corporation Multiview video encoding method, multiview video decoding method, multiview video encoding apparatus, multiview video decoding apparatus, and program

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CRIMINISI A ET AL: "Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, BO, vol. 71, no. 1, June 2006 (2006-06-01), pages 89 - 110, XP019410163, ISSN: 1573-1405, DOI: 10.1007/S11263-006-8525-1 *
IN-YONG SHIN ET AL: "Disparity Estimation at Virtual Viewpoint for Real-time Intermediate View Generation", PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON 3D SYSTEMS AND APPLICATIONS SEOUL, KOREA, 20~22 JUNE 2011., June 2011 (2011-06-01), pages 195 - 198, XP055058150 *
PAGLIARI C L ET AL: "Reconstruction of intermediate views from stereoscopic images using a rational filter", IMAGE PROCESSING, 1998. ICIP 98. PROCEEDINGS. 1998 INTERNATIONAL CONFERENCE ON CHICAGO, IL, USA 4-7 OCT. 1998, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 2, 4 October 1998 (1998-10-04), pages 627 - 631, XP010308477, ISBN: 978-0-8186-8821-8, DOI: 10.1109/ICIP.1998.723552 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876841A (en) * 2017-07-25 2018-11-23 成都通甲优博科技有限责任公司 The method and system of interpolation in a kind of disparity map parallax refinement

Similar Documents

Publication Publication Date Title
US6625333B1 (en) Method for temporal interpolation of an image sequence using object-based image analysis
US8854425B2 (en) Method and apparatus for depth-related information propagation
US9361677B2 (en) Spatio-temporal disparity-map smoothing by joint multilateral filtering
US20220286660A1 (en) Concept for determining a measure for a distortion change in a synthesized view due to depth map modifications
US8649437B2 (en) Image interpolation with halo reduction
US9256926B2 (en) Use of inpainting techniques for image correction
TWI432034B (en) Multi-view video coding method, multi-view video decoding method, multi-view video coding apparatus, multi-view video decoding apparatus, multi-view video coding program, and multi-view video decoding program
US20100123792A1 (en) Image processing device, image processing method and program
US8331711B2 (en) Image enhancement
US8223839B2 (en) Interpolation method for a motion compensated image and device for the implementation of said method
EP3580924B1 (en) Method and apparatus for processing an image property map
Dikbas et al. Novel true-motion estimation algorithm and its application to motion-compensated temporal frame interpolation
US7787048B1 (en) Motion-adaptive video de-interlacer
US20120113093A1 (en) Modification of perceived depth by stereo image synthesis
CN102985949A (en) Multi-view rendering apparatus and method using background pixel expansion and background-first patch matching
Fu et al. Temporal consistency enhancement on depth sequences
JP3095141B2 (en) Motion compensation method of moving image using two-dimensional triangular lattice model
WO2013087880A1 (en) Method and system for interpolating a virtual image from a first and a second input images
US9538179B2 (en) Method and apparatus for processing occlusions in motion estimation
Robert et al. Disparity-compensated view synthesis for s3D content correction
JP2013114682A (en) Method for generating virtual image
KR101834952B1 (en) Apparatus and method for converting frame rate
Brites et al. Epipolar geometry-based side information creation for multiview Wyner–Ziv video coding
Lin et al. Key-frame-based depth propagation for semi-automatic stereoscopic video conversion
Oh et al. Frame interpolation method based on adaptive threshold and adjacent pixels

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12818497

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12818497

Country of ref document: EP

Kind code of ref document: A1