WO2010049917A2

WO2010049917A2 - Image prediction method and system

Info

Publication number: WO2010049917A2
Application number: PCT/IB2009/055226
Authority: WO
Inventors: Ronggang Wang; Yongbing Zhang
Original assignee: France Telecom
Priority date: 2008-10-31
Filing date: 2009-10-20
Publication date: 2010-05-06
Also published as: WO2010049917A3

Abstract

A method for computing a predicted frame from a first and a second reference frames, said method comprising for each block of pixels to be predicted in the predicted frame the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1 ) computing a first coefficient vector allowing the estimation of the second block from the first block c) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

Description

IMAGE PREDICTION METHOD AND SYSTEM

Field of the Invention

The present invention relates in general to image processing and more specifically to image prediction.

Background of the Invention

Frame rate up conversion (FRUC) is a prediction or interpolation method where pixels in the interpolated or predicted frame are generated based on the observations of the pixels in previous and following frames. An example of application of FRUC is the reproduction of video sequences captured by digital camera for entertainment, i.e. slow-motion play back and complex video editing. Another example of application of FRUC is the enhancement of visual quality in low bit rate video coding, where only parts of frames in the original sequence are encoded and all the remaining frames need to be interpolated/predicted using adjacent decoded frames. Besides, FRUC may also be used for example in video surveillance, medical imaging, remote sensing etc... One of the most simple and direct FRUC techniques, such as frame repetition and frame averaging, neglects the motion between successive frames. It achieves good results for stationary regions in successive frames; however, for moving regions in successive images, resulting interpolated frames will be choppy and not smooth.

Another more effective and prevalent technique is the Motion Compensation Interpolation (MCI) method, which interpolates the intermediate frame along the motion trajectory between the previous and following frames. Since motion information is used in MCI, the accuracy of motion information plays a significant role for the quality of the interpolated frames. Many pioneering works have been done to improve the accuracy of the motion information between successive frames, which can be divided into two categorizes: motion estimation (ME) method and the motion vector processing method. The paper "True-motion estimation with 3-D recursive search block matching," (G. de Haan, P. W. Biezen, H. Huijgen, and O. A. Ojo, IEEE Trans, on Circuits and System Video Technology, vol. 3, No. 5, pp. 368-379, Oct. 1993) proposes a 3D recursive (3DRS) method to derive the true motions between successive frames. The paper "New frame rate up-conversion using bi-directional motion estimation," (B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans, on Consumer Electron., vol. 46, No. 3, pp. 603-609, Aug. 2000) proposes a bi-directional ME to obtain more reliable motions for FRUC. Besides, a hierarchical ME is also proposed in "Hierarchical motion compensated frame rate up-conversion based on the Gaussian/Laplacian pyramid," (G. I. Lee, B. W. Jeon. R. H. Park, and S. H. Lee, Proc. IEEE Int. Conf. Consumer Electronics, 2003, pp. 350-351 ) to get more faithful motions. To overcome the limitation of the translational motion with constant velocity assumption, a constant acceleration model in described in "Motion compensated frame interpolation based on H.264 decoder," (Z. Gan, L. Qi, and X. Zhu, Electronics Letters., vol. 43, No. 2, pp. 96-98, Jan. 2007) to further improve the accuracy of motion information between successive frames. Due to the absence of the actual pixels in the to-be-interpolated frame in FRUC, the ME is performed in the previous and following frames, which may result in non-consistent motion fields in FRUC sometimes. Consequently, some motion vector post processing methods were proposed in "A method for motion adaptive frame rate up-conversion," (R. Castagno, P. Haavisto, and G. Ramponi, IEEE Trans, on Circuits and System Video Technology, vol. 6, No. 5, pp. 436-446, Oct. 1996) and "Motion vector processing for frame rate up conversion" (G. Dane and T. Nguyen, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, ICASSP, 2004, pp. 309-312), to make the motion field much smoother. MCI methods have achieved much better performance than the frame averaging and the frame repetition. Whereas the MCI method is usually performed block by block, which may not be consistent with the heterogeneous object shapes for some scenes, and thus blocking artifacts are usually perceived in the regions with complex motion.

By extending traditional MCI, Overlapped Block Motion Compensation (OBMC) was also employed in FRUC (B. Girod, "Efficiency analysis of multihypothesis motion-compensatedprediction for video coding," IEEE Trans, on Image Processing., vol.9, No.2, pp. 173-183, Feb. 2000), due to its ability of reducing blocking artifacts, which are usually observed in the interpolated frames by MCI. J. Zhai, K. Yu, J. Li, and S. Li₁ "A low complexity motion compensated frame interpolation method," Proc. ISCAS, May. 2005, vol. 5, pp. 4927-4930 proposes an interpolation method by positioning overlapped blocks from the previous and the following frames utilizing a weighting window to further suppress the blocking artifacts. Compared with MCI, OBMC is able to generate a much smoother interpolated frame; however it assigns fixed weights for different blocks in the position relevant to the center block and may result in blurring or oversmoothing artifacts in case of non-consistent motion regions. For better adjusting the weights of OBMC, an Adaptive OBMC (AOBMC) is proposed in "Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation," (B.D.Choi, J.W.Han, CS. Kim, and SJ. Ko, IEEE Trans, on Circuits System Video Technology, vol. 17, No. 4, pp. 407-416, Apr. 2007) to tune the weights of different blocks according to the reliability of neighbouring motion vectors. AOBMC is able to adjust the weights of overlapped blocks to some extent, however it has inferior performance in case of stationary region or when the neighbouring motion vectors are very similar.

All the afore mentioned methods can be seen as finding the blocks in the previous and following frames most similar to the to-be-interpolated block, and then taking the average of the similar blocks as the ultimate interpolations. In the majority of cases, the motion displacement may be fractional-pixel rather than integer-pixel. In a consequence, all these methods first interpolate the previous and the following frames, by a fixed interpolation tap filter, to the half-pel or quarter-pel accuracy, and then search the most similar block for each to-be-interpolated block. It is stated in Adaptive interpolation filters and high-resolution displacements for video coding," (T. Wedi,/EEE Trans, on Circuits System Video Technology, vol. 16, No. 4, pp. 484-491 , Apr. 2006) that the invariant interpolation filter dose not consider the non-stationary properties of video signals like e.g. aliasing, displacement estimation errors in the interpolation process. Consequently, the existing FRUC solutions suffer from the inferiority of the fixed interpolation filter and do not solve the problem of non-linear prediction. Today there is a need for an image prediction solution that can be easily implemented on the existing communication infrastructures, overcoming the drawbacks of the prior art.

Summary of Invention

It is an object of the present system to overcome disadvantages and/or make improvement over the prior art.

To that extend, the invention proposes a method for computing a predicted frame from a first and a second reference frames, said method comprising for each block of pixels to be predicted in the predicted frame the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) computing a first coefficient vector allowing the transformation of the first block into the second block

C) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

The invention also relates to a system according to claim 6.

The invention also relates to a device according to claim 5.

The invention also relates to a computer program according to claim 7.

In this paper a method is proposed to adaptively tune the interpolation coefficients of the interpolation filter for each block from the observations of previous and following frames.

Different from the state-of-art weight estimating algorithm that performs the estimation and interpolation separately, the method according to the invention is performed iteratively utilizing the progressively predicted frame. More accurate coefficient (interpolation) vector may be obtained, thereby enhancing the quality of the interpolated frame. Brief Description of the Drawings

Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:

Figure 1A schematically illustrates a method according to an embodiment of the present invention;

Figure 1 B schematically illustrates a method according to an additional embodiment of the present invention;

Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow;

Figure 3 schematically illustrates an example of pixel prediction according to an embodiment of the present invention;

Figure 4 schematically illustrates an example of pixel prediction according to an additional embodiment of the present invention;

Figure 5 schematically illustrates an example of pixel prediction according to another additional embodiment of the present invention;

Figure 6 schematically illustrates an example of pixel prediction according to an embodiment of the present invention;

Figure 7 illustrates the MSE of the forward model and the backward model against iteration numbers with different initial weights, (a) forward and backward MSE in MoMe (QCIF). (b) forward and backward MSE in Tempete (CIF);

Fig. 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights, (a) Mobile (QCIF). (b) Tempete (CIF),

Figure 9 describes the method according to the invention according to one illustrative embodiment; and,

Figure 10 describes an iterative algorithm implementing the present method according to the invention. Description of Preferred Embodiments

The following are descriptions of exemplary embodiments that when taken in conjunction with the drawings will demonstrate the above noted features and advantages, and introduce further ones.

In the following description, for purposes of explanation rather than limitation, specific details are set forth such as architecture, interfaces, techniques, devices etc., for illustration. However, it will be apparent to those of ordinary skill in the art that other embodiments that depart from these details would still be understood to be within the scope of the appended claims.

Moreover, for the purpose of clarity, detailed descriptions of well- known devices, systems, and methods are omitted so as not to obscure the description of the present system. Furthermore, routers, servers, nodes, base stations, gateways or other entities in a telecommunication network are not detailed as their implementation is beyond the scope of the present system and method.

In addition, it should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system.

The method according to the invention proposes in particular a model for predicting an image (i.e. called predicted or current image/frame) based on observations made in previous and following images. In the method according to the invention, the prediction is performed in the unit of block of pixels and may be performed for each block of the predicted image. By extension, an image may be assimilated to a block (of pixels).

The method according to the invention is suitable for predicting frames in a sequence or stream of frames and allows in particular predicting a frame between a first and a second reference frames.

Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow. Assuming a first set or block of pixels 200 in a frame 2k, then, the corresponding block, in the following frame 2k+1 , along the motion trajectory (defined by its associated motion vector) is the block 210. Similarly, the corresponding block, in the following frame 2k+2 of the frame 2k+1 , along the same motion trajectory (defined by its associated motion vector) is the block 220.

Figure 1a describes the method according to the invention wherein a first reference frame 100 and a second reference frame 110 are used to define, in a first act 120, a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame.

In reference to Figure 2, the first block defined here above corresponds to block 200, the block of pixels to be predicted corresponds to block 210 and the second block corresponds to block 220.

In act 130, a first coefficient vector allowing the transformation of the first block into the second block is computed. This transformation corresponds to the approximation of pixels in the second block from pixels in the first block along the motion vector of the trajectory of said pixels. Knowing the actual (i.e. real or existing) pixels in the second block, known methods such as, e.g. Mean Square Estimation (MSE), allow deriving the optimum coefficient vector, i.e. the first coefficient vector, that give the best approximation.

The assumption is made that the first coefficient vector derived in act 130 may also be used to approximate pixels in the block of pixels to be predicted from pixels in the first block in an act 140 in order to obtain the predicted frame 150. This assumption is based on the fact that there are high redundancies between consecutives or adjacent frames in a stream of frames (in particular for a video stream).

Figure 3 schematically describes the prediction of a pixel 311 in a predicted frame 310 from pixels in a reference frame 320 along the motion vector 330 linking the pixel to be predicted 311 and its corresponding pixel 322 in the reference frame 320.

As shown in Figure 3, for each pixel 311 in the predicted frame 310, the corresponding pixel 321 in the reference frame 320 is derived along the motion trajectory (shown in Figure 3 through motion vector 330). A square spatial neighborhood 325, centered on the corresponding pixel 321 in the reference frame 320 is defined. The pixel 311 in the predicted frame is thus approximated as a linear combination of the pixels 322 of the corresponding spatial neighborhood 325 in the reference frame 320. This interpolation process may be expressed as

Y_t (m,n) = ^ X_t_ι (m + i,ή + j)*0Ci_j + n_t (m, n) (^1A)

-r≤(i,j)≤r

Where:

Ϋ_t (m,n) v ' represents the predicted pixel 311 located at coordinates

(m,n) in the predicted frame 310,

^M allows representing represents the pixels in the reference frame 320,

- ' represents the position of the corresponding pixel 321 in the reference frame 320 pointed by the motion vector 330 of the

(m,n) corresponding predicted pixel 311 located at ^{v /} in the predicted

frame 310 through Ϋ_t ^/ (^vm,n ') ,

ai i

'^J are the components of the coefficient vector, n_t (m,n)

- ^v ' is the additive Gaussian white noise,

- r is the radius of the filter defined by the square spatial neighborhood 325. It corresponds to a (2r+1 )x(2r+1 ) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310. For instance in Figure 3, the radius is r=1 and the size of the interpolation filter is 3x3. In the description here under, Y_2k+l is a block, with the size of FFxJF (with W an interger greater than 1), of pixels to be predicted in the predicted frame 2k + 1 , X_2k is the first block, i.e. the aligned or corresponding block in the previous frame 2k along the motion trajectory and X_2k+2 is the second block, i.e. the aligned (i.e.) corresponding block in the following frame 2AΓ + 2 along the motion trajectory.

With this notation, the scheme presented in reference to Figure 3 may be described as follows here under.

If X_2k (m,n) represents the pixel at (m,n) in frame 2k in the block X₂κ, then the predicted pixel within block Y_2k+l may be approximated as:

Ϊ2k+l (^m>ⁿ) = Σ ^X2k {^{i + m + v}xJ ^{+ n + v}y)'^ai,j ^{+ n}2k₊\ (^m>ⁿ) (^{1 B}) -L≤(i,j)≤L

where:

- L is the radius of the filter,

- (V_x, V_y) represents the motion vector of the block Y_2/t+1 between the first reference frame 2k and the predicted frame 2k+1 ,

- CC_{1 J} are the components of the coefficient vector from X₂κ to Y_2k+i_>

- n_2k+l (m,n) is the additive Gaussian white noise.

- L is the radius of the filter defined by the square spatial neighborhood 325. It corresponds to a (2L+1 )x(2L+1 ) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310. For instance in Figure 3, the radius is L=1 and the size of the interpolation filter is 3x3.

In the description here under, Y_2k+l is a block, with the size of W xW (with W an interger greater than 1), of pixels to be predicted in the predicted frame 2k + 1 , X_2k is the first block, i.e. the aligned or corresponding block in the previous frame 2k along the motion trajectory and X_2k+2 is the second block, i.e. the aligned (i.e.) corresponding block in the following frame 2& + 2 along the motion trajectory.

Rearranging the interpolated pixels in block Y_2/t+1 as a WxW column vector Y_2U] ={Y_2Ui{0,0),Y_2M(0,l),...,Y_2k+l{W-l,W-l))' , representing the concatenated and lexicographically ordered intensity values, then Equation (1) can be rewritten as:

where:

-X_2k=(X_2k(0_i0),X_2k(0,i),...,X_2k(W-\,W-ϊ))' represents the concatenated and lexicographically ordered intensity values in the corresponding block,

- α = {^a-L,-L>^a-L,-L+ι>~->^aL,L) is the coefficient vector,

- ⁿ _2/t₊i is the additive Gaussian white noise vector,

J

) is a function transferring X_2k to a (W-W)x((2L + l).(2L + l)) matrix.

Here the /th row of matrix /(X₂J consists of the (2Z + l)x(2Z, + l) square neighbourhoods of the /th pixel within X_n , i = 0,1,...,WxW -1. The coefficient vector α should be chosen to be the optimum.

A known method is the Mean Square Error (MSE) to measure its performance as follows:

*² (V_2ω) = £ ||Y_2i+, -Y₂₁₊I ≡ E i( Y₂₁₊₁ -Y₂₄₊₁)^' (Y₂₄₊, -Y₂₄₊₁)

where | . | denotes the L₂ norm.

The optimum coefficient vector can be computed by minimizing the MSE in Equation (3). However, since the actual/real pixels in Y_2yt+1 are not available (as it is a block of the frame to be predicted wherein, by definition, pixels have not been predicted yet), Equation (3) can not be directly used to compute the optimum weights.

Thus, in the method according to the invention, the first coefficient vector used to compute pixels in frame 2k+1 is the optimum coefficient vector derived from the non-linear transformation used to approximate pixels in frame 2k+2 from pixels in frame 2k.

Figure 4 describes the use twice in a row of the scheme detailed here above. The assumption that the coefficient vector, used to approximate (or estimate) the corresponding actual pixels within frame 2k + 2 , remain the same as those used to interpolate the pixels within frame 2k + 1 is accurate since there is a high redundancy between pixels along the motion trajectory from frame 2k to 2k + 2 (as they are consecutive frames) and thus it is reasonable to assume that the sample covariance does not change in those motion aligned blocks.

As shown on Figure 4, the actual pixels in the motion aligned block within frame 2k + 2 can be estimated as:

X₂*₊2 = /(Y₂W )« + ⁿ 2k+2 , (4)

where X_2A+2 is a Wx W column vector, representing the concatenated and lexicographically ordered intensity values in the motion aligned block within frame 2k + 2 , /(Y_2λ+I ) is a matrix, whose element is computed according to Equation (2), and α is the same with that in Equation (2). Incorporating Equation (2) into Equation (4), the interpolation (i.e. approximation or estimation) of X₂^₊₂ can be obtained using the corresponding pixels within the aligned/corresponding block X_n as follows: Xk₊I

+ ⁿ2*₊l )α + ⁿ2*₊2

+ ⁿ2*₊2

(5) where g(X_2A ) is a function which transfers the column vector X_2A to a

(W*W)x((4L +

matrix, Λ(α) represents the coefficient vector corresponding to the enlarged square neighbourhood with size of (4I + l)x(4Z + l) , shown in frame 2k in Figure 5.

In Figure 5, each pixel in the aligned block in the following frame 2k + 2 may be estimated as a weighted summation of pixels in an enlarged square neighbourhood with size of (4I + l)x(4I + l) in the previous frame 2k along its motion trajectory. It should be noted that the length of the coefficient vector h(a) corresponding to the enlarged square is (4L + \)x(4L + l) and each element of Λ(α) is the quadratic of the elements of α . In other words, in a two dimensional way, each element of h(a) may be expressed as:

with -L ≤i,j ≤Lf]-L ≤ p,q ≤L and -2L ≤m,n≤2L . Here the Λh row of matrix g(X_2A ) consists of the (4L + \)x(4L + l) square neighbourhood of the Λh pixel within X_2A , i = 0,i,...,WxW-i .

From Equation 5, the coefficient vector α may be computed by minimizing the MSE as follows:

where r {a) = X_2k+2 — g [X_2k ) h(a) is defined as the residual vector.

The principle behind Equation (7) is that the original frame should satisfy 3f(α)/3α = 0 as seen in reference with Figure 3. However, as mentioned in the previous subsection r (α) is not linear with respect to α like it were in the scheme described in reference to Figure 3. Thus, α cannot be directly computed by setting 9£-(α)/3α = 0 as previously emphasized.

In an illustrative embodiment of the method according to the invention, a is assumed to be the approximation of α, F(α) may be expressed around a in Taylor series as:

r(α)~r(a) + J(a)(α-a) _t (8)

where J(a) is the Jacobian matrix of r(α) at a. The Jacobian matrix J (a) can be computed as

with:

— ^ -²Ji\^X2k)^α (10)

where:

+ l)x(2Z + l)-l, and /(X_2;) represents the /th row of the matrix /(X₂J with i = (m/W-L + n/L)xW + (m/W-L + n/L).

Let r(α) = 0, α may be computed as:

α = Λ-(J' (a)J(a))^"1 J' (a)r (a) (11). Consequently, according to the method according to the invention, if the coefficient vector α' is derived after the /th iteration, then the coefficient vector in the (/+1 )th iteration may be computed as:

α'⁺¹ = α' -(y' (α')y(α'))^"l y' (α')f(α') _(12).

Since this scheme imposes a strong demand on the initial value of α , in an optional embodiment of the method according to the invention, Equation (12) may be modified by adding a damping factor as:

a^i+l =a^l -of (j' (a'jjfo))^'1 J^* (a'jrfi) ₍₁₃₎

where

(2L + l)-(2L + l)-l , is the damping factor.

By adding the damping factor, the convergence of the method according to the invention in this illustrative embodiment is ensured and the convergence speed is accelerated.

Figure 10 describes the iterative algorithm implementing the method according to the invention.

In an act 1000, α° is initialized (as explained further).

In act 1005, J(α°)and r (a°) are initialized.

The iteration is then started and is stopped when a pre-defined number of iterations has been reached or when the convergence of the iteration has been reached (act 1010).

Δα' is computed in an act 1015 and α'⁺¹ is updated in act 1020.

r(α'⁺¹ ) and J(α'^+I ) are updated in act 1025 and the iteration restarts at act 1010 with the stopping conditions until said condition is fulfilled. Figure 9 describes the method according to the invention according to one illustrative embodiment.

The predicted frame is first divided into non-overlapped blocks in act 900. If it is not the last block (act 905), forward (from frames 2k to 2k+2) and backward (from frames 2k+2 to 2k) motion vectors are derived.

Coefficient vector α (forward) and β (backward) are derived using the method according to the invention in respectively acts 915 and 920.

Pixels are predicted using α and β (respectively acts 925 and 930) and combined in act 935 to derive the predicted frame.

In the following description, an illustrative implementation, called here after damping Newton algorithm, of the method according to the invention is detailed.

initialize Y₂ ³^₊₁ and α° ; for / = 0 to iMax-1

Compute Y₂^₁ according to Eq. (2);

Compute the forward MSE: E(i) = |Y^+I - %_k+\\ . (14);

if E{i) < T

break; end if

Compute J(α' ) according to Eq. (9) and Eq. (10).

Compute r (α' ) : r (aⁱ) = X_2k+2 -g{X_2k )h(a_i) (15);

Compute cd according to Table 2; Compute α'^+l according to Eq. (13); end for

The proposed damping Newton algorithm to estimate the accurate weight vector α is summarized in Equations (14) and (15).

As shown in the summary of the iterative algorithm, the forward or backward MSE E(i) is used to judge whether the damping Newton algorithm has been converged or not. In other words, if E(i) is smaller than a preset threshold T , it is considered that the damping Newton algorithm has been converged, otherwise, it is not converged and the algorithm moves to the next iteration. It is clear that the computation of a new weight vector in Equation (13) involves the operations of computing the inverse of a matrix and a multiplication of matrix. Since the Hessian matrix in Equation (13) is

A = J ya J J ya J \\ \_s obvious that A is positive definitive. And thus we

can obtain α'⁺¹ by computing the inverse of A in Eq. (13) in our proposed method.

If we define the change of a weight vector α'⁺¹ compared to α' as

(16) then the convergence ratio of each weight is defined as

(17)

Based on the convergence ratios in Equation (17), we propose a method to adaptively adjust the damping factor. The adjusting of damping factor ω' is summarized in Table 2, where the variable a is the effective convergence coefficient, and b is the accelerated coefficient. It is noted that a , b and v should be positive and should satisfy α ≤ l , b < \ and v > l . In this invention, the values of a , b and v are set to be 0.7, 0.2, and 2, respectively. And ω° is set to be the identity matrix / . The computation of ω' is synchronous with the computation of α'⁺¹ , and consequently E(ϊ) in Equation (14) is the same with that in Table "Damping factor adjusting" below:

Damping factor adjusting:

initialize ω° ; for / = 1 to iMax-1

Compute φ(a^ι ) according to Eq. (16);

for ; = 1 to (2I + 1).(2Z + 1)-1

Compute A₁J according to Eq. (17);

end if

b

o)'^J = ω' ^J »v ; end if end for if E(i) < T

break; end if end for

Since the convergence of the proposed method depends on the selection of the initial coefficient vector α°, setting proper α° is very important. In this example of implementation, the initialization of α° is performed as follows. The initial interpolations Y₂V_+I derived by the traditional FRUC methods, such as e.g. MCI, OBMC and AOBMC with quarter-pel accuracy motion vectors, are first obtained. Then the corresponding pixels within the motion aligned blocks in frames 2k + 1 and 2£ + 2 are predicted using the method according to the invention:

The initial coefficient vector α° are then computed according to:

(20)

In an additional embodiment of the method according to the invention, in reference to Figure 1 B, after deriving the first coefficient weight α , the similar iterative process may also be applied to derive a second coefficient vector β in acts 135 and 145. This is done in the reverse way, as described in

Figure 6 (with parallel to Figure 4) from X_2k+2 instead of X₂κ applying the method backward.

Finally, in order to make the prediction more accurate, the predicted pixels within the current to be interpolated block may be computed/optimized as:

Y₂It₊I = (/(X2t ) ^β + /(X₂t₊2 ) β) / 2 (2D

The computation bottleneck of the proposed algorithm is in Equation (13), which involves computing the inverse of a matrix and a multiplication of matrix. However, there are many algorithms to speed up these operations. In addition, with fast algorithms and more powerful computing resources, the running time of the proposed clamping Newton algorithms can be further reduced.

Various experiments are conducted to show the performance of the method according to the invention and compare it to other methods. We will first study the convergence property of the method according to the invention. Subseqentially, the interpolation performance comparsions are presented. To evaluate the method according to the invention performance, the following well-known metrics, peak signal-to-noise ration (PSNR), are used in this work. The PSNR is defined as follows.

, (22)

where Ϋ and Y are the interpolated frame and the actual frame, respectively, W_γ and H_γ are the width and the height of the frame, respectively.

Convergence Study

In this subsection, various experiments were conducted to show the effect of initial weights on the convergence of the method according to the invention. Tempete (CIF) and Mobile (QCIF) sequences were selected to conduct the experiments for showing the convergence property of the method according to the invention. The sizes of the block for Tempete (CIF) and Mobile (QCIF) are set to be 16x16 and 8x8 , respectively. And the supporting orders are all set to be 2 for these two test sequences. Every other frame of the first 100 frames of each test sequence was skipped and interpolated by the proposed method by using different initial weights.

Fig. 7 illustrates the MSE of the forward model and the backward model against iteration numbers with different initial weights, (a) forward and backward MSE in Mobile (QCIF). (b) forward and backward MSE in Tempete (CIF). Three interpolation results, MCI, OBMC [8], and AOBMC [9], were selected for the computation of initial weights. Bi-directional ME as in [2] was used to derive motion vectors in these three interpolation methods, and the motion vector was of quarter-pel accuracy. The motion vector post processing algorithm as in [5] was used to smooth the motion field after the derivation of motion vectors. The threshold T was set to be 50 in the experiments. The MSE of the forward model and the backward model averaged over the 50 interpolated frames against the number of iterations are plotted as in Fig. 7. It is easy to observe in Fig. 7 that with the increase of iteration number, the MSEs of both the forward and backward models in Mobile (QCIF) and Tempete (CIF) are decreased, and when the iteration number is larger than 3, the MSEs tend to be constant and converged.

Fig. 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights, (a) Mobile (QCIF). (b) Tempete (CIF).

The average PSNR values of the 50 interpolated frames in Mobile (QCIF) and Tempete (CIF), when compared with the original frames, are plotted in Fig. 8. It is observed in Fig. 8 that with the increase of iteration numbers, the PSNRs are increased and tend to be converged when the iteration number is larger than 3 in these two test sequences. Another observation is that Fig. 8 keeps consistent with Fig. 7, where smaller forward and backward MSEs correspond to higher PSNR values of interpolated frames. It is noted that the forward/backward MSEs are computed between the interpolated following/previous frame ( X_2yt+2 / X-_2k ) ^anc' ^tne original following/previous frame ( X₂*+₂ /X₂A_; ), while the PSNR is computed between the interpolated intermediate frame ( Ϋ_2A+1 ) and the actual one of the intermediate frame (Y₂*_+! )■ which is skipped in the experiment. Consequently, it proves that the assumption that the weights are the same in the propagation of the method according to the invention is valid. In Fig. 7 and Fig. 8, we also find that the convergence of the proposed method is independent of the initial weights, since no matter what the initial weights are, the MSEs of both the forward and backward models as well as the PSNRs of interpolated frames tend to be converged with the increase of iteration number.

The experimental results in Figs. 7-8 provide the empirical evidence on the convergence of the proposed method. Based on these observations, the maximum iteration number is set to be 3 for all the experiments in the following subsection.

Interpolation Comparsion

In this subsection, various simulations are performed on five QCIF sequences, five CIF sequences, five 4CIF sequences and five 720P sequences to show the performance of the proposed method. The experimental results are depicted in Table 3, where the comparison methods include 3DRS [1], MCI, OBMC [8], and AOBMC [9]. Here, MCI is the traditional FRUC method, which interpolates the to-be-interpolated block as the average of two corresponding blocks in the previous and following frames. In Table 3, (W, L) represents the size of the block and the supporting order of the method according to the invention for each test sequence, and the number in the brackets in the three rightmost columns represent the gains over the best results among 3DRS, MCI, OBMC, and AOBMC. It is easy to observe that the method according to the invention ranks the first among all the comparison methods in terms of PSNR values under different initial weights for all the test sequences. The average gains of PSNR values by the method according to the invention are 0.66dB, 0.27dB, 0.49dB and 0.61 dB, compared to the best results among the 3DRS, MCI, OBMC and AOBMC methods, for the QCIF, CIF, 4CIF and 720P sequences, respectively. Particularly, for sequences with high frequency components, such as Mobile (QCIF) and Spincalendar (720P), the method according to the invention exceeds the PSNR values of the best method, among 3DRS, MCI, OBMC and AOBMC, by 1.48dB and 1.94dB, respectively, when the AOBMC method is utilized to compute the initial interpolation weights.

Table 3 The PSNRs of interpolated frames by different methods

Re Invention

Other results are presented in Appendix A here after.

From a practical point of view, regarding any implementation in electronic devices, in the case of, for example, an encoder/decoder system, the method according to the invention is based on the fact that the blocks of pixels in the first and second reference frames are available / known both to the encoder and decoder, allowing thus obtaining the predicted frame using data derived from these reference frames. The present method may also be implemented using an interpolating device for computing the predicted frame from a first and second reference frames in a video flow. The encoding, decoding or interpolating devices may typically electronic devices comprising a processor arranged to load executable instructions stored on a computer readable medium, causing said processor to perform the present method. The interpolating device may also be an encoder/decoder part of a system of computing the predicted frame from a first and second reference frames in a video flow, the system comprising a transmitting device for transmitting the video flow comprising the reference frames to the interpolating device for further computing of the predicted frame. Appendix A

Figs. 11 (a) to (c) present the PSNR values of each interpolated frame by MCI, OBMC [8], AOBMC [9] and the method according to the invention for Mobile (QCIF) and Spincalendar (72Op). It can be easily observed that no matter what interpolation methods are chosen for the computation of initial weights, the proposed method achieves higher PSNR values than the corresponding interpolation method for each interpolated frame in Mobile (QCIF) and Spincalendar (72Op). Especially for the frames around 25^th frame in Mobile (QCIF), the gain is almost 3dB and for the frames around 20^th frame, the gain is 4dB. It also reveals that the proposed method is robust enough to generate frames with higher PSNR values than the traditional interpolation methods.

Figure 11 (a)

Figure 11(b)

Figure 11(c)

Claims

W27Claims

1. A method for computing a predicted frame from a first and a second reference frames, said method comprising for each block of pixels to be predicted in the predicted frame the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) computing a first coefficient vector allowing the estimation of the second block from the first block c) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

2. A method according to claim 1 , further comprising an act b2) of computing a second coefficient vector allowing the transformation of the second block into the first block, the act c) further using said second coefficient vector and pixels in the second block.

3. A method according to any of the claims 1 and 2, wherein the predicted frame is sequentially positioned between the first and the second reference frames.

4. A method according to any of the claims 1 to 3, wherein, in act b1), the transformation of the first block into the second block is a non-linear transformation.

5. An interpolating device for computing a predicted frame from a first and a second reference frames of a video flow, said device being arranged to select said first and second frames from the video flow, said device being further arranged for each block of pixels to be predicted in the predicted frame to: a) define a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) compute a first coefficient vector allowing the estimation of the second block from the first block c) compute pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

6. A system for computing a predicted frame from a first and a second reference frames of a video flow, said system comprising:

- a transmitting device for transmitting the video flow,

- an interpolating device arranged to:

- receive the video flow from the transmitting device,

- select said first and second frames from the video flow, said device being further arranged for each block of pixels in the predicted frame to: a) define a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) compute a first coefficient vector allowing the estimation of the second block from the first block c) compute pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

7. A computer program providing computer executable instructions stored on a computer readable medium, which when loaded on to a data processor causes the data processor to perform a method for computing a predicted frame from a first and a second reference frames according to claims 1 to 4.