WO2010049917A2 - Image prediction method and system - Google Patents

Image prediction method and system

Info

Publication number
WO2010049917A2
WO2010049917A2 · PCT/IB2009/055226
Authority
WO
WIPO (PCT)
Prior art keywords
block
pixels
predicted
frame
frames
Prior art date
Application number
PCT/IB2009/055226
Other languages
French (fr)
Other versions
WO2010049917A3 (en)
Inventor
Ronggang Wang
Yongbing Zhang
Original Assignee
France Telecom
Priority date
Filing date
Publication date
Application filed by France Telecom
Publication of WO2010049917A2
Publication of WO2010049917A3

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/01: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135: Conversion of standards involving interpolation processes
    • H04N7/014: Conversion of standards involving interpolation processes involving the use of motion vectors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007: Scaling based on interpolation, e.g. bilinear interpolation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/01: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0127: Conversion of standards by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter

Definitions

  • the present invention relates in general to image processing and more specifically to image prediction.
  • Frame rate up conversion (FRUC) is a prediction or interpolation method where pixels in the interpolated or predicted frame are generated based on observations of the pixels in previous and following frames.
  • An example of an application of FRUC is the reproduction of video sequences captured by a digital camera for entertainment, e.g. slow-motion playback and complex video editing.
  • Another example of an application of FRUC is the enhancement of visual quality in low bit rate video coding, where only some of the frames in the original sequence are encoded and all the remaining frames need to be interpolated/predicted using adjacent decoded frames.
  • FRUC may also be used, for example, in video surveillance, medical imaging, remote sensing, etc.
  • The simplest and most direct FRUC techniques, such as frame repetition and frame averaging, neglect the motion between successive frames. They achieve good results for stationary regions in successive frames; however, for moving regions in successive images, the resulting interpolated frames will be choppy rather than smooth.
  • MCI: Motion Compensation Interpolation
  • OBMC: Overlapped Block Motion Compensation
  • A known interpolation method (J. Zhai et al., Proc. ISCAS 2005, pp. 4927-4930) positions overlapped blocks from the previous and following frames, utilizing a weighting window to further suppress blocking artifacts.
  • Compared with MCI, OBMC is able to generate a much smoother interpolated frame; however, it assigns fixed weights to the different blocks according to their position relative to the center block, and may produce blurring or oversmoothing artifacts in regions of non-consistent motion.
  • AOBMC: Adaptive OBMC
  • AOBMC is able to adjust the weights of overlapped blocks to some extent; however, it performs poorly in the case of stationary regions or when the neighbouring motion vectors are very similar.
  • The invention proposes a method for computing a predicted frame from a first and a second reference frame, said method comprising, for each block of pixels to be predicted in the predicted frame, the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame; b1) computing a first coefficient vector allowing the transformation of the first block into the second block.
  • the invention also relates to a system according to claim 6.
  • the invention also relates to a device according to claim 5.
  • the invention also relates to a computer program according to claim 7.
  • The method according to the invention is performed iteratively, utilizing the progressively predicted frame. A more accurate coefficient (interpolation) vector may thereby be obtained, enhancing the quality of the interpolated frame.
  • Figure 1A schematically illustrates a method according to an embodiment of the present invention
  • Figure 1B schematically illustrates a method according to an additional embodiment of the present invention
  • Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow
  • Figure 3 schematically illustrates an example of pixel prediction according to an embodiment of the present invention
  • Figure 4 schematically illustrates an example of pixel prediction according to an additional embodiment of the present invention
  • Figure 5 schematically illustrates an example of pixel prediction according to another additional embodiment of the present invention.
  • Figure 6 schematically illustrates an example of pixel prediction according to an embodiment of the present invention
  • Figure 7 illustrates the MSE of the forward model and the backward model against iteration number with different initial weights: (a) forward and backward MSE in Mobile (QCIF); (b) forward and backward MSE in Tempete (CIF);
  • Fig. 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights,
  • (b) Tempete (CIF)
  • Figure 9 describes the method according to the invention according to one illustrative embodiment.
  • Figure 10 describes an iterative algorithm implementing the present method according to the invention.
Description of Preferred Embodiments
  • routers, servers, nodes, base stations, gateways or other entities in a telecommunication network are not detailed as their implementation is beyond the scope of the present system and method.
  • the method according to the invention proposes in particular a model for predicting an image (i.e. called predicted or current image/frame) based on observations made in previous and following images.
  • the prediction is performed in the unit of block of pixels and may be performed for each block of the predicted image.
  • An image may itself be treated as a single block of pixels.
  • the method according to the invention is suitable for predicting frames in a sequence or stream of frames and allows in particular predicting a frame between a first and a second reference frames.
  • Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow. Assuming a first set or block of pixels 200 in a frame 2k, then, the corresponding block, in the following frame 2k+1 , along the motion trajectory (defined by its associated motion vector) is the block 210. Similarly, the corresponding block, in the following frame 2k+2 of the frame 2k+1 , along the same motion trajectory (defined by its associated motion vector) is the block 220.
  • Figure 1a describes the method according to the invention wherein a first reference frame 100 and a second reference frame 110 are used to define, in a first act 120, a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame.
  • the first block defined here above corresponds to block 200
  • the block of pixels to be predicted corresponds to block 210
  • the second block corresponds to block 220.
  • a first coefficient vector allowing the transformation of the first block into the second block is computed.
  • This transformation corresponds to the approximation of pixels in the second block from pixels in the first block along the motion vector of the trajectory of said pixels.
  • The coefficient vector may be computed using known methods such as, e.g., Mean Square Error (MSE) minimization.
  • the assumption is made that the first coefficient vector derived in act 130 may also be used to approximate pixels in the block of pixels to be predicted from pixels in the first block in an act 140 in order to obtain the predicted frame 150.
  • This assumption is based on the fact that there are high redundancies between consecutive or adjacent frames in a stream of frames (in particular in a video stream).
  • Figure 3 schematically describes the prediction of a pixel 311 in a predicted frame 310 from pixels in a reference frame 320 along the motion vector 330 linking the pixel to be predicted 311 and its corresponding pixel 321 in the reference frame 320.
  • the corresponding pixel 321 in the reference frame 320 is derived along the motion trajectory (shown in Figure 3 through motion vector 330).
  • a square spatial neighborhood 325, centered on the corresponding pixel 321 in the reference frame 320 is defined.
  • the pixel 311 in the predicted frame is thus approximated as a linear combination of the pixels 322 of the corresponding spatial neighborhood 325 in the reference frame 320.
  • This interpolation process may be expressed as:

    Y_t(m, n) = Σ_{-r ≤ (i, j) ≤ r} X_{t-1}(m + i, n + j) · α_{i,j} + n_t(m, n)   (1A)

    where:
    - Y_t(m, n) represents the predicted pixel 311 located at coordinates (m, n) in the predicted frame 310,
    - X_{t-1} represents the pixels in the reference frame 320,
    - n_t(m, n) is a noise term,
    - r is the radius of the filter defined by the square spatial neighborhood 325. It corresponds to a (2r+1)×(2r+1) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310.
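Equation (1A) can be illustrated with a minimal numerical sketch. The radius r = 1 and the uniform averaging coefficients below are illustrative assumptions only (the patent estimates the coefficients rather than fixing them), and the noise term is omitted:

```python
import numpy as np

def predict_pixel(X_prev, m, n, alpha, r):
    """Approximate pixel (m, n) of the predicted frame as a linear
    combination of the (2r+1)x(2r+1) neighbourhood centred on the
    motion-aligned pixel in the reference frame (Equation 1A,
    noise term omitted)."""
    patch = X_prev[m - r:m + r + 1, n - r:n + r + 1]
    return float(np.sum(patch * alpha))

# Illustrative setup: r = 1 and uniform averaging coefficients, so the
# prediction is simply the mean of the 3x3 patch in the reference frame.
r = 1
alpha = np.full((2 * r + 1, 2 * r + 1), 1.0 / (2 * r + 1) ** 2)
X_prev = np.arange(25, dtype=np.float64).reshape(5, 5)
y = predict_pixel(X_prev, 2, 2, alpha, r)  # mean of the patch around X_prev[2, 2]
```

With learned (non-uniform) coefficients the same machinery acts as an adaptive interpolation filter, which is the point of the method.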
  • Y_{2k+1} is a block, with size W×W (with W an integer greater than 1), of pixels to be predicted in the predicted frame 2k+1,
  • X_{2k} is the first block, i.e. the aligned or corresponding block in the previous frame 2k along the motion trajectory,
  • X_{2k+2} is the second block, i.e. the aligned or corresponding block in the following frame 2k+2 along the motion trajectory.
  • Y_{2k+1}(m, n) = Σ_{-L ≤ (i, j) ≤ L} X_{2k}(i + m + v_x, j + n + v_y) · α_{i,j} + n_{2k+1}(m, n)   (1B)
  • (v_x, v_y) represents the motion vector of the block Y_{2k+1} between the first reference frame 2k and the predicted frame 2k+1,
  • L is the radius of the filter defined by the square spatial neighborhood 325. It corresponds to a (2L+1)×(2L+1) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310.
  • f is a function transferring X_{2k} to a (W·W)×((2L+1)·(2L+1)) matrix, so that Equation (1B) can be written in matrix form as Y_{2k+1} = f(X_{2k}) · α + n_{2k+1}.
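A sketch of the matrix-building function f follows. The exact pixel ordering is not specified in the surviving text, so lexicographic ordering and a zero motion vector are assumed here for illustration; each of the W·W pixels to predict gets one row containing its (2L+1)² motion-aligned reference neighbourhood:

```python
import numpy as np

def build_matrix(X, L, W, v=(0, 0)):
    """Assumed sketch of f: for each of the W*W pixels of the block to
    predict, stack the (2L+1)^2 motion-aligned neighbourhood of the
    reference frame into one row, giving a (W*W) x (2L+1)^2 matrix.
    X is the reference frame padded by L on each side of the block."""
    rows = []
    for m in range(W):
        for n in range(W):
            cm, cn = m + v[0] + L, n + v[1] + L   # centre inside padded X
            rows.append(X[cm - L:cm + L + 1, cn - L:cn + L + 1].ravel())
    return np.array(rows)

W, L = 2, 1
X = np.arange((W + 2 * L) ** 2, dtype=np.float64).reshape(W + 2 * L, W + 2 * L)
F = build_matrix(X, L, W)   # shape (W*W, (2L+1)^2) = (4, 9)
```

Multiplying F by a coefficient vector of length (2L+1)² then yields the W·W predicted pixel values at once, which is the matrix form used in the derivation.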
  • The coefficient vector α should be chosen to be optimal.
  • MSE: Mean Square Error
  • The optimum coefficient vector can be computed by minimizing the MSE in Equation (3). However, since the actual pixels in Y_{2k+1} are not available (it is a block of the frame to be predicted, where, by definition, pixels have not been predicted yet), Equation (3) cannot be directly used to compute the optimum weights.
  • the first coefficient vector used to compute pixels in frame 2k+1 is the optimum coefficient vector derived from the non-linear transformation used to approximate pixels in frame 2k+2 from pixels in frame 2k.
  • Figure 4 describes using the scheme detailed above twice in a row.
  • The assumption that the coefficient vector used to approximate (or estimate) the corresponding actual pixels within frame 2k+2 remains the same as the one used to interpolate the pixels within frame 2k+1 is accurate, since there is high redundancy between pixels along the motion trajectory from frame 2k to 2k+2 (as they are consecutive frames); it is thus reasonable to assume that the sample covariance does not change in those motion-aligned blocks.
  • the actual pixels in the motion aligned block within frame 2k + 2 can be estimated as:
  • X_{2k+2} is a (W·W) column vector, representing the concatenated and lexicographically ordered intensity values in the motion aligned block within frame 2k+2.
  • f(Y_{2k+1}) is a matrix whose elements are computed according to Equation (2), and α is the same as in Equation (2).
  • Using Equation (2), the interpolation (i.e. approximation or estimation) of X_{2k+2} can be obtained using the corresponding pixels within the aligned/corresponding block X_{2k} as follows:

    X_{2k+2} = f(Y_{2k+1}) · α + n_{2k+2} = f(f(X_{2k}) · α + n_{2k+1}) · α + n_{2k+2}
  • Each pixel in the aligned block in the following frame 2k+2 may thus be estimated as a weighted summation of the pixels in an enlarged square neighbourhood, of size (4L+1)×(4L+1), in the previous frame 2k along its motion trajectory.
  • The length of the coefficient vector h(α) corresponding to the enlarged square is (4L+1)×(4L+1), and each element of h(α) is quadratic in the elements of α.
  • Each element of h(α) may be expressed as:
  • The coefficient vector α may be computed by minimizing the MSE as follows:
  • J(α) is the Jacobian matrix of r(α) at α.
  • The Jacobian matrix J(α) can be computed as:
  • α^(i+1) = α^i − (J^T(α^i) · J(α^i))^(−1) · J^T(α^i) · r(α^i)   (12).
  • Equation (12) may be modified by adding a damping factor as:
  • Figure 10 describes the iterative algorithm implementing the method according to the invention.
  • The iteration is then started, and it stops when a pre-defined number of iterations has been reached or when the iteration has converged (act 1010).
  • The damping factor μ^i is computed in an act 1015 and α^(i+1) is updated in act 1020.
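The damped update of Equations (12) and (13) can be sketched on a toy least-squares problem. A constant scalar damping factor and a simple stopping rule are simplifying assumptions here; the patent adapts the damping factor at each iteration:

```python
import numpy as np

def damped_newton(residual, jacobian, alpha0, mu=0.1, n_iter=20, tol=1e-8):
    """Sketch of the damped (Gauss-)Newton update:
    alpha <- alpha - (J^T J + mu*I)^(-1) J^T r,
    stopping after a fixed number of iterations or once the squared
    residual falls below tol (cf. acts 1010-1020 of Figure 10)."""
    alpha = alpha0.astype(np.float64)
    for _ in range(n_iter):
        r = residual(alpha)
        if float(r @ r) < tol:          # convergence test
            break
        J = jacobian(alpha)
        step = np.linalg.solve(J.T @ J + mu * np.eye(alpha.size), J.T @ r)
        alpha = alpha - step            # damped update of the weights
    return alpha

# Toy problem A x ~ b (linear, so the Jacobian is simply A).
A = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
b = np.array([2.0, 6.0, 3.0])
x = damped_newton(lambda a: A @ a - b, lambda a: A, np.zeros(2))
```

The damping term mu*I keeps the normal-equation matrix well conditioned when J^T J is near singular, at the cost of slightly smaller steps.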
  • Figure 9 describes the method according to the invention according to one illustrative embodiment.
  • The predicted frame is first divided into non-overlapping blocks in act 900. If the current block is not the last block (act 905), forward (from frame 2k to 2k+2) and backward (from frame 2k+2 to 2k) motion vectors are derived.
  • Coefficient vectors α (forward) and β (backward) are derived using the method according to the invention in acts 915 and 920, respectively.
  • Pixels are predicted using α and β (acts 925 and 930, respectively) and combined in act 935 to derive the predicted frame.
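The combination of the forward and backward block predictions in act 935 can be sketched as follows. An equal weighting is assumed here purely for illustration; the surviving text does not specify the combination weights:

```python
import numpy as np

def combine_predictions(forward_pred, backward_pred, w=0.5):
    """Combine the forward (alpha-based) and backward (beta-based)
    predictions of a block; equal weighting is an assumption."""
    return w * forward_pred + (1.0 - w) * backward_pred

# Toy 4x4 block predicted in both directions.
fwd = np.full((4, 4), 100.0)   # prediction from the previous frame via alpha
bwd = np.full((4, 4), 110.0)   # prediction from the following frame via beta
blk = combine_predictions(fwd, bwd)
```

Using both directions hedges against errors in either motion field: where the two predictions agree the average changes nothing, and where they disagree it splits the difference.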
  • The proposed damping Newton algorithm to estimate the accurate weight vector α is summarized in Equations (14) and (15).
  • The forward or backward MSE E(i) is used to judge whether the damping Newton algorithm has converged: if E(i) is smaller than a preset threshold T, the damping Newton algorithm is considered to have converged; otherwise it has not converged and the algorithm moves to the next iteration. The computation of a new weight vector in Equation (13) involves computing the inverse of a matrix and a matrix multiplication. Since the Hessian matrix in Equation (13) is
  • Based on the convergence ratios in Equation (17), we propose a method to adaptively adjust the damping factor.
  • The adjusting of the damping factor μ^i is summarized in Table 2, where the variable a is the effective convergence coefficient and b is the accelerated coefficient. It is noted that a, b and v should be positive and should satisfy a < 1, b < 1 and v > 1. In this invention, the values of a, b and v are set to 0.7, 0.2 and 2, respectively, and μ^0 is set to the identity matrix I.
  • The computation of μ^i is synchronous with the computation of α^(i+1), and consequently E(i) in Equation (14) is the same as that in the table "Damping factor adjusting" below:
  • The initialization of α^0 is performed as follows.
  • The initial interpolations Y_{2k+1}, derived by traditional FRUC methods such as e.g. MCI, OBMC and AOBMC with quarter-pel accuracy motion vectors, are first obtained. Then the corresponding pixels within the motion aligned blocks in frames 2k+1 and 2k+2 are predicted using the method according to the invention:
  • The initial coefficient vector α^0 is then computed according to:
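The closed-form expression for α^0 did not survive extraction; a least-squares sketch of such an initialisation is given below. The observation matrix F (built from the initial FRUC interpolation) and the target vector are illustrative assumptions:

```python
import numpy as np

def init_coefficients(F, x_target):
    """Least-squares initialisation of the coefficient vector: find the
    alpha minimising ||F @ alpha - x_target||^2, where F would be built
    from the initial (MCI/OBMC/AOBMC) interpolation and x_target holds
    the known pixels of the motion-aligned reference block."""
    alpha, *_ = np.linalg.lstsq(F, x_target, rcond=None)
    return alpha

# Toy observation matrix and target (consistent overdetermined system).
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])
a0 = init_coefficients(F, x)
```

A good α^0 matters because the damped Newton iteration is only locally convergent; starting from a traditional FRUC interpolation keeps the first residual small.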
  • A similar iterative process may also be applied to derive a second coefficient vector β in acts 135 and 145. This is done in the reverse way, as described in
  • The predicted pixels within the current to-be-interpolated block may be computed/optimized as:
  • The computational bottleneck of the proposed algorithm is Equation (13), which involves computing the inverse of a matrix and a matrix multiplication.
  • There are many algorithms to speed up these operations.
  • The running time of the proposed damping Newton algorithm can thereby be further reduced.
  • PSNR: peak signal-to-noise ratio
  • Ŷ and Y are the interpolated frame and the actual frame, respectively,
  • W and H are the width and the height of the frame, respectively.
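The PSNR between the interpolated frame Ŷ and the actual frame Y can be computed as follows (assuming 8-bit pixels, i.e. a peak value of 255):

```python
import numpy as np

def psnr(interp, actual, peak=255.0):
    """Peak signal-to-noise ratio, in dB, between the interpolated
    frame and the actual frame: 10*log10(peak^2 / MSE)."""
    mse = np.mean((interp.astype(np.float64) - actual.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")     # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

actual = np.full((8, 8), 128.0)
interp = actual + 1.0           # every pixel off by one grey level -> MSE = 1
val = psnr(interp, actual)      # 10*log10(255^2) ~ 48.13 dB
```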
  • Tempete (CIF) and Mobile (QCIF) sequences were selected to conduct the experiments for showing the convergence property of the method according to the invention.
  • The block sizes for Tempete (CIF) and Mobile (QCIF) are set to 16×16 and 8×8, respectively.
  • The supporting orders are all set to 2 for these two test sequences. Every other frame of the first 100 frames of each test sequence was skipped and then interpolated by the proposed method using different initial weights.
  • Fig. 7 illustrates the MSE of the forward model and the backward model against iteration numbers with different initial weights,
  • Bi-directional ME as in [2] was used to derive motion vectors in these three interpolation methods, and the motion vectors were of quarter-pel accuracy.
  • the motion vector post processing algorithm as in [5] was used to smooth the motion field after the derivation of motion vectors.
  • The threshold T was set to 50 in the experiments.
  • The MSEs of the forward model and the backward model, averaged over the 50 interpolated frames, are plotted against the number of iterations in Fig. 7. It is easy to observe in Fig. 7 that as the iteration number increases, the MSEs of both the forward and backward models in Mobile (QCIF) and Tempete (CIF) decrease, and once the iteration number exceeds 3, the MSEs tend to become constant, i.e. to converge.
  • QCIF: Mobile
  • CIF: Tempete
  • Fig. 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights, (a) Mobile (QCIF). (b) Tempete (CIF).
  • The forward/backward MSEs are computed between the interpolated following/previous frame (X̂_{2k+2} / X̂_{2k}) and the original following/previous frame (X_{2k+2} / X_{2k}), while the PSNR is computed between the interpolated intermediate frame (Ŷ_{2k+1}) and the actual intermediate frame (Y_{2k+1}), which is skipped in the experiment. This supports the validity of the assumption that the weights remain the same in the propagation of the method according to the invention.
  • The method according to the invention ranks first among all the compared methods in terms of PSNR values under different initial weights, for all the test sequences.
  • The average PSNR gains of the method according to the invention are 0.66dB, 0.27dB, 0.49dB and 0.61dB compared to the best results among the 3DRS, MCI, OBMC and AOBMC methods, for the QCIF, CIF, 4CIF and 720P sequences, respectively.
  • the method according to the invention exceeds the PSNR values of the best method, among 3DRS, MCI, OBMC and AOBMC, by 1.48dB and 1.94dB, respectively, when the AOBMC method is utilized to compute the initial interpolation weights.
  • QCIF: Mobile
  • 720P: Spincalendar
  • The method according to the invention relies on the fact that the blocks of pixels in the first and second reference frames are available/known to both the encoder and the decoder, thus allowing the predicted frame to be obtained using data derived from these reference frames.
  • the present method may also be implemented using an interpolating device for computing the predicted frame from a first and second reference frames in a video flow.
  • The encoding, decoding or interpolating devices may typically be electronic devices comprising a processor arranged to load executable instructions stored on a computer readable medium, causing said processor to perform the present method.
  • The interpolating device may also be an encoder/decoder that is part of a system for computing the predicted frame from a first and a second reference frame in a video flow, the system comprising a transmitting device for transmitting the video flow comprising the reference frames to the interpolating device for further computing of the predicted frame.
  • Figs. 11(a) to (c) present the PSNR values of each interpolated frame by MCI, OBMC [8], AOBMC [9] and the method according to the invention for Mobile (QCIF) and Spincalendar (720p). It can easily be observed that no matter which interpolation method is chosen for the computation of the initial weights, the proposed method achieves higher PSNR values than the corresponding interpolation method for each interpolated frame in Mobile (QCIF) and Spincalendar (720p). In particular, for the frames around the 25th frame in Mobile (QCIF) the gain is almost 3dB, and for the frames around the 20th frame the gain is 4dB. This also shows that the proposed method is robust enough to generate frames with higher PSNR values than the traditional interpolation methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Television Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method for computing a predicted frame from a first and a second reference frames, said method comprising for each block of pixels to be predicted in the predicted frame the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame; b1) computing a first coefficient vector allowing the estimation of the second block from the first block; c) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.

Description

IMAGE PREDICTION METHOD AND SYSTEM
Field of the Invention
The present invention relates in general to image processing and more specifically to image prediction.
Background of the Invention
Frame rate up conversion (FRUC) is a prediction or interpolation method where pixels in the interpolated or predicted frame are generated based on observations of the pixels in previous and following frames. An example of an application of FRUC is the reproduction of video sequences captured by a digital camera for entertainment, e.g. slow-motion playback and complex video editing. Another example of an application of FRUC is the enhancement of visual quality in low bit rate video coding, where only some of the frames in the original sequence are encoded and all the remaining frames need to be interpolated/predicted using adjacent decoded frames. Besides, FRUC may also be used, for example, in video surveillance, medical imaging, remote sensing, etc. The simplest and most direct FRUC techniques, such as frame repetition and frame averaging, neglect the motion between successive frames. They achieve good results for stationary regions in successive frames; however, for moving regions in successive images, the resulting interpolated frames will be choppy rather than smooth.
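The two naive techniques just mentioned can be sketched in a few lines; the function names and toy frames below are illustrative inventions, not from the patent:

```python
import numpy as np

def fruc_average(prev_frame, next_frame):
    """Naive FRUC: interpolate the missing frame as the pixel-wise
    average of its two neighbours (motion is ignored)."""
    return (prev_frame.astype(np.float64) + next_frame.astype(np.float64)) / 2.0

def fruc_repeat(prev_frame, next_frame):
    """Even simpler: repeat the previous frame."""
    return prev_frame.copy()

# Toy 4x4 "frames": a bright square moving one pixel to the right.
f0 = np.zeros((4, 4)); f0[1:3, 0:2] = 255.0
f2 = np.zeros((4, 4)); f2[1:3, 2:4] = 255.0

f1_avg = fruc_average(f0, f2)   # ghosting: two half-intensity squares
f1_rep = fruc_repeat(f0, f2)    # judder: the object appears not to move
```

On the moving square, averaging produces two ghost squares at half intensity and repetition freezes the motion, which is exactly the choppiness the motion-compensated methods below are designed to avoid.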
Another more effective and prevalent technique is the Motion Compensation Interpolation (MCI) method, which interpolates the intermediate frame along the motion trajectory between the previous and following frames. Since motion information is used in MCI, the accuracy of motion information plays a significant role in the quality of the interpolated frames. Many pioneering works have been done to improve the accuracy of the motion information between successive frames; they can be divided into two categories: motion estimation (ME) methods and motion vector processing methods. The paper "True-motion estimation with 3-D recursive search block matching" (G. de Haan, P. W. Biezen, H. Huijgen, and O. A. Ojo, IEEE Trans. on Circuits and Systems for Video Technology, vol. 3, No. 5, pp. 368-379, Oct. 1993) proposes a 3D recursive search (3DRS) method to derive the true motions between successive frames. The paper "New frame rate up-conversion using bi-directional motion estimation" (B. T. Choi, S. H. Lee, and S. J. Ko, IEEE Trans. on Consumer Electron., vol. 46, No. 3, pp. 603-609, Aug. 2000) proposes a bi-directional ME to obtain more reliable motions for FRUC. Besides, a hierarchical ME is also proposed in "Hierarchical motion compensated frame rate up-conversion based on the Gaussian/Laplacian pyramid" (G. I. Lee, B. W. Jeon, R. H. Park, and S. H. Lee, Proc. IEEE Int. Conf. Consumer Electronics, 2003, pp. 350-351) to get more faithful motions. To overcome the limitation of the assumption of translational motion with constant velocity, a constant acceleration model is described in "Motion compensated frame interpolation based on H.264 decoder" (Z. Gan, L. Qi, and X. Zhu, Electronics Letters, vol. 43, No. 2, pp. 96-98, Jan. 2007) to further improve the accuracy of motion information between successive frames.
Due to the absence of the actual pixels of the to-be-interpolated frame in FRUC, the ME is performed on the previous and following frames, which may sometimes result in non-consistent motion fields. Consequently, motion vector post processing methods were proposed in "A method for motion adaptive frame rate up-conversion" (R. Castagno, P. Haavisto, and G. Ramponi, IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, No. 5, pp. 436-446, Oct. 1996) and "Motion vector processing for frame rate up conversion" (G. Dane and T. Nguyen, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, ICASSP, 2004, pp. 309-312) to make the motion field much smoother. MCI methods have achieved much better performance than frame averaging and frame repetition. However, the MCI method is usually performed block by block, which may not be consistent with the heterogeneous object shapes of some scenes, and thus blocking artifacts are usually perceived in regions with complex motion.
By extending traditional MCI, Overlapped Block Motion Compensation (OBMC) was also employed in FRUC (B. Girod, "Efficiency analysis of multihypothesis motion-compensated prediction for video coding," IEEE Trans. on Image Processing, vol. 9, No. 2, pp. 173-183, Feb. 2000), due to its ability to reduce the blocking artifacts usually observed in frames interpolated by MCI. The paper "A low complexity motion compensated frame interpolation method" (J. Zhai, K. Yu, J. Li, and S. Li, Proc. ISCAS, May 2005, vol. 5, pp. 4927-4930) proposes an interpolation method positioning overlapped blocks from the previous and following frames, utilizing a weighting window to further suppress the blocking artifacts. Compared with MCI, OBMC is able to generate a much smoother interpolated frame; however, it assigns fixed weights to the different blocks according to their position relative to the center block, and may produce blurring or oversmoothing artifacts in regions of non-consistent motion. To better adjust the weights of OBMC, an Adaptive OBMC (AOBMC) is proposed in "Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation" (B. D. Choi, J. W. Han, C. S. Kim, and S. J. Ko, IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, No. 4, pp. 407-416, Apr. 2007) to tune the weights of different blocks according to the reliability of neighbouring motion vectors. AOBMC is able to adjust the weights of overlapped blocks to some extent; however, it performs poorly in the case of stationary regions or when the neighbouring motion vectors are very similar.
All the aforementioned methods can be seen as finding the blocks in the previous and following frames most similar to the to-be-interpolated block, and then taking the average of the similar blocks as the ultimate interpolation. In the majority of cases, the motion displacement may be fractional-pixel rather than integer-pixel. As a consequence, all these methods first interpolate the previous and following frames, with a fixed interpolation tap filter, to half-pel or quarter-pel accuracy, and then search for the most similar block for each to-be-interpolated block. It is stated in "Adaptive interpolation filters and high-resolution displacements for video coding" (T. Wedi, IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, No. 4, pp. 484-491, Apr. 2006) that an invariant interpolation filter does not account for the non-stationary properties of video signals, e.g. aliasing and displacement estimation errors, in the interpolation process. Consequently, the existing FRUC solutions suffer from the inferiority of the fixed interpolation filter and do not solve the problem of non-linear prediction. Today there is a need for an image prediction solution that can be easily implemented on existing communication infrastructures, overcoming the drawbacks of the prior art.
Summary of Invention
It is an object of the present system to overcome disadvantages and/or make improvement over the prior art.
To that end, the invention proposes a method for computing a predicted frame from a first and a second reference frame, said method comprising, for each block of pixels to be predicted in the predicted frame, the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame; b1) computing a first coefficient vector allowing the transformation of the first block into the second block;
C) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.
The invention also relates to a system according to claim 6.
The invention also relates to a device according to claim 5.
The invention also relates to a computer program according to claim 7.
In the present document a method is proposed to adaptively tune the interpolation coefficients of the interpolation filter for each block, based on observations made in the previous and following frames.

Different from state-of-the-art weight estimating algorithms that perform the estimation and interpolation separately, the method according to the invention is performed iteratively, utilizing the progressively predicted frame. A more accurate coefficient (interpolation) vector may thus be obtained, thereby enhancing the quality of the interpolated frame.

Brief Description of the Drawings
Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:
Figure 1A schematically illustrates a method according to an embodiment of the present invention;
Figure 1B schematically illustrates a method according to an additional embodiment of the present invention;
Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow;
Figure 3 schematically illustrates an example of pixel prediction according to an embodiment of the present invention;
Figure 4 schematically illustrates an example of pixel prediction according to an additional embodiment of the present invention;
Figure 5 schematically illustrates an example of pixel prediction according to another additional embodiment of the present invention;
Figure 6 schematically illustrates an example of pixel prediction according to an embodiment of the present invention;
Figure 7 illustrates the MSE of the forward model and the backward model against iteration number with different initial weights: (a) forward and backward MSE in Mobile (QCIF); (b) forward and backward MSE in Tempete (CIF);
Figure 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights: (a) Mobile (QCIF); (b) Tempete (CIF);
Figure 9 describes the method according to the invention according to one illustrative embodiment; and,
Figure 10 describes an iterative algorithm implementing the present method according to the invention. Description of Preferred Embodiments
The following are descriptions of exemplary embodiments that when taken in conjunction with the drawings will demonstrate the above noted features and advantages, and introduce further ones.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as architecture, interfaces, techniques, devices etc., for illustration. However, it will be apparent to those of ordinary skill in the art that other embodiments that depart from these details would still be understood to be within the scope of the appended claims.
Moreover, for the purpose of clarity, detailed descriptions of well- known devices, systems, and methods are omitted so as not to obscure the description of the present system. Furthermore, routers, servers, nodes, base stations, gateways or other entities in a telecommunication network are not detailed as their implementation is beyond the scope of the present system and method.
In addition, it should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system.
The method according to the invention proposes in particular a model for predicting an image (called the predicted or current image/frame) based on observations made in previous and following images. In the method according to the invention, the prediction is performed in units of blocks of pixels and may be performed for each block of the predicted image. By extension, an image may be assimilated to a block (of pixels).
The method according to the invention is suitable for predicting frames in a sequence or stream of frames and allows in particular predicting a frame between a first and a second reference frames.
Figure 2 describes the motion trajectory of a pixel or a set of pixels from one frame to another in a sequence of frames of a video flow. Assuming a first set or block of pixels 200 in a frame 2k, then, the corresponding block, in the following frame 2k+1 , along the motion trajectory (defined by its associated motion vector) is the block 210. Similarly, the corresponding block, in the following frame 2k+2 of the frame 2k+1 , along the same motion trajectory (defined by its associated motion vector) is the block 220.
Figure 1A describes the method according to the invention wherein a first reference frame 100 and a second reference frame 110 are used to define, in a first act 120, a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame.
In reference to Figure 2, the first block defined here above corresponds to block 200, the block of pixels to be predicted corresponds to block 210 and the second block corresponds to block 220.
In act 130, a first coefficient vector allowing the transformation of the first block into the second block is computed. This transformation corresponds to the approximation of pixels in the second block from pixels in the first block along the motion vector of the trajectory of said pixels. Knowing the actual (i.e. real or existing) pixels in the second block, known methods such as, e.g. Mean Square Estimation (MSE), allow deriving the optimum coefficient vector, i.e. the first coefficient vector, that give the best approximation.
The assumption is made that the first coefficient vector derived in act 130 may also be used to approximate pixels in the block of pixels to be predicted from pixels in the first block, in an act 140, in order to obtain the predicted frame 150. This assumption is based on the fact that there are high redundancies between consecutive or adjacent frames in a stream of frames (in particular for a video stream).
Figure 3 schematically describes the prediction of a pixel 311 in a predicted frame 310 from pixels in a reference frame 320, along the motion vector 330 linking the pixel to be predicted 311 and its corresponding pixel 321 in the reference frame 320.
As shown in Figure 3, for each pixel 311 in the predicted frame 310, the corresponding pixel 321 in the reference frame 320 is derived along the motion trajectory (shown in Figure 3 through motion vector 330). A square spatial neighborhood 325, centered on the corresponding pixel 321 in the reference frame 320 is defined. The pixel 311 in the predicted frame is thus approximated as a linear combination of the pixels 322 of the corresponding spatial neighborhood 325 in the reference frame 320. This interpolation process may be expressed as
$$\hat{Y}_t(m,n) = \sum_{-r \le (i,j) \le r} X_{t-1}(m+i,\, n+j)\,\alpha_{i,j} + n_t(m,n) \qquad (1A)$$
where:
- Ŷ_t(m,n) represents the predicted pixel 311 located at coordinates (m,n) in the predicted frame 310,
- X_{t-1} represents the pixels in the reference frame 320,
- (m+i, n+j) spans the positions around the corresponding pixel 321 in the reference frame 320, pointed to by the motion vector 330 of the predicted pixel 311 located at (m,n) in the predicted frame 310,
- α_{i,j} are the components of the coefficient vector,
- n_t(m,n) is the additive Gaussian white noise,
- r is the radius of the filter defined by the square spatial neighborhood 325. It corresponds to a (2r+1)×(2r+1) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310. For instance in Figure 3, the radius is r=1 and the size of the interpolation filter is 3×3.

In the description here under, Y_{2k+1} is a block, with the size of W×W (with W an integer greater than 1), of pixels to be predicted in the predicted frame 2k+1, X_{2k} is the first block, i.e. the aligned or corresponding block in the previous frame 2k along the motion trajectory, and X_{2k+2} is the second block, i.e. the aligned or corresponding block in the following frame 2k+2 along the motion trajectory.
With this notation, the scheme presented in reference to Figure 3 may be described as follows here under.
If X_{2k}(m,n) represents the pixel at (m,n) in frame 2k within the block X_{2k}, then the predicted pixel within block Y_{2k+1} may be approximated as:

$$\hat{Y}_{2k+1}(m,n) = \sum_{-L \le (i,j) \le L} X_{2k}(i+m+v_x,\, j+n+v_y)\,\alpha_{i,j} + n_{2k+1}(m,n) \qquad (1B)$$
where:
- L is the radius of the filter defined by the square spatial neighborhood 325; it corresponds to a (2L+1)×(2L+1) area around a pixel 321 of the reference frame wherein the pixels are weighted with the coefficients of the coefficient vector in a linear combination in order to approximate the corresponding pixel 311 in the predicted frame 310 (for instance in Figure 3, the radius is L=1 and the size of the interpolation filter is 3×3),
- (v_x, v_y) represents the motion vector of the block Y_{2k+1} between the first reference frame 2k and the predicted frame 2k+1,
- α_{i,j} are the components of the coefficient vector from X_{2k} to Y_{2k+1},
- n_{2k+1}(m,n) is the additive Gaussian white noise.
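A minimal NumPy sketch of the interpolation of Equation (1B) — the noise term is omitted and the filter coefficients are assumed given; integer-pel motion and in-bounds indices are simplifying assumptions:

```python
import numpy as np

def predict_pixel(ref, m, n, mv, coeffs, L=1):
    """Eq. (1B) sketch: predict pixel (m, n) of frame 2k+1 as a linear
    combination of the (2L+1)x(2L+1) neighbourhood around its motion-
    compensated position in reference frame 2k; noise term omitted."""
    vx, vy = mv
    cm, cn = m + vx, n + vy                   # corresponding pixel in frame 2k
    patch = ref[cm - L:cm + L + 1, cn - L:cn + L + 1]
    return float(np.sum(patch * coeffs))

ref = np.arange(36, dtype=float).reshape(6, 6)   # toy reference frame
avg = np.full((3, 3), 1.0 / 9.0)                 # e.g. a uniform 3x3 filter
print(round(predict_pixel(ref, 2, 2, (1, 0), avg), 6))  # mean of the patch around (3, 2): 20.0
```

The method's point is that the coefficients `coeffs` are not fixed, as they would be in a conventional interpolation filter, but estimated per block.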
Rearranging the interpolated pixels in block Y_{2k+1} as a W×W column vector Y_{2k+1} = (Y_{2k+1}(0,0), Y_{2k+1}(0,1), ..., Y_{2k+1}(W−1,W−1))', representing the concatenated and lexicographically ordered intensity values, Equation (1B) can be rewritten as:

$$Y_{2k+1} = f(X_{2k})\,\alpha + n_{2k+1} \qquad (2)$$
where:
- X_{2k} = (X_{2k}(0,0), X_{2k}(0,1), ..., X_{2k}(W−1,W−1))' represents the concatenated and lexicographically ordered intensity values in the corresponding block,
- α = (α_{−L,−L}, α_{−L,−L+1}, ..., α_{L,L})' is the coefficient vector,
- n_{2k+1} is the additive Gaussian white noise vector,
- f(·) is a function transferring X_{2k} to a (W·W)×((2L+1)·(2L+1)) matrix.

Here the ith row of matrix f(X_{2k}) consists of the (2L+1)×(2L+1) square neighbourhood of the ith pixel within X_{2k}, i = 0, 1, ..., W×W−1. The coefficient vector α should be chosen to be the optimum.
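The neighbourhood matrix f(X_{2k}) can be sketched as an im2col-style operation; boundary handling is an assumption here (replicate padding), since the source does not specify how edge pixels are treated:

```python
import numpy as np

def f_matrix(block, L=1):
    """Build the (W*W) x ((2L+1)*(2L+1)) matrix whose i-th row is the
    flattened (2L+1)x(2L+1) neighbourhood of the i-th pixel. Edge pixels
    use replicate padding -- an assumption, not specified in the source."""
    W = block.shape[0]
    padded = np.pad(block, L, mode="edge")
    rows = []
    for m in range(W):
        for n in range(W):
            patch = padded[m:m + 2 * L + 1, n:n + 2 * L + 1]
            rows.append(patch.ravel())
    return np.array(rows)

X = np.arange(16, dtype=float).reshape(4, 4)
F = f_matrix(X, L=1)
print(F.shape)  # (16, 9): one flattened 3x3 neighbourhood per pixel
```

With this construction, the matrix-vector product `F @ alpha` applies the filter α to every pixel of the block at once, which is exactly the right-hand side of Equation (2) without the noise term.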
A known method to measure its performance is the Mean Square Error (MSE), as follows:

$$\varepsilon^2(\alpha) = E\,\big\|Y_{2k+1} - \hat{Y}_{2k+1}\big\|^2 = E\Big\{ \big(Y_{2k+1} - \hat{Y}_{2k+1}\big)' \big(Y_{2k+1} - \hat{Y}_{2k+1}\big) \Big\} \qquad (3)$$

where ‖·‖ denotes the L2 norm.
The optimum coefficient vector can be computed by minimizing the MSE in Equation (3). However, since the actual/real pixels in Y_{2k+1} are not available (as it is a block of the frame to be predicted wherein, by definition, pixels have not been predicted yet), Equation (3) cannot be directly used to compute the optimum weights.
Thus, in the method according to the invention, the first coefficient vector used to compute pixels in frame 2k+1 is the optimum coefficient vector derived from the non-linear transformation used to approximate pixels in frame 2k+2 from pixels in frame 2k.
Figure 4 describes applying the scheme detailed here above twice in a row. The assumption that the coefficient vector used to approximate (or estimate) the corresponding actual pixels within frame 2k+2 remains the same as the one used to interpolate the pixels within frame 2k+1 is accurate, since there is a high redundancy between pixels along the motion trajectory from frame 2k to 2k+2 (as they are consecutive frames); it is therefore reasonable to assume that the sample covariance does not change in those motion-aligned blocks.
As shown on Figure 4, the actual pixels in the motion aligned block within frame 2k + 2 can be estimated as:
$$X_{2k+2} = f(Y_{2k+1})\,\alpha + n_{2k+2} \qquad (4)$$
where X_{2k+2} is a W×W column vector representing the concatenated and lexicographically ordered intensity values in the motion-aligned block within frame 2k+2, f(Y_{2k+1}) is a matrix whose elements are computed according to Equation (2), and α is the same as in Equation (2). Incorporating Equation (2) into Equation (4), the interpolation (i.e. approximation or estimation) of X_{2k+2} can be obtained using the corresponding pixels within the aligned/corresponding block X_{2k} as follows:

$$X_{2k+2} = f\big(f(X_{2k})\,\alpha + n_{2k+1}\big)\,\alpha + n_{2k+2} = g(X_{2k})\,h(\alpha) + n_{2k+2} \qquad (5)$$

where g(X_{2k}) is a function which transfers the column vector X_{2k} to a (W·W)×((4L+1)·(4L+1)) matrix, and h(α) represents the coefficient vector corresponding to the enlarged square neighbourhood with size of (4L+1)×(4L+1), shown in frame 2k in Figure 5.
In Figure 5, each pixel in the aligned block in the following frame 2k+2 may be estimated as a weighted summation of pixels in an enlarged square neighbourhood with size of (4L+1)×(4L+1) in the previous frame 2k along its motion trajectory. It should be noted that the length of the coefficient vector h(α) corresponding to the enlarged square is (4L+1)×(4L+1), and each element of h(α) is quadratic in the elements of α. In other words, in a two-dimensional way, each element of h(α) may be expressed as:

$$h_{m,n}(\alpha) = \sum_{i+p=m,\; j+q=n} \alpha_{i,j}\,\alpha_{p,q} \qquad (6)$$

with −L ≤ i,j ≤ L, −L ≤ p,q ≤ L and −2L ≤ m,n ≤ 2L. Here the ith row of matrix g(X_{2k}) consists of the (4L+1)×(4L+1) square neighbourhood of the ith pixel within X_{2k}, i = 0, 1, ..., W×W−1.
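Because applying the (2L+1)×(2L+1) filter α twice equals one application of the enlarged (4L+1)×(4L+1) filter, h(α) in Equation (6) is the 2-D self-convolution of α — a minimal sketch:

```python
import numpy as np

def h_of_alpha(alpha):
    """Eq. (6): h(alpha) is the 2-D self-convolution of the filter alpha,
    so cascading the (2L+1)x(2L+1) filter twice equals one application
    of the enlarged (4L+1)x(4L+1) filter h(alpha)."""
    k = alpha.shape[0]             # 2L+1
    size = 2 * k - 1               # 4L+1
    h = np.zeros((size, size))
    for i in range(k):
        for j in range(k):
            h[i:i + k, j:j + k] += alpha[i, j] * alpha
    return h

alpha = np.full((3, 3), 1.0 / 9.0)    # uniform 3x3 averaging filter
h = h_of_alpha(alpha)
print(h.shape, round(h.sum(), 6))     # (5, 5) 1.0 -- coefficients still sum to 1
```

Each entry of `h` is a sum of products of two entries of `alpha`, which is why the transformation from α to the prediction of frame 2k+2 is non-linear.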
From Equation (5), the coefficient vector α may be computed by minimizing the MSE as follows:

$$\varepsilon^2(\alpha) = \big\| X_{2k+2} - g(X_{2k})\,h(\alpha) \big\|^2 = r(\alpha)'\,r(\alpha) \qquad (7)$$

where r(α) = X_{2k+2} − g(X_{2k})·h(α) is defined as the residual vector.
The principle behind Equation (7) is that the original frame should satisfy ∂ε²(α)/∂α = 0, as seen in reference with Figure 3. However, as mentioned in the previous subsection, r(α) is not linear with respect to α, as it was in the scheme described in reference to Figure 3. Thus, α cannot be directly computed by setting ∂ε²(α)/∂α = 0 as previously emphasized.
In an illustrative embodiment of the method according to the invention, α̂ is assumed to be an approximation of α; r(α) may then be expanded around α̂ in a Taylor series as:

$$r(\alpha) \approx r(\hat{\alpha}) + J(\hat{\alpha})\,(\alpha - \hat{\alpha}) \qquad (8)$$
where J(α̂) is the Jacobian matrix of r(α) at α̂. The Jacobian matrix can be computed as

$$J(\alpha) = \frac{\partial r(\alpha)}{\partial \alpha} = -\,g(X_{2k})\,\frac{\partial h(\alpha)}{\partial \alpha} \qquad (9)$$

where, since each element of h(α) is quadratic in the elements of α (Equation (6)),

$$\frac{\partial h_{m,n}(\alpha)}{\partial \alpha_{p,q}} = 2\,\alpha_{m-p,\,n-q} \qquad (10)$$

for −L ≤ p,q ≤ L and −2L ≤ m,n ≤ 2L.
Letting r(α) = 0, α may be computed as:

$$\alpha = \hat{\alpha} - \big(J'(\hat{\alpha})\,J(\hat{\alpha})\big)^{-1} J'(\hat{\alpha})\,r(\hat{\alpha}) \qquad (11)$$

Consequently, according to the method according to the invention, if the coefficient vector α^i is derived after the ith iteration, then the coefficient vector in the (i+1)th iteration may be computed as:

$$\alpha^{i+1} = \alpha^{i} - \big(J'(\alpha^{i})\,J(\alpha^{i})\big)^{-1} J'(\alpha^{i})\,r(\alpha^{i}) \qquad (12)$$
Since this scheme imposes a strong demand on the initial value of α, in an optional embodiment of the method according to the invention, Equation (12) may be modified by adding a damping factor as:

$$\alpha^{i+1} = \alpha^{i} - \omega^{i}\,\big(J'(\alpha^{i})\,J(\alpha^{i})\big)^{-1} J'(\alpha^{i})\,r(\alpha^{i}) \qquad (13)$$

where ω^i = diag(ω^i_0, ω^i_1, ..., ω^i_{(2L+1)·(2L+1)−1}) is the damping factor.
By adding the damping factor, the convergence of the method according to the invention in this illustrative embodiment is ensured and the convergence speed is accelerated.
Figure 10 describes the iterative algorithm implementing the method according to the invention.
In an act 1000, α° is initialized (as explained further).
In act 1005, J(α⁰) and r(α⁰) are initialized.
The iteration is then started and is stopped when a pre-defined number of iterations has been reached or when the convergence of the iteration has been reached (act 1010).
Δα' is computed in an act 1015 and α'+1 is updated in act 1020.
r(α^{i+1}) and J(α^{i+1}) are updated in act 1025, and the iteration returns to act 1010, where the stopping conditions are evaluated, until one of them is fulfilled.

Figure 9 describes the method according to the invention according to one illustrative embodiment.
The predicted frame is first divided into non-overlapped blocks in act 900. If the current block is not the last block (act 905), forward (from frame 2k to 2k+2) and backward (from frame 2k+2 to 2k) motion vectors are derived.
Coefficient vectors α (forward) and β (backward) are derived using the method according to the invention in acts 915 and 920 respectively.
Pixels are predicted using α and β (respectively acts 925 and 930) and combined in act 935 to derive the predicted frame.
In the following description, an illustrative implementation of the method according to the invention, called hereafter the damping Newton algorithm, is detailed.
initialize Ŷ⁰_{2k+1} and α⁰;
for i = 0 to iMax−1
    Compute Ŷ^i_{2k+1} according to Eq. (2);
    Compute the forward MSE: E(i) = ‖X̂^i_{2k+2} − X_{2k+2}‖²  (14);
    if E(i) < T
        break;
    end if
    Compute J(α^i) according to Eq. (9) and Eq. (10);
    Compute r(α^i): r(α^i) = X_{2k+2} − g(X_{2k})·h(α^i)  (15);
    Compute ω^i according to Table 2;
    Compute α^{i+1} according to Eq. (13);
end for
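A compact numerical sketch of this damping Newton loop (Gauss-Newton with damping), applied here to a generic least-squares residual rather than the patent's g(X_{2k})·h(α) model; the toy exponential model, scalar damping factor, threshold and iteration count are all illustrative assumptions:

```python
import numpy as np

def damping_newton(r, J, alpha0, omega=1.0, i_max=10, T=1e-12):
    """Damped (Gauss-)Newton update of Eq. (13):
    alpha <- alpha - omega * (J'J)^-1 J' r(alpha).
    omega is a scalar damping factor here for simplicity; the patent
    uses a per-weight diagonal matrix adjusted as in Table 2."""
    a = np.asarray(alpha0, dtype=float)
    for _ in range(i_max):
        res = r(a)
        if res @ res < T:                # Eq. (14)-style MSE threshold test
            break
        Ja = J(a)
        step = np.linalg.solve(Ja.T @ Ja, Ja.T @ res)
        a = a - omega * step
    return a

# toy nonlinear least squares in place of the patent's g/h model:
# fit y = exp(a * x) for the true exponent a = 0.5
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.5 * x)
r = lambda a: y - np.exp(a[0] * x)                     # residual vector r(a)
J = lambda a: (-x * np.exp(a[0] * x)).reshape(-1, 1)   # Jacobian of r(a)
a = damping_newton(r, J, [0.0])
print(round(float(a[0]), 6))  # converges to the true exponent: 0.5
```

The same loop structure applies to the patent's problem once `r` and `J` are replaced by Equations (15) and (9)-(10).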
The proposed damping Newton algorithm to estimate the accurate weight vector α is summarized above, relying on Equations (14) and (15).
As shown in the summary of the iterative algorithm, the forward or backward MSE E(i) is used to judge whether the damping Newton algorithm has converged. In other words, if E(i) is smaller than a preset threshold T, it is considered that the damping Newton algorithm has converged; otherwise, it has not converged and the algorithm moves to the next iteration. It is clear that the computation of a new weight vector in Equation (13) involves computing the inverse of a matrix and a matrix multiplication. Since the Hessian matrix in Equation (13) is A = J'(α^i)·J(α^i), it is obvious that A is positive definite, and thus α^{i+1} can be obtained by computing the inverse of A in Eq. (13) in the proposed method.
If we define the change of the weight vector α^{i+1} compared to α^i as

$$\varphi(\alpha^{i+1}) = \alpha^{i+1} - \alpha^{i} \qquad (16)$$

then the convergence ratio of each weight is defined as

$$\lambda^{i}_{j} = \left| \frac{\varphi_{j}(\alpha^{i})}{\varphi_{j}(\alpha^{i-1})} \right| \qquad (17)$$
Based on the convergence ratios in Equation (17), a method is proposed to adaptively adjust the damping factor. The adjusting of the damping factor ω^i is summarized in Table 2, where the variable a is the effective convergence coefficient and b is the accelerated coefficient. It is noted that a, b and v should be positive and should satisfy a ≤ 1, b < 1 and v > 1. In this invention, the values of a, b and v are set to be 0.7, 0.2 and 2, respectively, and ω⁰ is set to be the identity matrix I. The computation of ω^i is synchronous with the computation of α^{i+1}, and consequently E(i) in Equation (14) is the same as that in the table "Damping factor adjusting" below:
Damping factor adjusting:

initialize ω⁰;
for i = 1 to iMax−1
    Compute φ(α^i) according to Eq. (16);
    for j = 1 to (2L+1)·(2L+1)−1
        Compute λ^i_j according to Eq. (17);
        if λ^i_j ≥ a
            ω^i_j = ω^{i−1}_j · b;
        end if
        if λ^i_j < b
            ω^i_j = ω^{i−1}_j · v;
        end if
    end for
    if E(i) < T
        break;
    end if
end for
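The per-weight adjustment can be sketched as follows — note that the exact branch conditions of Table 2 are not fully legible in the source, so the rule below (damp a weight by b when its convergence ratio stays high, accelerate it by v when the ratio drops below b) is a plausible reading, not a verbatim reproduction:

```python
import numpy as np

A_EFF, B_ACC, V = 0.7, 0.2, 2.0   # the a, b, v values set in the text

def adjust_damping(omega, alpha_cur, alpha_prev, alpha_prev2):
    """Assumed per-weight damping rule (see note above): a weight whose
    convergence ratio stays high is damped by b, one whose ratio drops
    below b is accelerated by v; ratios in between leave omega unchanged."""
    phi_cur = alpha_cur - alpha_prev                 # Eq. (16)
    phi_prev = alpha_prev - alpha_prev2
    lam = np.abs(phi_cur) / np.maximum(np.abs(phi_prev), 1e-12)  # Eq. (17)
    omega = omega.copy()
    omega[lam >= A_EFF] *= B_ACC   # poorly converging weight: damp harder
    omega[lam < B_ACC] *= V        # fast-converging weight: accelerate
    return omega

w = adjust_damping(np.ones(4),
                   np.array([1.0, 1.0, 1.0, 1.0]),
                   np.array([0.9, 0.5, 0.9, 0.5]),
                   np.array([0.0, 0.0, 0.8, 0.45]))
print(w)  # first weight accelerated, the other three damped
```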
Since the convergence of the proposed method depends on the selection of the initial coefficient vector α⁰, setting a proper α⁰ is very important. In this example of implementation, the initialization of α⁰ is performed as follows. The initial interpolations Ŷ⁰_{2k+1}, derived by the traditional FRUC methods, such as e.g. MCI, OBMC and AOBMC with quarter-pel accuracy motion vectors, are first obtained. Then the corresponding pixels within the motion-aligned blocks in frames 2k+1 and 2k+2 are predicted using the method according to the invention:

$$\hat{Y}_{2k+1} = f(X_{2k})\,\alpha, \qquad \hat{X}_{2k+2} = f(\hat{Y}_{2k+1})\,\alpha \qquad (18,\ 19)$$

The initial coefficient vector α⁰ is then computed as the least-squares fit to the initial interpolation:

$$\alpha^{0} = \big(f'(X_{2k})\,f(X_{2k})\big)^{-1} f'(X_{2k})\,\hat{Y}^{0}_{2k+1} \qquad (20)$$
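Such a least-squares initialization can be sketched as below; since the exact regression target of the source equation is not fully legible, fitting α⁰ so that the neighbourhood matrix times α⁰ reproduces a given target vector is an assumption here:

```python
import numpy as np

def init_alpha(F, y0):
    """Least-squares initial weights: alpha0 = argmin ||y0 - F alpha||^2,
    i.e. alpha0 = (F'F)^-1 F' y0 -- the normal-equation solution of a
    linear fit (the regression target y0 is an assumption here: the
    initial FRUC interpolation of the block)."""
    alpha0, *_ = np.linalg.lstsq(F, y0, rcond=None)
    return alpha0

# toy example: rows of F play the role of flattened 3x3 neighbourhoods;
# if y0 was produced by a known filter, the fit recovers that filter
rng = np.random.default_rng(0)
F = rng.standard_normal((64, 9))
true_alpha = np.full(9, 1.0 / 9.0)
y0 = F @ true_alpha
alpha0 = init_alpha(F, y0)
print(np.allclose(alpha0, true_alpha))  # True
```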
In an additional embodiment of the method according to the invention, in reference to Figure 1B, after deriving the first coefficient vector α, a similar iterative process may also be applied to derive a second coefficient vector β in acts 135 and 145. This is done in the reverse direction, as described in Figure 6 (in parallel with Figure 4), starting from X_{2k+2} instead of X_{2k}, i.e. applying the method backward.
Finally, in order to make the prediction more accurate, the predicted pixels within the current to-be-interpolated block may be computed/optimized as:

$$\hat{Y}_{2k+1} = \big( f(X_{2k})\,\alpha + f(X_{2k+2})\,\beta \big) / 2 \qquad (21)$$
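This bidirectional combination averages the forward prediction from X_{2k} (weighted by α) and the backward prediction from X_{2k+2} (weighted by β) — a minimal sketch, treating f(·) as precomputed neighbourhood matrices (the toy shapes and values are assumptions):

```python
import numpy as np

def combine(f_prev, alpha, f_next, beta):
    """Eq. (21): average the forward prediction f(X_2k) @ alpha and the
    backward prediction f(X_2k+2) @ beta of the to-be-interpolated block."""
    return 0.5 * (f_prev @ alpha + f_next @ beta)

f_prev = np.full((4, 9), 2.0)        # toy neighbourhood matrices
f_next = np.full((4, 9), 4.0)
alpha = beta = np.full(9, 1.0 / 9.0)
print(np.round(combine(f_prev, alpha, f_next, beta), 6))  # each pixel: (2 + 4) / 2 = 3
```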
The computation bottleneck of the proposed algorithm is in Equation (13), which involves computing the inverse of a matrix and a matrix multiplication. However, there are many algorithms to speed up these operations. In addition, with fast algorithms and more powerful computing resources, the running time of the proposed damping Newton algorithm can be further reduced.
Various experiments are conducted to show the performance of the method according to the invention and to compare it with other methods. We first study the convergence property of the method according to the invention. Subsequently, the interpolation performance comparisons are presented. To evaluate the performance of the method according to the invention, the following well-known metric, the peak signal-to-noise ratio (PSNR), is used in this work. The PSNR is defined as follows:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2 \cdot W_Y \cdot H_Y}{\sum_{m,n} \big( Y(m,n) - \hat{Y}(m,n) \big)^2} \qquad (22)$$

where Ŷ and Y are the interpolated frame and the actual frame, respectively, and W_Y and H_Y are the width and the height of the frame, respectively.
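A sketch of Equation (22) for 8-bit frames (peak value 255), using the mean squared error, which equals the sum in the denominator divided by W_Y·H_Y:

```python
import numpy as np

def psnr(interp, actual):
    """Peak signal-to-noise ratio (Eq. (22)) between the interpolated
    frame and the actual frame, assuming 8-bit pixels (peak = 255)."""
    mse = np.mean((interp.astype(float) - actual.astype(float)) ** 2)
    if mse == 0:
        return float("inf")        # identical frames
    return 10.0 * np.log10(255.0 ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 16.0)          # constant error of 16 -> MSE = 256
print(round(psnr(b, a), 2))        # 10*log10(255^2/256) = 24.05
```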
Convergence Study
In this subsection, various experiments were conducted to show the effect of initial weights on the convergence of the method according to the invention. The Tempete (CIF) and Mobile (QCIF) sequences were selected for these experiments. The block sizes for Tempete (CIF) and Mobile (QCIF) are set to 16×16 and 8×8, respectively, and the supporting order is set to 2 for both test sequences. Every other frame of the first 100 frames of each test sequence was skipped and interpolated by the proposed method using different initial weights.
Fig. 7 illustrates the MSE of the forward model and the backward model against iteration number with different initial weights: (a) forward and backward MSE in Mobile (QCIF); (b) forward and backward MSE in Tempete (CIF). Three interpolation results, MCI, OBMC [8] and AOBMC [9], were selected for the computation of initial weights. Bi-directional ME as in [2] was used to derive motion vectors in these three interpolation methods, and the motion vectors were of quarter-pel accuracy. The motion vector post-processing algorithm of [5] was used to smooth the motion field after the derivation of motion vectors. The threshold T was set to 50 in the experiments. The MSE of the forward model and the backward model, averaged over the 50 interpolated frames, is plotted against the number of iterations in Fig. 7. It is easy to observe in Fig. 7 that as the iteration number increases, the MSEs of both the forward and backward models in Mobile (QCIF) and Tempete (CIF) decrease, and when the iteration number is larger than 3, the MSEs tend to be constant, i.e. converged.
Fig. 8 illustrates the average PSNRs of the 50 interpolated frames with different initial weights, (a) Mobile (QCIF). (b) Tempete (CIF).
The average PSNR values of the 50 interpolated frames in Mobile (QCIF) and Tempete (CIF), when compared with the original frames, are plotted in Fig. 8. It is observed in Fig. 8 that as the iteration number increases, the PSNRs increase and tend to converge when the iteration number is larger than 3 in these two test sequences. Another observation is that Fig. 8 is consistent with Fig. 7, where smaller forward and backward MSEs correspond to higher PSNR values of the interpolated frames. It is noted that the forward/backward MSEs are computed between the interpolated following/previous frame (X̂_{2k+2}/X̂_{2k}) and the original following/previous frame (X_{2k+2}/X_{2k}), while the PSNR is computed between the interpolated intermediate frame (Ŷ_{2k+1}) and the actual intermediate frame (Y_{2k+1}), which is skipped in the experiment. Consequently, this proves that the assumption that the weights remain the same in the propagation of the method according to the invention is valid. In Fig. 7 and Fig. 8, we also find that the convergence of the proposed method is independent of the initial weights: no matter what the initial weights are, the MSEs of both the forward and backward models, as well as the PSNRs of the interpolated frames, tend to converge as the iteration number increases.
The experimental results in Figs. 7-8 provide the empirical evidence on the convergence of the proposed method. Based on these observations, the maximum iteration number is set to be 3 for all the experiments in the following subsection.
Interpolation Comparison
In this subsection, various simulations are performed on five QCIF sequences, five CIF sequences, five 4CIF sequences and five 720P sequences to show the performance of the proposed method. The experimental results are depicted in Table 3, where the comparison methods include 3DRS [1], MCI, OBMC [8] and AOBMC [9]. Here, MCI is the traditional FRUC method, which interpolates the to-be-interpolated block as the average of the two corresponding blocks in the previous and following frames. In Table 3, (W, L) represents the block size and the supporting order of the method according to the invention for each test sequence, and the numbers in brackets in the three rightmost columns represent the gains over the best results among 3DRS, MCI, OBMC and AOBMC. It is easy to observe that the method according to the invention ranks first among all the comparison methods in terms of PSNR values under different initial weights for all the test sequences. The average gains in PSNR values by the method according to the invention are 0.66dB, 0.27dB, 0.49dB and 0.61dB, compared to the best results among the 3DRS, MCI, OBMC and AOBMC methods, for the QCIF, CIF, 4CIF and 720P sequences, respectively. In particular, for sequences with high frequency components, such as Mobile (QCIF) and Spincalendar (720P), the method according to the invention exceeds the PSNR values of the best method among 3DRS, MCI, OBMC and AOBMC by 1.48dB and 1.94dB, respectively, when the AOBMC method is utilized to compute the initial interpolation weights.
Table 3 The PSNRs of interpolated frames by different methods
Other results are presented in Appendix A here after.
From a practical point of view, regarding any implementation in electronic devices, in the case of, for example, an encoder/decoder system, the method according to the invention is based on the fact that the blocks of pixels in the first and second reference frames are available/known to both the encoder and the decoder, thus allowing the predicted frame to be obtained using data derived from these reference frames. The present method may also be implemented using an interpolating device for computing the predicted frame from a first and second reference frames in a video flow. The encoding, decoding or interpolating devices may typically be electronic devices comprising a processor arranged to load executable instructions stored on a computer readable medium, causing said processor to perform the present method. The interpolating device may also be an encoder/decoder part of a system for computing the predicted frame from a first and second reference frames in a video flow, the system comprising a transmitting device for transmitting the video flow comprising the reference frames to the interpolating device for further computing of the predicted frame.

Appendix A
Figs. 11(a) to (c) present the PSNR values of each interpolated frame by MCI, OBMC [8], AOBMC [9] and the method according to the invention for Mobile (QCIF) and Spincalendar (720p). It can easily be observed that no matter which interpolation method is chosen for the computation of initial weights, the proposed method achieves higher PSNR values than the corresponding interpolation method for each interpolated frame in Mobile (QCIF) and Spincalendar (720p). Especially for the frames around the 25th frame in Mobile (QCIF), the gain is almost 3dB, and for the frames around the 20th frame, the gain is 4dB. This also reveals that the proposed method is robust enough to generate frames with higher PSNR values than the traditional interpolation methods.
Figure 11(a)

Figure 11(b)

Figure 11(c)

Claims

1. A method for computing a predicted frame from a first and a second reference frames, said method comprising for each block of pixels to be predicted in the predicted frame the acts of: a) defining a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) computing a first coefficient vector allowing the estimation of the second block from the first block c) computing pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.
2. A method according to claim 1 , further comprising an act b2) of computing a second coefficient vector allowing the transformation of the second block into the first block, the act c) further using said second coefficient vector and pixels in the second block.
3. A method according to any of the claims 1 and 2, wherein the predicted frame is sequentially positioned between the first and the second reference frames.
4. A method according to any of the claims 1 to 3, wherein, in act b1), the transformation of the first block into the second block is a non-linear transformation.
5. An interpolating device for computing a predicted frame from a first and a second reference frames of a video flow, said device being arranged to select said first and second frames from the video flow, said device being further arranged for each block of pixels to be predicted in the predicted frame to: a) define a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) compute a first coefficient vector allowing the estimation of the second block from the first block c) compute pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.
6. A system for computing a predicted frame from a first and a second reference frames of a video flow, said system comprising:
- a transmitting device for transmitting the video flow,
- an interpolating device arranged to:
- receive the video flow from the transmitting device,
- select said first and second frames from the video flow, said device being further arranged for each block of pixels in the predicted frame to: a) define a first and a second block of pixels corresponding, respectively in said first and second reference frames, to the block of pixels to be predicted along the motion vector of said block of pixels to be predicted respectively from the first to the second reference frame b1) compute a first coefficient vector allowing the estimation of the second block from the first block c) compute pixels of the block of pixels to be predicted using said first coefficient vector and pixels in the first block.
7. A computer program comprising computer-executable instructions stored on a computer-readable medium which, when loaded onto a data processor, cause the data processor to perform a method for computing a predicted frame from first and second reference frames according to any one of claims 1 to 4.
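The prediction scheme of claims 5 and 6 can be illustrated with a short sketch. This is not the patented implementation: the claims do not fix how the coefficient vector is obtained or applied, so the sketch assumes one plausible reading in which each pixel of the second block is fitted, by least squares, as a weighted sum of the co-located neighbourhood in the first block (step b1), and the resulting weights are then applied to the first block's pixels to synthesise the predicted block (step c). The function name `predict_block` and the 3x3 neighbourhood size are illustrative choices, not taken from the patent.

```python
import numpy as np

def predict_block(block1, block2, patch=3):
    """Hypothetical sketch of claim 5, steps b1) and c):
    fit a coefficient vector mapping block1 -> block2, then
    apply it to block1 to compute the predicted block."""
    r = patch // 2
    # Pad block1 so every pixel has a full patch x patch neighbourhood.
    padded = np.pad(block1.astype(float), r, mode="edge")
    h, w = block1.shape
    # Each row of A is the flattened neighbourhood of one pixel of block1.
    A = np.empty((h * w, patch * patch))
    idx = 0
    for i in range(h):
        for j in range(w):
            A[idx] = padded[i:i + patch, j:j + patch].ravel()
            idx += 1
    # b1) coefficient vector: least-squares fit estimating block2 from block1.
    coeff, *_ = np.linalg.lstsq(A, block2.astype(float).ravel(), rcond=None)
    # c) predicted pixels from the coefficient vector and block1's pixels.
    return (A @ coeff).reshape(h, w)
```

In a full interpolator the two blocks would be taken from the reference frames along the motion vector of the block to be predicted (step a of the claim); here they are simply passed in as arrays.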
PCT/IB2009/055226 2008-10-31 2009-10-20 Image prediction method and system WO2010049917A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2008/072904 2008-10-31
CN2008072904 2008-10-31

Publications (2)

Publication Number Publication Date
WO2010049917A2 true WO2010049917A2 (en) 2010-05-06
WO2010049917A3 WO2010049917A3 (en) 2010-08-19

Family

ID=42129396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/055226 WO2010049917A2 (en) 2008-10-31 2009-10-20 Image prediction method and system

Country Status (1)

Country Link
WO (1) WO2010049917A2 (en)

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
B. GIROD: "Efficiency analysis of multihypothesis motion-compensated prediction for video coding", IEEE TRANS. ON IMAGE PROCESSING, vol. 9, no. 2, February 2000 (2000-02-01), pages 173 - 183
B. T. CHOI; S. H. LEE; S. J. KO, IEEE TRANS. ON CONSUMER ELECTRON., vol. 46, no. 3, August 2000 (2000-08-01), pages 603 - 609
B.D.CHOI; J.W.HAN; C.S. KIM; S.J. KO, IEEE TRANS. ON CIRCUITS SYSTEM VIDEO TECHNOLOGY, vol. 17, no. 4, April 2007 (2007-04-01), pages 407 - 416
G. DANE; T. NGUYEN, PROC. IEEE INT. CONF ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004, pages 309 - 312
G. DE HAAN; P. W. BIEZEN; H. HUIJGEN; O. A. OJO, IEEE TRANS. ON CIRCUITS AND SYSTEM VIDEO TECHNOLOGY, vol. 3, no. 5, October 1993 (1993-10-01), pages 368 - 379
G. I. LEE; B. W. JEON; R. H. PARK; S. H. LEE, PROC. IEEE INT. CONF CONSUMER ELECTRONICS, 2003, pages 350 - 351
J. ZHAI; K. YU; J. LI; S. LI: "A low complexity motion compensated frame interpolation method", PROC. ISCAS, vol. 5, May 2005 (2005-05-01), pages 4927 - 4930
R. CASTAGNO; P. HAAVISTO; G. RAMPONI, IEEE TRANS. ON CIRCUITS AND SYSTEM VIDEO TECHNOLOGY, vol. 6, no. 5, October 1996 (1996-10-01), pages 436 - 446
T. WEDI, IEEE TRANS. ON CIRCUITS SYSTEM VIDEO TECHNOLOGY, vol. 16, no. 4, April 2006 (2006-04-01), pages 484 - 491
Z. GAN; L. QI; X. ZHU, ELECTRONICS LETTERS, vol. 43, no. 2, January 2007 (2007-01-01), pages 96 - 98

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012001520A2 (en) 2010-06-30 2012-01-05 France Telecom Pixel interpolation method and system
WO2012001520A3 (en) * 2010-06-30 2012-03-01 France Telecom Pixel interpolation method and system
WO2017035831A1 (en) * 2015-09-06 2017-03-09 Mediatek Inc. Adaptive inter prediction
WO2017036417A1 (en) * 2015-09-06 2017-03-09 Mediatek Inc. Method and apparatus of adaptive inter prediction in video coding
US10979707B2 (en) 2015-09-06 2021-04-13 Mediatek Inc. Method and apparatus of adaptive inter prediction in video coding
TWI646836B (en) * 2017-06-05 2019-01-01 元智大學 Frame rate up-conversion method and architecture thereof

Also Published As

Publication number Publication date
WO2010049917A3 (en) 2010-08-19

Similar Documents

Publication Publication Date Title
JP3393832B2 (en) Image Data Interpolation Method for Electronic Digital Image Sequence Reproduction System
EP1747678B1 (en) Method and apparatus for motion compensated frame rate up conversion
Choi et al. Motion-compensated frame interpolation using bilateral motion estimation and adaptive overlapped block motion compensation
Wang et al. Hybrid de-interlacing algorithm based on motion vector reliability
JP4486560B2 (en) Scalable encoding method and apparatus, scalable decoding method and apparatus, program thereof, and recording medium thereof
Dane et al. Motion vector processing for frame rate up conversion
EP1960967A1 (en) Motion estimation using prediction guided decimated search
Zhang et al. A spatio-temporal auto regressive model for frame rate upconversion
WO2013049412A2 (en) Reduced complexity motion compensated temporal processing
KR100565066B1 (en) Method for interpolating frame with motion compensation by overlapped block motion estimation and frame-rate converter using thereof
KR100584597B1 (en) Method for estimating motion adapting adaptive weighting and frame-rate converter using thereof
Zhang et al. A motion-aligned auto-regressive model for frame rate up conversion
WO2010049917A2 (en) Image prediction method and system
WO2010032229A1 (en) Frame rate up conversion method and system
Guo et al. Frame rate up-conversion using linear quadratic motion estimation and trilateral filtering motion smoothing
KR100393063B1 (en) Video decoder having frame rate conversion and decoding method
EP2359601A1 (en) Image prediction method and system
Anagün et al. Super resolution using variable size block-matching motion estimation with rotation
Zhao et al. Frame rate up-conversion based on edge information
Ghutke Temporal video frame interpolation using new cubic motion compensation technique
John Motion compensation based multiple inter frame interpolation
CN102204256A (en) Image prediction method and system
KR101428531B1 (en) A Multi-Frame-Based Super Resolution Method by Using Motion Vector Normalization and Edge Pattern Analysis
Tourapis et al. Advanced deinterlacing techniques with the use of zonal-based algorithms
KR100228684B1 (en) A temporal predictive error concealment method and apparatus based on the motion estimation

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09813837

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 09813837

Country of ref document: EP

Kind code of ref document: A2