WO2012000191A1 - Pixel interpolation method and system - Google Patents

Pixel interpolation method and system

Info

Publication number
WO2012000191A1
WO2012000191A1 (PCT/CN2010/074839, CN2010074839W)
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
predicted
combination
neighborhood
pixels
Prior art date
Application number
PCT/CN2010/074839
Other languages
French (fr)
Inventor
Ronggang Wang
Yongbing Zhang
Dong YUAN
Original Assignee
France Telecom Research & Development Beijing Company Limited
Priority date
Filing date
Publication date
Application filed by France Telecom Research & Development Beijing Company Limited
Priority to PCT/CN2010/074839
Priority to PCT/IB2011/001857
Publication of WO2012000191A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/577Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0127Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0127Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
    • H04N7/0132Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter the field or frame frequency of the incoming video signal being multiplied by a positive integer, e.g. for flicker reduction


Abstract

A method of computing a frame to be predicted from a first and a second reference frame, said method comprising, for each pixel to be predicted in the frame to be predicted, the acts of: defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted; computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted; and computing said pixel to be predicted from said first predicted pixel.

Description

PIXEL INTERPOLATION METHOD AND SYSTEM
Field of the Invention
The present invention relates in general to media content and more specifically to video coding.
Background of the Invention
Prediction is a statistical estimation procedure where one or more random variables are estimated from observations of other random variables. It is called prediction when the variables to be estimated are in some sense associated with the "future" and the observable variables are associated with the "past" (or past and future). One of the simplest and most prevalent prediction techniques is linear prediction. Linear prediction is intuitively satisfying since it is powerful and yet fairly simple to understand and use. Much of linear prediction consists of predicting one vector given another. The most common use of prediction is to estimate a sample of a stationary random process from observations of several prior samples.
A video comprises a plurality of successive images or frames. A frame comprises pixels, which may further be divided into sub-pixels. A frame may be divided into blocks of pixels. An application of prediction is in image/video compression, where a block of pixels is estimated from an observed "prior" block of pixels in a block raster scan of the image or in a forward or backward reference image. In an existing solution, each predicted frame is divided into non-overlapped blocks, and a motion vector is derived for each block by performing the existing technique known as Motion Estimation (ME) in a reference frame. Motion Estimation allows deriving a motion vector between two frames. The motion vector is a vector starting from one point of a first frame and pointing toward the corresponding point in a second frame (e.g. the next frame) along the motion trajectory (i.e. in the direction) of said point from the first frame to the second frame. After that, each block may be predicted using a known technique called Motion Compensation (MC). Motion Compensation allows obtaining the corresponding block in the reference frame pointed to by the derived motion vectors. Doing so helps to eliminate redundancy, so that fewer bits are needed to describe the residual obtained during the compression process. However, linear prediction is actually not an accurate prediction method, as the MC technique is based on the assumption that an object represented by pixels is performing translational motion, which is not always true. Besides, for non-Gaussian processes, linear prediction cannot fully squeeze out all possible information about the past or the future that will help to predict the current frame.
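As background, the ME/MC pipeline described above can be illustrated with a short sketch. The following is a minimal full-search block-matching example in Python/NumPy, assuming greyscale frames stored as 2D arrays and a sum-of-absolute-differences (SAD) cost; the block size, search range and function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def block_motion_estimation(ref, cur, block=8, search=7):
    """Full-search block matching (ME): for each block of `cur`, find the
    displacement into `ref` (within +/- `search` pixels) minimizing the SAD."""
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            tgt = cur[by:by + block, bx:bx + block].astype(int)
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(ref[y:y + block, x:x + block].astype(int) - tgt).sum()
                        if sad < best:
                            best, best_mv = sad, (dy, dx)
            mvs[by // block, bx // block] = best_mv
    return mvs

def motion_compensate(ref, mvs, block=8):
    """MC: rebuild a frame by copying, for each block, the reference block
    pointed to by its motion vector."""
    out = np.zeros_like(ref)
    h, w = ref.shape
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            dy, dx = mvs[by // block, bx // block]
            out[by:by + block, bx:bx + block] = \
                ref[by + dy:by + dy + block, bx + dx:bx + dx + block]
    return out
```

This is exactly the translational-motion assumption criticized above: each block is displaced rigidly, which breaks down under rotation, zoom or deformation.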
Patent application WO2010049917 discloses a method for predicting pixels in a frame using backward and forward frames. A drawback of this method is that each pixel is interpolated as the weighted summation of a sample space including the pixels within its two temporal neighborhoods from the previous and following original frames, as well as the available interpolated pixels within its spatial neighborhood in the current to-be-interpolated frame. This method makes it difficult to capture the local image structure in acute motion regions, since the temporal neighborhood is centered at the collocated pixel. The filter training process in patent application WO2010049917 considers only processing pixels by block, using the motion vector of the block and constant filter weights for all pixels located in the same block. Consequently, when there is complex motion associated with the pixel to be predicted, the motion of collocated pixels in a temporal neighborhood tends to differ from that of the current pixel to be interpolated (i.e. predicted), thus degrading the performance of the prediction.
Today there is no solution to efficiently predict pixels in a video frame that allows optimizing the quality of the image and thus improving the efficiency of such video coding and compression systems.
There is thus a need for a pixel prediction solution that can be easily implemented on existing communication infrastructures.
Summary of Invention
It is an object of the present system to overcome disadvantages and/or make improvement over the prior art.
To that end, the invention proposes a method of computing a frame to be predicted from a first and a second reference frame, said method comprising, for each pixel to be predicted in the frame to be predicted, the acts of:
a) defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,
b1) computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,
c) computing said pixel to be predicted from said first predicted pixel.
An advantage of the method according to the invention is that, as each coefficient of the first combination coefficients is calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said first pixel, each coefficient may be different and reflects the similarity of each pixel in the neighborhood with the pixel to be predicted, thus allowing optimizing the accuracy of the prediction for each pixel. Another advantage of the method according to the invention is that the prediction is based on the motion trajectory of each pixel, thus achieving more robust performance.
The invention also proposes a method according to claim 2. An advantage is that, as the exact difference between each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted is not known prior to any estimation, using the probability that each such pixel and the pixel to be predicted have the same value allows down-weighting training samples whose texture structure differs from that of the pixel to be predicted, thus achieving better results.
The invention also proposes a method according to claim 3. An advantage is that the similarity function considers both distance and similarity between pixels, allowing thus determining an accurate probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.
The invention also proposes a method according to claim 4. An advantage is that it allows practically deriving the first combination coefficients using a mean square error method.
The invention also proposes a method according to claim 5. An advantage is that a better result may be achieved using second combination coefficients as the results of forward and backward estimations of the pixel to be predicted may be averaged.
The invention also proposes a method according to claim 6. An advantage is that it allows practically predicting the pixel to be predicted by averaging the results of forward and backward estimations of the pixel to be predicted.
The invention also proposes a method according to claim 7. An advantage is that the estimation of a pixel in a frame sequentially positioned between the first and second reference frames is based on the proximity of the first and second reference frames, thus improving the accuracy of the method.
The invention also proposes a device according to claim 8.
The invention also proposes a system according to claim 14.
The invention also proposes a computer-readable medium according to claim 15.
Brief Description of the Drawings
Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which :
Figure 1A schematically illustrates forward prediction according to an embodiment of the present invention;
Figure 1B schematically illustrates backward prediction according to an embodiment of the present invention;
Figure 2 schematically illustrates forward prediction for an intermediate frame to be predicted according to an embodiment of the present invention;
Figure 3 schematically illustrates a method according to an embodiment of the present invention;
Figure 4 schematically illustrates a method according to an embodiment of the present invention;
Figure 5 schematically illustrates a process of forward prediction according to an embodiment of the present invention;
Figure 6 schematically illustrates a flowchart according to an embodiment of the present invention;
Figure 7 schematically illustrates a flowchart according to an embodiment of the present invention.
Description of Preferred Embodiments
The following are descriptions of exemplary embodiments that, when taken in conjunction with the drawings, will demonstrate the above noted features and advantages, and introduce further ones. In the following description, for purposes of explanation rather than limitation, specific details are set forth such as architecture, interfaces, techniques, devices etc., for illustration. However, it will be apparent to those of ordinary skill in the art that other embodiments that depart from these details would still be understood to be within the scope of the appended claims. Moreover, for the purpose of clarity, detailed descriptions of well-known devices, systems, and methods are omitted so as not to obscure the description of the present system. Furthermore, entities for video coding/decoding or compression/decompression are not detailed as their implementation is beyond the scope of the present system and method. Unless specified otherwise, the exemplary embodiment will be described hereafter in its application to a video coder. In addition, it should be expressly understood that the drawings are included for illustrative purposes and do not represent the scope of the present system.
The method according to an embodiment of the invention proposes in particular a model for predicting an image (i.e. called predicted or current image/frame) based on observations made in previous and following frames. In the method according to the invention, the prediction may be performed on each pixel. By extension, an image or frame may be assimilated to a matrix or block of pixels.
The method according to an embodiment of the invention is suitable for predicting frames in a sequence or stream of frames and allows in particular predicting a frame between any two first and second reference frames in the stream of frames, such as for example the forward and backward adjacent frames.
The method according to an embodiment of the invention allows predicting pixels in a frame to be predicted given a first and a second reference frame.
An auto-regressive (AR) model allows predicting an image pixel based on the corresponding forward and/or backward observations in forward and/or backward frames. In an illustrative embodiment of the method according to the invention, each pixel in the frame to be predicted may be predicted as the result generated by a Forward Auto-Regressive (FAR) model (or Backward Auto-Regressive (BAR) model). In an illustrative embodiment of the method according to the invention, each pixel in the frame to be predicted may be interpolated as the average of the results generated by one FAR model and one BAR model using a first and a second reference frame. In the FAR model, each pixel in the predicted image may be first approximated as a linear combination of pixels within a spatial neighborhood along the forward motion trajectory in the forward reference frame, and then the pixels in the backward reference frame along the backward motion trajectory may be approximated as the linear combination of pixels in the frame to be predicted. Since adjacent frames are highly redundant, the AR coefficients may be assumed to be the same. Consequently, the pixels in the backward reference frame may be approximated as a combination of pixels within the forward reference frame. The BAR model may be applied likewise, except that it operates in the reverse direction. This invention gives the solution for the optimal FAR and BAR coefficients.
Figure 1A describes an illustrative embodiment of forward interpolation prediction of a pixel 115 in a frame (2k+1) from a pixel 110 in a frame (2k) along the motion trajectory materialized by the corresponding motion vector in a sequence of frames of a video flow.
Figure 1B describes an illustrative embodiment of backward interpolation prediction of a pixel 117 in a frame (2k+1) from a pixel 120 in a frame (2k+2) along the motion trajectory materialized by the corresponding backward motion vector in a sequence of frames of a video flow.
In the illustrative embodiments described hereunder, the frame to be predicted is sequentially positioned between the first and the second reference frames, which is in no way limiting of the scope of the present invention.
In an illustrative embodiment of the method according to the invention, a frame to be predicted is computed from a first and a second reference frame, and, for each pixel to be predicted in the frame to be predicted:
- an act 210 allows defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,
- an act 220 allows computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,
wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,
- an act 240 allows computing said pixel to be predicted from said first predicted pixel.
In an illustrative embodiment of the method according to the invention, an act 230 further allows computing a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel,
wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel, and wherein the act of computing said pixel to be predicted is performed from said first predicted pixel and said second predicted pixel. This further allows, for example, averaging the results of the forward and backward estimations for computing the pixel to be predicted using the first and second combination coefficients, as described hereunder in reference to Equation (21).
In an illustrative embodiment of the method according to the invention:
- the act 220 may comprise computing a first coefficient vector allowing the estimation of the second pixel from the first pixel,
- the act 240 may comprise computing the pixel to be predicted using said first coefficient vector and a similarity function applied onto pixels in a neighborhood of the first pixel in the first reference frame,
- an act 230 may comprise computing a second coefficient vector allowing the estimation of the first pixel from the second pixel, the act 240 further comprising using said second coefficient vector and pixels in a neighborhood of the second pixel in the second reference frame for estimating the pixel to be predicted. This further allows, for example, averaging the pixel to be predicted using the first and second coefficient vectors, as described hereunder in reference to Equation (21).
In an illustrative embodiment of the method according to the invention, as described in Figure 1A, each pixel within frame (2k+1) may be approximated as a linear combination of the corresponding pixels within a spatial neighborhood in the previous frame (2k) along the forward motion trajectory. $X_{2k+1}(m,n)$ represents the pixel value located at $(m,n)$ for the forward observation in the frame to be predicted (2k+1). Then, using a FAR model, the predicted pixel may be interpolated as:

$$X_{2k+1}(m,n) = \sum_{i=-L}^{L}\sum_{j=-L}^{L} a_{i,j}\, X_{2k}\!\left(\lfloor m+v_x^f\rfloor + i,\ \lfloor n+v_y^f\rfloor + j\right) + n_{2k+1} \qquad (1)$$

where $(v_x^f, v_y^f)$ represents the corresponding forward motion vector with sub-pixel accuracy from frame (2k) to (2k+1), $\lfloor\cdot\rfloor$ is the floor function, which maps $v_x^f$ and $v_y^f$ to the next full-pixel position, $a_{i,j}$ is the Forward AR coefficient, and $n_{2k+1}$ is the additive Gaussian white noise. Here $L$ is defined to be the radius of the AR filter, and thus the size of the AR filter is $(2L+1)\times(2L+1)$.

Defining $f_L[\cdot]$ as an operator that extracts a patch of $(2L+1)\times(2L+1)$ pixels, the expression $f_L[X(m,n)]$ results in a vector of length $(2L+1)^2$ being the extracted patch, which is centered at pixel $X(m,n)$. Thus, Equation (1) may also be expressed as:

$$X_{2k+1}(m,n) = f_L\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot \mathbf{a} + n_{2k+1} \qquad (2)$$

where $(\tilde m,\tilde n)$ represents the forward motion-aligned pixel coordinate (i.e. position) and $\mathbf{a} = \left(a_{-L,-L}, a_{-L,-L+1}, \ldots, a_{L,L}\right)^T$ is the first coefficient vector (here e.g. the FAR weight vector).
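For illustration, the following is a minimal sketch of the patch-extraction operator $f_L[\cdot]$ used in Equation (2), assuming a 2D NumPy frame, full-pixel motion-aligned coordinates and replicated borders; the function name and the boundary handling are assumptions of this sketch, not specified by the patent.

```python
import numpy as np

def extract_patch(frame, m, n, L):
    """f_L[X(m, n)]: flatten the (2L+1) x (2L+1) neighborhood centered at
    (m, n) into a vector of length (2L+1)**2, as used in Equation (2)."""
    padded = np.pad(frame, L, mode="edge")   # border replication (an assumption)
    return padded[m:m + 2 * L + 1, n:n + 2 * L + 1].reshape(-1)

# Equation (2) then reads: prediction = extract_patch(X2k, m_al, n_al, L) @ a
```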
The Forward AR or first coefficient vector $\mathbf a$ should be chosen to be the "best" in some sense. The most common measure of performance of a predictor, i.e. the Mean Squared Error (MSE), may be used:

$$\varepsilon_{2k+1}(m,n) = E\left\{\left\| f_R\!\left[Y_{2k+1}(m,n)\right] - f_R\!\left[\hat X_{2k+1}(m,n)\right] \right\|^2\right\} \qquad (3)$$

where $\|\cdot\|$ denotes the L2 norm, with $R \ge L$. The MSE as a performance criterion may be viewed as a measure of how much the energy of the signal is reduced by removing the predictable information based on the observables from it. Since the goal of a predictor is to remove this predictable information, a better predictor corresponds to a smaller MSE. However, such a use would not work directly, since the actual value of $X_{2k+1}$ is not necessarily available (e.g. at the receiver side in a communication network, if the video comprising the frame is sent compressed by a transmitter and the method according to the invention is performed to predict frames at the receiver). The method according to the invention thus allows continuing the approximation of pixels along the temporal axis along the motion trajectory, as shown in Figure 4.
Figure 3 describes the motion trajectory of a pixel from one frame (2k) to another frame (2k+2) in a sequence of frames of a video flow. Assuming a first pixel in a frame (2k), the corresponding pixel, in the frame to be predicted (2k+1), along the motion trajectory (defined by its associated motion vector) would be the pixel 210. Similarly, the corresponding pixel, in the frame (2k+2) following the frame (2k+1), along the same motion trajectory (defined by its associated motion vector) would be the pixel 220. In reference to Figure 3, assuming the actual value for a pixel to be predicted has been obtained in frame (2k+1), the same FAR coefficients may be used to approximate (i.e. predict) the pixels within frame (2k+2) along the motion trajectory. The assumption that the FAR coefficients between frames (2k) and (2k+1) are the same as those between frames (2k+1) and (2k+2) is quite reasonable since there is a high redundancy between pixels along the motion trajectory from frame (2k) to (2k+2). Based on such an assumption, pixels in frame (2k+2) may be approximated as:
$$X_{2k+2}(m,n) = f_L\!\left[\hat X_{2k+1}(\hat m,\hat n)\right]\cdot \mathbf a + n_{2k+2} \qquad (4)$$

where $(\hat m,\hat n)$ represents the backward motion-aligned pixel coordinates within $X_{2k+2}$, $f_L[\hat X_{2k+1}(\hat m,\hat n)]$ represents the patch extraction from the to-be-predicted (i.e. to-be-interpolated) frame (2k+1), whose elements are computed according to Equation (2), and $\mathbf a$ is the same as in Equation (2). Incorporating (2) into (4) allows obtaining the approximation of $X_{2k+2}$ utilizing the corresponding forward motion-aligned pixels as:

$$X_{2k+2}(m,n) = f_L\!\left[\hat X_{2k+1}(\hat m,\hat n)\right]\cdot \mathbf a + n_{2k+2} = f_L\!\left[f_L\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot \mathbf a + n_{2k+1}\right]\cdot \mathbf a + n_{2k+2} = f_{2L}\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot h(\mathbf a) + \tilde n_{2k+2} \qquad (5)$$

where:
- $f_{2L}[X_{2k}(\tilde m,\tilde n)]$ represents extracting a $(4L+1)\times(4L+1)$ patch, shown as the circles in Figure 4, which is centered at pixel 410 $X_{2k}(\tilde m,\tilde n)$;
- $h(\mathbf a)$ represents the new weight vector corresponding to the enlarged patch of size $(4L+1)\times(4L+1)$.

In reference to Figure 4, each backward motion-aligned pixel in the following frame (2k+2) may be estimated as a weighted summation of the pixels within an enlarged patch in the previous frame (2k). It should be noted that each element of $h(\mathbf a)$ is quadratic in the elements of $\mathbf a$. In other words, $h(\mathbf a)$ may be expressed in a two-dimensional form as:

$$h(\mathbf a)_{m,n} = \sum_{\substack{i+p=m\\ j+q=n}} a_{i,j}\, a_{p,q} \qquad (6)$$

with $-L \le i,j \le L$, $-L \le p,q \le L$ and $-2L \le m,n \le 2L$.
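Equation (6) states that each element of $h(\mathbf a)$ sums products $a_{i,j}\,a_{p,q}$ over all index pairs with $i+p=m$ and $j+q=n$; in other words, $h(\mathbf a)$ is the full 2D auto-convolution of the coefficient kernel. Below is a minimal sketch, assuming $\mathbf a$ has been reshaped into its $(2L+1)\times(2L+1)$ kernel form; the function name is an assumption of this sketch.

```python
import numpy as np

def h_of_a(a_kernel):
    """Equation (6): h(a)_{m,n} = sum_{i+p=m, j+q=n} a_{i,j} * a_{p,q},
    i.e. the full 2D auto-convolution of the (2L+1) x (2L+1) kernel,
    yielding the (4L+1) x (4L+1) weight kernel of Equation (5)."""
    k = a_kernel.shape[0]                      # k = 2L + 1
    out = np.zeros((2 * k - 1, 2 * k - 1))     # (4L+1) x (4L+1)
    for i in range(k):
        for j in range(k):
            out[i:i + k, j:j + k] += a_kernel[i, j] * a_kernel
    return out
```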
Using Equations (5) and (6), the weight vector $\mathbf a$ may be computed by minimizing the following MSE:

$$\varepsilon(\mathbf a) = \left\| f_R\!\left[X_{2k+2}(m,n)\right] - f_R\!\left[f_{2L}\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot h(\mathbf a)\right] \right\|^2 \qquad (7)$$

where $r(\mathbf a) = f_R[X_{2k+2}(m,n)] - f_R[f_{2L}[X_{2k}(\tilde m,\tilde n)]\cdot h(\mathbf a)]$ is defined to be the residual vector, with $R \ge L$.
The act 220 (in reference to Figure 2) allows computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel, wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted.
In an illustrative embodiment of the method according to the invention, the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.
In an illustrative embodiment of the method according to the invention, the similarity function may further depend, as described hereunder in reference to Figure 5, on the distance between each pixel located at position $(k,l)$ in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position $(m,n)$.
Figure 5 describes an illustrative embodiment of the method according to the invention, wherein the determination of the contribution of each sample (i.e. pixel) within a patch of size $(2R+1)\times(2R+1)$ (i.e. the neighborhood of the first or second pixel) allows determining the quality of the pixel to be predicted. In other words, incorporating the contribution probability of each sample within the training window of size $(2R+1)\times(2R+1)$ into the minimization problem of Equation (7) avoids training samples having a different texture structure compared with the to-be-interpolated pixel, and thus achieves better results. For example, it may be desirable to give a higher contribution probability to a sample which is very similar to the target sample, while a smaller contribution probability may be set for a sample which is quite different from the target sample. Each pixel or sample in the combination of pixels in a neighborhood of the first pixel may be assigned a weight that reflects the probability that this pixel and the pixel to be interpolated have the same value.
In an illustrative embodiment of the method according to the invention, the similarity function may be based on the geometric proximity or the grey level intensity of the pixel to be predicted and of the pixels in a neighborhood of the first or second pixel. For instance, the similarity between two pixels $X(m,n)$ and $X(k,l)$ may depend on the geometric proximity as well as on the grey level intensity of $N(X(m,n))$ and $N(X(k,l))$, where $N(X(i,j))$ denotes a square neighborhood of fixed size $(2W+1)\times(2W+1)$ centered at pixel $X(i,j)$. This similarity may be derived, in an act 510, as a decreasing function of the weighted Euclidean distance between $N(X(m,n))$ and $N(X(k,l))$ as well as of the geometric proximity, namely:

$$p[m,n,k,l] = \frac{1}{Z(m,n)} \exp\!\left( -\frac{\left\| \left[ N(m,n) - N(k,l) \right] \otimes K \right\|^2}{2\sigma^2} \right),\quad -R \le (m-k,\ n-l) \le R \qquad (8)$$

where $p[m,n,k,l]$ represents the contribution probability of the pixel located at $(k,l)$ towards the pixel located at $(m,n)$, $\otimes$ represents the element-wise multiplication of two elements located at the same position in two matrices, and $\sigma$ is a constant controlling the decay of the exponential function, with

$$N(m,n) = \frac{1}{2}\left( f_W\!\left[X_{2k}(m,n)\right] + f_W\!\left[X_{2k+2}(m,n)\right] \right) \qquad (9)$$

$$K(x,y) = \frac{1}{C}\exp\!\left( -\sqrt{(x-m)^2 + (y-n)^2} \right),\quad -W \le (x-m,\ y-n) \le W \qquad (10)$$

and

$$Z(m,n) = \sum_{(k,l)} \exp\!\left( -\frac{\left\| \left[ N(m,n) - N(k,l) \right] \otimes K \right\|^2}{2\sigma^2} \right) \qquad (11)$$

where $C$ is a normalization constant and $K(x,y)$ is a weight value for each pixel in the similarity window, which decreases with the distance from the center of the similarity window.
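The following is a minimal sketch of the contribution probabilities of Equations (8) to (11), assuming a callable `N(i, j)` that returns the $(2W+1)\times(2W+1)$ grey-level neighborhood of Equation (9) and a precomputed spatial kernel `K` as in Equation (10); the names and the window handling are illustrative assumptions of this sketch.

```python
import numpy as np

def contribution_probabilities(N, m, n, R, K, sigma):
    """Equations (8) and (11): weight each candidate pixel (k, l) of the
    (2R+1) x (2R+1) training window by a normalized, exponentially decaying
    function of the K-weighted distance between N(m, n) and N(k, l)."""
    target = N(m, n)
    raw = np.zeros((2 * R + 1, 2 * R + 1))
    for dk in range(-R, R + 1):
        for dl in range(-R, R + 1):
            d = (target - N(m + dk, n + dl)) * K          # element-wise, as in (8)
            raw[dk + R, dl + R] = np.exp(-(d ** 2).sum() / (2 * sigma ** 2))
    return raw / raw.sum()   # division by Z(m, n), Equation (11)
```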
For the current to-be-interpolated (i.e. to-be-predicted) pixel $Y_{2k+1}(m,n)$, the contribution probability of each sample within the training window (of size $(2R+1)\times(2R+1)$) may be computed in an act 520, and the minimization of Equation (7) may be modified as:
$$\varepsilon(\mathbf a) = \left\| \left\{ f_R\!\left[X_{2k+2}(m,n)\right] - f_R\!\left[f_{2L}\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot h(\mathbf a)\right] \right\} \otimes \mathbf P \right\|^2 = \left\| r(\mathbf a)\otimes \mathbf P \right\|^2 \qquad (12)$$

where $r(\mathbf a) = f_R[X_{2k+2}(m,n)] - f_R[f_{2L}[X_{2k}(\tilde m,\tilde n)]\cdot h(\mathbf a)]$ is defined to be the residual vector. It is noted that each element of $\mathbf P$ in Equation (12) is computed according to Equation (8). Incorporating the contribution probability of each sample within the training window into the minimization process avoids training samples having a different texture structure compared with the to-be-predicted pixel, thus achieving better results than existing methods.
The first combination coefficients $\mathbf a$ may then be computed as follows: assume $\hat{\mathbf a}$ to be an approximation of $\mathbf a$; $r(\mathbf a)$ may thus be expressed around $\hat{\mathbf a}$ in a Taylor series as:

$$r(\mathbf a) \approx r(\hat{\mathbf a}) + J(\hat{\mathbf a})(\mathbf a - \hat{\mathbf a}) \qquad (13)$$

where $J(\hat{\mathbf a})$ is the Jacobian matrix of $r(\mathbf a)$ at $\hat{\mathbf a}$.

Setting $r(\mathbf a) = 0$, $\mathbf a$ may then be, in an act 530, computed as:

$$\mathbf a = \hat{\mathbf a} - \left( J^T(\hat{\mathbf a})\, J(\hat{\mathbf a}) \right)^{-1} J^T(\hat{\mathbf a})\, r(\hat{\mathbf a}) \qquad (14)$$

According to the Newton iteration algorithm, in order to optimize the first combination coefficients, if the weight vector $\mathbf a^i$ has been obtained after the $i$-th iteration, then the weight vector in the $(i+1)$-th iteration may be updated as:

$$\mathbf a^{i+1} = \mathbf a^i - \left( J^T(\mathbf a^i)\, J(\mathbf a^i) \right)^{-1} J^T(\mathbf a^i)\, r(\mathbf a^i) \qquad (15)$$

The iterations may stop after a predetermined number of iterations or when $\mathbf a^i$ has converged to a predefined extent, allowing deriving $\mathbf a$ in an act 540.
In act 240, the pixel to be predicted is computed from said first predicted pixel using $\mathbf a$.
Figure 6 describes an illustrative embodiment of an iterative process for deriving and optimizing the first combination coefficients (or auto-regressive filter coefficients) according to the invention. An initial $\mathbf a^0$ may be computed first, in act 600. To initialize $\mathbf a^0$, initial pixels for initiating the iterative process of pixel prediction may be obtained by using the interpolation results of various known methods, such as e.g. Motion Compensation Interpolation (MCI), Overlapped Block Motion Compensation (OBMC), Adaptive OBMC (AOBMC), and the Spatio-Temporal Auto-Regressive (STAR) model.

The corresponding pixels in frames (2k+1) and (2k+2) may then be approximated from these initial pixels analogously to Equations (2) and (4), and the initial $\mathbf a^0$ may be computed from them. [Equations (16) to (18) are given only as images in the original publication.]

After that, $r(\mathbf a^0)$ and $J(\mathbf a^0)$ may be computed in act 605. Then the iteration counter is set to zero, and the iteration process may start. It is first evaluated whether the iteration has converged or the iteration counter is larger than a preset maximum number of iterations, in act 610. If the answer is yes, the iteration process is over, and the algorithm may be carried out on another pixel to be predicted. Otherwise, in act 615, $\Delta\mathbf a^i$ may be computed according to:

$$\Delta\mathbf a^i = -\left( J^T(\mathbf a^i)\, J(\mathbf a^i) \right)^{-1} J^T(\mathbf a^i)\, r(\mathbf a^i) \qquad (19)$$

Then, in act 620, $\mathbf a^{i+1}$ may be updated as:

$$\mathbf a^{i+1} = \mathbf a^i + \Delta\mathbf a^i \qquad (20)$$

and $r(\mathbf a^{i+1})$ and $J(\mathbf a^{i+1})$ are updated, as shown in module 625. The iteration counter is then increased by one, and the process moves to the next iteration.

Finally, the first combination coefficients $\mathbf a$ are derived, allowing the prediction or estimation of the pixel to be predicted.
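The loop of acts 600 to 625 is a Gauss-Newton iteration on the weighted residual of Equation (12). Below is a minimal sketch, assuming callables `residual(a)` and `jacobian(a)` returning $r(\mathbf a)\otimes \mathbf P$ and its Jacobian; the tolerance, iteration budget and names are illustrative assumptions of this sketch.

```python
import numpy as np

def gauss_newton(residual, jacobian, a0, max_iter=20, tol=1e-6):
    """Iterate a^{i+1} = a^i - (J^T J)^{-1} J^T r, per Equations (15) and
    (19)-(20), until convergence or the iteration budget is exhausted."""
    a = a0.copy()
    for _ in range(max_iter):                        # act 610: budget test
        r = residual(a)                              # r(a^i), acts 605/625
        J = jacobian(a)                              # J(a^i)
        step = np.linalg.solve(J.T @ J, J.T @ r)     # act 615, Equation (19)
        a = a - step                                 # act 620, Equation (20)
        if np.linalg.norm(step) < tol:               # act 610: convergence test
            break
    return a
```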
A similar iterative algorithm may be performed to derive the Backward Auto-Regressive (BAR) coefficients (i.e. deriving the BAR coefficients by using, in a symmetric manner, the second pixel in the second reference frame, second combination coefficients and a similarity function to estimate the first pixel in the first reference frame). If we assume the derived Forward Auto-Regressive (FAR) and BAR coefficients are $\mathbf a$ and $\boldsymbol\beta$, respectively, then the final predicted pixel value at time instant (2k+1) may be computed as:

$$Y_{2k+1}(m,n) = \left\{ f_L\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot \mathbf a + f_L\!\left[X_{2k+2}(\hat m,\hat n)\right]\cdot \boldsymbol\beta \right\} / 2 \qquad (21)$$
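A minimal sketch of the final bidirectional prediction of Equation (21), reusing the illustrative `extract_patch` helper sketched above; `fwd_mn` and `bwd_mn` stand for the motion-aligned coordinates of the first and second pixels, and are assumptions of this sketch.

```python
def predict_pixel(X2k, X2k2, a, beta, fwd_mn, bwd_mn, L):
    """Equation (21): average the FAR and BAR estimates of the pixel."""
    far = extract_patch(X2k, *fwd_mn, L) @ a        # forward estimate
    bar = extract_patch(X2k2, *bwd_mn, L) @ beta    # backward estimate
    return 0.5 * (far + bar)
```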
The method according to the invention may be used in various video streaming related applications, such as IPTV, mobile TV, digital storage, etc.
Various standard video sequences of varying sizes (QCIF, CIF) have been tested. To compare the invented method with other prediction methods, every other frame of the test sequences is skipped and interpolated by the invented method as well as by the other methods. The interpolated frames are then compared with the original frames in the source sequences. The Peak Signal-to-Noise Ratio (PSNR) is used as the criterion to evaluate the performance of all methods. The anchor methods are motion compensation interpolation (MCI), the overlapped block motion compensation (OBMC) method, adaptive overlapped block motion compensation (AOBMC), and the spatio-temporal auto-regressive (STAR) model. The MCI method interpolates a block directly by averaging the motion-compensated blocks pointed to by the forward and backward motion vectors. The OBMC method interpolates a block as the weighted sum of the blocks motion-compensated by the motion vectors of neighboring blocks, the weights being determined by the relative positions. AOBMC is similar to OBMC; the only difference is that the weights are determined by the reliability of neighboring blocks. The STAR method interpolates a pixel as the linear combination of pixels within the previous frame, the following frame, and the available pixels in the current frame. The performance results are given in Table 1. It is objectively observed that the method according to the invention outperforms the MCI, OBMC, AOBMC and STAR methods in terms of PSNR values.
Table 1: Performance (PSNR) of the various prediction methods. [The table is given only as an image in the original publication.]

Claims

1. A method of computing a frame to be predicted from a first and a second reference frame, said method comprising, for each pixel to be predicted in the frame to be predicted, the acts of:
a) defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,
b1) computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,
wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,
c) computing said pixel to be predicted from said first predicted pixel.
2. A method according to claim 1 , wherein the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.
3. A method according to any of the preceding claims, wherein the similarity function further depends on the distance between each pixel located at position $(k,l)$ in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position $(m,n)$ according to:

$$p[m,n,k,l] = \frac{1}{Z(m,n)}\exp\!\left( -\frac{\left\| \left[ N(m,n)-N(k,l) \right]\otimes K \right\|^2}{2\sigma^2} \right),\quad -R \le (m-k,\ n-l) \le R$$

wherein $p[m,n,k,l]$ represents the contribution probability of the pixel located at $(k,l)$ towards the pixel located at $(m,n)$, $\otimes$ represents the element-wise multiplication of two elements located at the same position in two matrices, and $\sigma$ is a constant controlling the decay of the exponential function, with

$N(m,n)$ representing the grey intensity level of the pixel located at position $(m,n)$,

$N(k,l)$ representing the grey intensity level of the pixel located at position $(k,l)$,

$$K(x,y) = \frac{1}{C}\exp\!\left( -\sqrt{(x-m)^2+(y-n)^2} \right),\quad -W \le (x-m,\ y-n) \le W$$

and

$$Z(m,n) = \sum_{(k,l)}\exp\!\left( -\frac{\left\| \left[ N(m,n)-N(k,l) \right]\otimes K \right\|^2}{2\sigma^2} \right)$$

wherein $C$ is a normalization constant and $K(x,y)$ is a weight value for each pixel in the similarity window, which decreases with the distance from the center of the similarity window.
4. A method according to any of the preceding claims, wherein the first combination coefficients $\mathbf a$ may be computed, for a contribution probability applied to each pixel of a neighborhood of size $(2R+1)\times(2R+1)$ in a first reference frame $X_{2k}$, by minimizing the equation:

$$\varepsilon(\mathbf a) = \left\| \left\{ f_R\!\left[X_{2k+2}(m,n)\right] - f_R\!\left[f_{2L}\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot h(\mathbf a)\right] \right\}\otimes \mathbf P \right\|^2$$

wherein $\mathbf P$ is the matrix of contribution probabilities for each pixel of the neighborhood of size $(2R+1)\times(2R+1)$, $f_R[X_{2k+2}(m,n)]$ is a linear combination allowing an estimation $\hat X_{2k+2}$ of the pixel to be predicted using the pixel located at position $(m,n)$ in the second reference frame (2k+2), $f_R[f_{2L}[X_{2k}(\tilde m,\tilde n)]\cdot h(\mathbf a)]$ is a linear combination over a neighborhood of size $(4L+1)\times(4L+1)$ allowing an estimation $\hat X_{2k}$ of the pixel to be predicted using the pixel located at position $(\tilde m,\tilde n)$ in the first reference frame (2k), and $h(\mathbf a)$ represents the coefficients for estimating the pixel located at position $(m,n)$ in the second reference frame (2k+2) using a linear combination of pixels in a neighborhood of the first pixel located at $(\tilde m,\tilde n)$ in the first reference frame (2k).
5. A method according to any of the preceding claims, said method further comprising an act b2) of computing a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel, wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel,
and wherein the act of computing said pixel to be predicted is performed from said first predicted pixel and said second predicted pixel.
6. A method according to claims 4 and 5, wherein the act of computing said pixel to be predicted $Y_{2k+1}(m,n)$, located at position $(m,n)$ in the frame to be predicted, is performed from said first predicted pixel and said second predicted pixel according to:

$$Y_{2k+1}(m,n) = \left\{ f_L\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot \mathbf a + f_L\!\left[X_{2k+2}(\hat m,\hat n)\right]\cdot \boldsymbol\beta \right\}/2$$

wherein $\mathbf a$ and $\boldsymbol\beta$ are the first and second combination coefficients.
7. A method according to any of the preceding claims, wherein the predicted frame is sequentially positioned between the first and the second reference frames.
8. A device for computing a frame to be predicted from a first and a second reference frame, said device comprising, for each pixel to be predicted in the frame to be predicted:
- a defining unit for defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,
- a predicted pixel computing unit for computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,
wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,
- a pixel to be predicted computing unit for computing said pixel to be predicted from said first predicted pixel.
9. A device according to claim 8, wherein the similarity function reflects the probability that each pixel in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted have the same value.
10. A device according to any of the preceding claims 8 to 9, wherein the similarity function further depends on the distance between each pixel located at position $(k,l)$ in the combination of pixels in the neighborhood of the first pixel and the pixel to be predicted located at position $(m,n)$ according to:

$$p[m,n,k,l] = \frac{1}{Z(m,n)}\exp\!\left( -\frac{\left\| \left[ N(m,n)-N(k,l) \right]\otimes K \right\|^2}{2\sigma^2} \right),\quad -R \le (m-k,\ n-l) \le R$$

wherein $p[m,n,k,l]$ represents the contribution probability of the pixel located at $(k,l)$ towards the pixel located at $(m,n)$, $\otimes$ represents the element-wise multiplication of two elements located at the same position in two matrices, and $\sigma$ is a constant controlling the decay of the exponential function, with

$N(m,n)$ representing the grey intensity level of the pixel located at position $(m,n)$,

$N(k,l)$ representing the grey intensity level of the pixel located at position $(k,l)$,

$$K(x,y) = \frac{1}{C}\exp\!\left( -\sqrt{(x-m)^2+(y-n)^2} \right),\quad -W \le (x-m,\ y-n) \le W$$

and

$$Z(m,n) = \sum_{(k,l)}\exp\!\left( -\frac{\left\| \left[ N(m,n)-N(k,l) \right]\otimes K \right\|^2}{2\sigma^2} \right)$$

wherein $C$ is a normalization constant and $K(x,y)$ is a weight value for each pixel in the similarity window, which decreases with the distance from the center of the similarity window.
11. A device according to any of the preceding claims 8 to 10, wherein the first combination coefficients $\mathbf a$ may be computed, for a contribution probability applied to each pixel of a neighborhood of size $(2R+1)\times(2R+1)$ in a first reference frame $X_{2k}$, by minimizing the equation:

$$\varepsilon(\mathbf a) = \left\| \left\{ f_R\!\left[X_{2k+2}(m,n)\right] - f_R\!\left[f_{2L}\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot h(\mathbf a)\right] \right\}\otimes \mathbf P \right\|^2$$

wherein $\mathbf P$ is the matrix of contribution probabilities for each pixel of the neighborhood of size $(2R+1)\times(2R+1)$, $f_R[X_{2k+2}(m,n)]$ is a linear combination allowing an estimation $\hat X_{2k+2}$ of the pixel to be predicted using the pixel located at position $(m,n)$ in the second reference frame (2k+2), $f_R[f_{2L}[X_{2k}(\tilde m,\tilde n)]\cdot h(\mathbf a)]$ is a linear combination over a neighborhood of size $(4L+1)\times(4L+1)$ allowing an estimation $\hat X_{2k}$ of the pixel to be predicted using the pixel located at position $(\tilde m,\tilde n)$ in the first reference frame (2k), and $h(\mathbf a)$ represents the coefficients for estimating the pixel located at position $(m,n)$ in the second reference frame (2k+2) using a linear combination of pixels in a neighborhood of the first pixel located at $(\tilde m,\tilde n)$ in the first reference frame (2k).
12. A device according to any of the preceding claims 8 to 11, wherein the predicted pixel computing unit is further operable to compute a second predicted pixel as a combination of pixels in a neighborhood of said second pixel using second combination coefficients, said second combination coefficients allowing computing said first pixel as a combination of pixels of said neighborhood of said second pixel,
wherein said second combination coefficients are calculated using a similarity function representing a grey level difference between two pixels of said neighborhood of said second pixel,
and the pixel to be predicted computing unit is further operable to compute said pixel to be predicted from said first predicted pixel and said second predicted pixel.
13. A device according to claims 11 and 12, wherein the pixel to be predicted computing unit is further operable to compute said pixel to be predicted $Y_{2k+1}(m,n)$, located at position $(m,n)$ in the frame to be predicted, from said first predicted pixel and said second predicted pixel according to:

$$Y_{2k+1}(m,n) = \left\{ f_L\!\left[X_{2k}(\tilde m,\tilde n)\right]\cdot \mathbf a + f_L\!\left[X_{2k+2}(\hat m,\hat n)\right]\cdot \boldsymbol\beta \right\}/2$$

wherein $\mathbf a$ and $\boldsymbol\beta$ are the first and second combination coefficients.
14. A system for computing a predicted frame from a first and a second reference frame, said system comprising:
- an obtaining unit for obtaining said first and second reference frames, and,
- a device comprising, for each pixel to be predicted in the predicted frame:
- a defining unit for defining a first and a second pixel corresponding, respectively in said first and second reference frames, to the pixel to be predicted along the motion vector of said pixel to be predicted,
- a predicted pixel computing unit for computing a first predicted pixel as a combination of pixels in a neighborhood of said first pixel using first combination coefficients, said first combination coefficients allowing computing said second pixel as a combination of pixels of said neighborhood of said first pixel,
wherein said first combination coefficients are calculated using a similarity function representing a grey level difference between a pixel of said neighborhood of said first pixel and the pixel to be predicted,
- a pixel to be predicted computing unit for computing said pixel to be predicted from said first predicted pixel.
15. A computer-readable medium having computer-executable instructions to enable a computer system to perform the method of any one of claims 1 to 7.
PCT/CN2010/074839 2010-06-30 2010-06-30 Pixel interpolation method and system WO2012000191A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2010/074839 WO2012000191A1 (en) 2010-06-30 2010-06-30 Pixel interpolation method and system
PCT/IB2011/001857 WO2012001520A2 (en) 2010-06-30 2011-06-29 Pixel interpolation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2010/074839 WO2012000191A1 (en) 2010-06-30 2010-06-30 Pixel interpolation method and system

Publications (1)

Publication Number Publication Date
WO2012000191A1 true WO2012000191A1 (en) 2012-01-05

Family

ID=44906214

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2010/074839 WO2012000191A1 (en) 2010-06-30 2010-06-30 Pixel interpolation method and system
PCT/IB2011/001857 WO2012001520A2 (en) 2010-06-30 2011-06-29 Pixel interpolation method and system

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/001857 WO2012001520A2 (en) 2010-06-30 2011-06-29 Pixel interpolation method and system

Country Status (1)

Country Link
WO (2) WO2012000191A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020155485A1 (en) * 2019-01-31 2020-08-06 平安科技(深圳)有限公司 Image difference determination method and apparatus, computer device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161338B (en) * 2019-12-26 2022-05-17 浙江大学 Point cloud density improving method for depth prediction based on two-dimensional image gray scale

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006324888A (en) * 2005-05-18 2006-11-30 Fujitsu Ltd Dynamic-image coding equipment
CN1897706A (en) * 2002-01-18 2007-01-17 株式会社东芝 Video encoding method and apparatus and video decoding method and apparatus
US20090022220A1 (en) * 2005-04-13 2009-01-22 Universitaet Hannover Method and apparatus for enhanced video coding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008244846A (en) * 2007-03-27 2008-10-09 Toshiba Corp Device and method for interpolating frame
WO2010032229A1 (en) * 2008-09-22 2010-03-25 France Telecom Frame rate up conversion method and system
WO2010049917A2 (en) 2008-10-31 2010-05-06 France Telecom Image prediction method and system


Also Published As

Publication number Publication date
WO2012001520A2 (en) 2012-01-05
WO2012001520A3 (en) 2012-03-01


Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10853896; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10853896; Country of ref document: EP; Kind code of ref document: A1)