WO2013105946A1 - Motion compensating transformation for video coding - Google Patents

Motion compensating transformation for video coding

Info

Publication number
WO2013105946A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
transformation
transformed
motion
pictures
Application number
PCT/US2012/020888
Other languages
French (fr)
Inventor
Mithun George Jacob
Sitaram Bhagavathy
Original Assignee
Thomson Licensing
Application filed by Thomson Licensing
Priority to PCT/US2012/020888
Publication of WO2013105946A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 - Motion estimation or motion compensation
    • H04N 19/527 - Global motion vector estimation
    • H04N 19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 - Incoming video signal characteristics or properties
    • H04N 19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N 19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/172 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a picture, frame or field
    • H04N 19/46 - Embedding additional information in the video signal during the compression process
    • H04N 19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Abstract

Various implementations transform a group of pictures to reduce global motion, and then encode the pictures. In a particular implementation, a first picture is transformed to remove at least some motion occurring between the first picture and a second picture. The transformed first picture and one or more parameters indicating the transformation are provided for encoding. In another implementation, metadata indicates the transformation. In yet another implementation, a decoded version of the transformation of a first picture is accessed. One or more decoded parameters indicating the transformation are accessed. The decoded transformation of the first picture is inverse transformed to restore all or part of the motion.

Description

MOTION COMPENSATING TRANSFORMATION FOR VIDEO CODING

TECHNICAL FIELD

Implementations are described that relate to motion transformation. Various particular implementations further relate to encoding images to which motion transformation has been applied.

BACKGROUND

Video sequences often exhibit significant redundancy between the pictures of a sequence. Often this redundancy can be removed during encoding operations.

However, at other times, this redundancy is not removed with the use of standard encoding operations.

SUMMARY

According to a general aspect, a first picture is transformed to remove at least some motion occurring between the first picture and a second picture. The transformed first picture and one or more parameters indicating the transformation are provided for encoding.

According to another general aspect, metadata indicates a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture. Further, a transformed first picture is a transformation of the first picture using the transformation indicated by the metadata.

According to another general aspect, a decoded version of a transformation of a first picture is accessed. One or more decoded parameters indicating the transformation are accessed. The transformation is based on motion between the first picture and a second picture and removes at least some motion occurring between the first picture and the second picture. The decoded transformation of the first picture is inverse transformed to restore all or part of the motion.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block/flow diagram depicting an implementation of an apparatus and process for performing motion transformation and encoding, as well as retransformation and decoding.

FIG. 2 is a more detailed block/flow diagram depicting an implementation of an apparatus and process for performing motion transformation and encoding, as well as retransformation and decoding.

FIGS. 3(a)-(e) are primarily a pictorial representation of an implementation of a process for performing motion transformation, and providing output encoded data.

FIG. 4 is a pictorial representation of an example of a video picture sequence illustrating motion between the pictures.

FIG. 5 is a pictorial representation of the video sequence of FIG. 4 after applying some operations of an implementation of a motion transformation process.

FIG. 6 is a pictorial representation of an implementation of a canvas picture.

FIG. 7 is a pictorial representation of three constituent pictures enclosed in the canvas picture of FIG. 6.

FIG. 8 is a pictorial representation of an implementation of a process that includes motion transformation, resizing, and reordering a sequence of video pictures.

FIG. 9A is a pictorial representation of an implementation of a process that includes decoding, retransforming, and resizing a sequence of video pictures.

FIG. 9B is a pictorial representation of another implementation of a process that includes decoding, retransforming, and resizing a sequence of video pictures.

FIG. 10 is a block/flow diagram depicting an implementation of an apparatus and process for applying motion transformation to a picture.

FIG. 11 is a block/flow diagram depicting an implementation of an apparatus and process for applying motion retransformation to a picture.

FIG. 12 is a block/flow diagram depicting an implementation of an encoder and an encoding process that may be used with one or more implementations.

FIG. 13 is a block/flow diagram depicting an implementation of a decoder and a decoding process that may be used with one or more implementations.

FIG. 14 is a block/flow diagram depicting an implementation of a transmission system and process that may be used with one or more implementations.

FIG. 15 is a block/flow diagram depicting an example of a receiving system and process that may be used with one or more implementations.

DETAILED DESCRIPTION

Video sequences often exhibit motion from one picture to another. The motion may be for a particular object. The motion also, or alternatively, may apply more generally to the picture than just to a single object. For example, the motion may be for a set of objects or a background. Such motion is often referred to as global motion. Removing all or part of the global motion, to produce transformed pictures, may cause the transformed video pictures to more closely resemble each other. Such resemblance can allow standard encoding operations to encode the transformed video pictures more efficiently. The increase in efficiency may result from, for example, smaller motion vectors, fewer motion vectors, smaller residues, or fewer residues, as is discussed further below.

As a preview of some of the features presented in this application, at least one implementation describes transforming the pictures in a video sequence to remove at least some of the global motion. The transformed pictures are then capable of being encoded more efficiently. The implementation also encodes information describing the transformation. A decoder receives the encoded pictures and the encoded transformation information. After decoding, the decoder is able to retransform the decoded pictures, based on the decoded transformation information.

Such transformations may be considered as a form of video data pruning ("VDP"). VDP attempts, generally, to improve compression efficiency by preprocessing video pictures by "pruning" (removing) some information allowing the video pictures to be encoded at lower bitrates. At least one implementation uses a VDP approach for improving compression efficiency. Data pruning methods aim at improving compression efficiency beyond that achieved by standard compression methods. The main principle of many implementations of such methods is to remove data before (or during) encoding and to put back the removed data at the receiver after (or during) decoding.

At least one implementation proposes a global motion compensation approach to improve compression. In this implementation, the VDP paradigm is realized by "pruning" away all or part of the global motion in video. The pruning occurs by transforming pictures in a group-of-pictures (GOP) to the coordinate system of a single reference picture. This allows the GOP to be encoded using fewer motion-vectors for macroblocks in the static portion of the picture. In order to successfully reconstruct the original video sequence, in this implementation we transmit information that indicates the transformation. Such information is typically the transformation matrices, and it is transmitted as metadata. The metadata allows the decoder side to retransform each picture to obtain the original sequence.

Note that global motion describes motion in a picture based on a perspective transform. Accordingly, using a transform allows global motion to be at least partially pruned away.

The transformation allows pictures to be more efficiently coded for several reasons. For example, in various implementations, there are numerous motion vectors that are now similar (perhaps even being motion vectors that indicate "no motion"), and these are easier to compress than motion vectors that are widely varying. As another example, in various implementations, the block that a motion vector points to in a reference picture will now (because of the transformation) be a much better match to the current block being coded. This may occur because, for example, rotational and other motion has been removed. As a result of the better match, smaller residues result, and the smaller residues can be encoded with a comparatively small number of bits.

Referring to FIG. 1, a process 100 provides an overview of several implementations. The process 100 begins by receiving input video pictures for, for example, a GOP (110). The process 100 then estimates the transformation matrices that describe the transformation of each picture in the GOP to a single reference picture (120). The transformation matrices are then stored as metadata, and typically there is a separate transformation matrix for each picture in the GOP that is being transformed (120). Then, the process 100 transforms each picture in the GOP so that the entire GOP is now in a single coordinate system (120). The operation 120 thus includes estimating the transformation matrices and performing the transformations, and provides as output (130) the transformed pictures and the metadata describing the transformation matrices.

The transformation of a picture generally creates several static spatiotemporal regions. A static spatiotemporal region is a portion of the picture whose position in the picture does not change with respect to the preceding picture. The existence of the static spatiotemporal regions allows an encoder to avoid storing motion vectors and significant residuals for these static spatiotemporal regions. For example, in one implementation, if the residual for a given block is small, then an encoder encodes that block with a "skip" mode. This skip mode is indicated by a single flag, and no motion vectors are encoded. The skip mode flag, in at least one implementation, indicates that the given block is to be reconstructed with the block pointed to by a motion vector predicted from blocks neighboring the given block. The process 100 continues, therefore, by encoding the transformed pictures (140). The operation 140 also includes encoding the metadata.

On the decoder side, the process 100 decodes the encoded pictures and the encoded metadata (150). This produces a reconstruction of the transformed pictures and the metadata (160). The process 100 then uses the transformation matrices (sent as metadata) to retransform (170) the pictures back to the original coordinate systems of the pictures. The operation 170 produces as output a reconstruction of the input video pictures 110, referred to as output pictures 180.

Referring to FIG. 2, a process 200 is shown that provides additional details for various aspects of the process 100. The process 200 includes accessing an input video sequence (205), such as, for example, a GOP. A transformation reference picture is selected from the input video sequence (210). The transformation reference picture in one implementation is selected to be one of the interior (that is, not the first or last) pictures of a GOP so that the relative motion to all other pictures in the GOP is reduced or minimized. The process 200 includes estimating, for each picture (except the transformation reference picture) in the input video sequence, the motion between the picture and the transformation reference picture (215). The process 200 also includes determining transformation metadata for each picture that is being transformed (220). The metadata is typically determined (220) as part of the motion estimation operation (215). The pictures that are to be transformed are then transformed using the set of transformation matrices, Θ (225). The transformations (225) have the effect of transforming the pictures to the coordinate system of a single reference picture. More specifically, in this implementation, the transformations have the effect of transforming the pictures to the coordinate system of the transformation reference picture.
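
The following sketch is one way to picture the encoder-side operations 210-225 just described: it selects an interior (median) picture of the GOP as the transformation reference and warps every other picture into its coordinate system. It is a minimal sketch, not the patented method itself, and it assumes NumPy and OpenCV are available; estimate_transform is a hypothetical helper, one possible version of which is sketched after the discussion of Equation 1 below.

    import cv2
    import numpy as np

    def transform_gop(gop):
        """Warp every picture of a GOP to the coordinate system of a single
        transformation reference picture (a sketch of operations 210-225)."""
        ref_idx = len(gop) // 2           # interior (median) picture chosen as the reference
        reference = gop[ref_idx]
        h, w = reference.shape[:2]

        transformed, metadata = [], []
        for t, picture in enumerate(gop):
            if t == ref_idx:
                transformed.append(picture)   # the reference picture itself is not transformed
                metadata.append(np.eye(3))
                continue
            theta = estimate_transform(picture, reference)        # hypothetical helper (operations 215/220)
            warped = cv2.warpPerspective(picture, theta, (w, h))  # operation 225 (canvas sizing handled later)
            transformed.append(warped)
            metadata.append(theta)
        return transformed, metadata, ref_idx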

Referring to FIGS. 3(a)-(e), we now discuss one implementation for performing the motion estimation of the operation 215, as well as the transformation metadata determination of the operation 220. FIG. 3(a) shows an input set of high-resolution ("HR") pictures, H1, H2, H3, and H4, with H1 designated as the transformation reference picture.

FIG. 3(b) shows the motion from each picture Ht (that is, H2, H3, and H4) to the reference picture (H1) of the set. Without loss of generality, it is assumed that the reference picture is H1. Let us simplify the problem by assuming that there is only global motion among the pictures. In other words, we assume that the motion of pixels between any two pictures can be described by a global transformation with a few parameters. Examples of global transformations include translation, rotation, affine warp, projective transformation, etc.

In order to estimate the motion from picture Hi to picture Hj, we first choose a parametric global motion model that describes the motion between pictures. Using the data from Hi and Hj, the parameters θij of the model are then determined. Henceforth, we shall typically denote the transformation by Θij and its parameters by θij. The transformation Θij can then be used to align (or warp) Hi to Hj (or vice versa using the inverse model Θji = Θij⁻¹). FIG. 3(b) shows these transformations, referring to them as transformations Θ21, Θ31, and Θ41, or Θt1 more generally.

Global motion can be estimated using a variety of models and methods. One commonly used model is the projective transformation given by:

    x_new = (a1*x + a2*y + a3) / (c1*x + c2*y + 1)
    y_new = (b1*x + b2*y + b3) / (c1*x + c2*y + 1)          (Equation 1)

The above equations give the new position (x_new, y_new) in Hj to which the pixel at (x, y) in Hi has moved. Note, however, that in moving the pixel (x, y) in Hi to the new position (x_new, y_new) in Hj, the actual value of the pixel may, or may not, be changed. The above transformation pertains to the mapping between the spatial coordinates and not the pixel values themselves. The pixel values for some transformations (for example, horizontal translations of a discrete number of pixels) will not change.

However, the pixel values for other transformations will change due to, for example, interpolation in the case of non-integer pixel-location movement. Thus the eight model parameters θij = {a1, a2, a3, b1, b2, b3, c1, c2} describe the motion from Hi to Hj. The parameters are usually estimated by first determining a set of point correspondences between the two pictures and then using a robust estimation framework such as RANSAC or its variants, as is known in the art. Point correspondences between pictures can be determined by a number of methods, for example, by extracting and matching SIFT features or using optical flow, both of which are known in the art.
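
As a rough illustration of this estimation step, the sketch below matches SIFT features between a picture and the transformation reference picture and fits a projective model with RANSAC. OpenCV is an assumption here (the document does not name any library), and the returned 3x3 matrix simply plays the role of θij.

    import cv2
    import numpy as np

    def estimate_transform(picture_i, picture_j, ratio=0.75):
        """Estimate the 8-parameter projective motion model from picture_i to picture_j."""
        def to_gray(picture):
            return cv2.cvtColor(picture, cv2.COLOR_BGR2GRAY) if picture.ndim == 3 else picture

        sift = cv2.SIFT_create()
        kp_i, desc_i = sift.detectAndCompute(to_gray(picture_i), None)
        kp_j, desc_j = sift.detectAndCompute(to_gray(picture_j), None)

        # Determine point correspondences by matching SIFT descriptors (ratio test).
        matcher = cv2.BFMatcher()
        matches = matcher.knnMatch(desc_i, desc_j, k=2)
        good = []
        for pair in matches:
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good.append(pair[0])

        pts_i = np.float32([kp_i[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        pts_j = np.float32([kp_j[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        # Robustly fit the projective transformation with RANSAC.
        theta, inlier_mask = cv2.findHomography(pts_i, pts_j, cv2.RANSAC, 3.0)
        return theta   # 3x3 matrix, last entry normalized to 1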

In order to perform global motion compensation, the motion from each picture Ht to the reference picture (H1) is estimated. Hence, three sets of parameters are estimated: θ21, θ31, and θ41 (corresponding to transformations Θ21, Θ31, and Θ41). The transformation is invertible and the inverse model Θji = Θij⁻¹ describes the motion from Hj to Hi.

One or more implementations operate in the following manner. At least one implementation determines correspondences between points in a given picture and the transformation reference picture by, for example, using SIFT or optical flow. These correspondences are, in various implementations, uniquely identifiable interest points that are present in both pictures and whose surroundings exhibit high texture. Such correspondences include, in certain implementations, one or more of a corner of a table or other object, or an edge of an object that is in both of the pictures. Using those correspondences, these particular implementations estimate the global motion for the whole picture using, for example, RANSAC. This provides a motion for the picture based on the motion of the correspondence points. The motion model used is, for many implementations, more complex and flexible than the translational model used in standard encoders.

Referring to FIG. 4, a simple example is shown for a sequence of three pictures 410, 420, and 430. The pictures 410, 420, and 430 illustrate a global motion that is purely translational in the horizontal direction. Of course, the use of horizontal translation provides a simple transformation example. However, other implementations use transformations that (i) are non-translational, such as, for example, a rotational transformation, (ii) are translational in the vertical direction in addition to, or instead of, the horizontal direction, and/or (iii) include translational and non-translational aspects.

Referring again to FIG. 4, the horizontal translation can be seen by examining a stationary object. Each of the pictures 410, 420, and 430 includes a tree 440, which is stationary and is seen to move (shift) from right to left in the picture sequence by an amount that we will simply refer to as "shift". Thus, the global motion is from right to left, as is indicated by the tree 440. The tree 440 is assumed to have a horizontal pixel location of (X+shift) in the picture 410, X in the picture 420, and X-shift in the picture 430.

The sequence also shows non-global motion of a vehicle 450 that is in each of the pictures 410, 420, and 430. The vehicle is seen to move from left to right in the picture sequence. The goal of a transformation described in FIG. 4 would be to estimate the global motion, as indicated by the tree 440.

We select the picture 420 as the transformation reference picture. Accordingly, we then determine the transformation from the picture 410 to the picture 420, and the picture 430 to the picture 420, so that all three pictures 410, 420, and 430 will have the same coordinate system. Using the model of Equation 1 above, the global motion of FIG. 4 would be shown by the following transformations and transformation parameters:

    Θ410,420: (a1, a2, a3) = (1, 0, -shift)
              (b1, b2, b3) = (0, 1, 0)
              (c1, c2) = (0, 0)

    Θ430,420: (a1, a2, a3) = (1, 0, +shift)
              (b1, b2, b3) = (0, 1, 0)
              (c1, c2) = (0, 0)

Θ410,420 denotes the transformation from the picture 410 to the transformation reference picture 420. Θ430,420 denotes the transformation from the picture 430 to the transformation reference picture 420.
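
For this purely horizontal example, the two transformations can be written as 3x3 matrices and checked numerically. This is a small illustrative sketch using NumPy; the value of shift is made up for the example.

    import numpy as np

    shift = 12  # illustrative horizontal global motion, in pixels

    # Theta_410,420: a = (1, 0, -shift), b = (0, 1, 0), c = (0, 0)
    theta_410_420 = np.array([[1.0, 0.0, -shift],
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]])

    # Theta_430,420: a = (1, 0, +shift), b = (0, 1, 0), c = (0, 0)
    theta_430_420 = np.array([[1.0, 0.0, shift],
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]])

    # The tree at x = X + shift in picture 410 maps to x = X in picture 420.
    X = 100
    x, y, w = theta_410_420 @ np.array([X + shift, 50.0, 1.0])
    assert (x / w, y / w) == (X, 50.0)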

Once the transformation matrices have been determined, we transform each picture Ht to the coordinate system of the reference picture H1. FIG. 3(c) shows the pictures H2, H3, and H4 transformed into pictures T2, T3, and T4, respectively. FIG. 3(c) also displays a "T" next to each of T2, T3, and T4 to indicate that these pictures are transformed pictures. The transformed pictures T2, T3, and T4 are in the coordinate system of the reference picture H1. Further, FIG. 3(c) shows the three transformations (Θ21, Θ31, and Θ41) over the transformed pictures.

The transformations create large static spatiotemporal regions in the GOP. These regions can be exploited during encoding, as explained earlier. We also transmit Θ as metadata and the size of the original image (as explained later) in order to retransform all the pictures to their original respective coordinate systems. FIG. 3(d) shows the metadata as the transformations Θ21, Θ31, and Θ41. However, some implementations store as metadata the individual parameters θ. It should also be clear that FIG. 3(d) does not show the size of the original image. However, FIG. 3(e) shows that the encoded data of one implementation includes pictures H1, T2, T3, and T4, the transformation metadata for the three transformations Θ21, Θ31, and Θ41, and the original picture size. We note that the transformed pictures are quadrilaterals and need not necessarily be rectangles. For example, the transformation can change the shape of the picture. This is not the case for the example of FIG. 4. In FIG. 4, the transformation merely changes the horizontal pixel values. Accordingly, the transformed pictures of the example of FIG. 4 are rectangles. However, in a more general case, in which the global motion is not purely horizontal, the transformed pictures will not be rectangles.
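
One way to picture the side information of FIG. 3(e) is as a small per-GOP record holding the transformation matrices plus the auxiliary values discussed later (the original picture size and the origin of the reference picture, see the metadata list given with FIG. 9A). The structure below is purely illustrative; the document does not prescribe any particular serialization.

    from dataclasses import dataclass, field
    from typing import List
    import numpy as np

    @dataclass
    class GopTransformMetadata:
        """Illustrative per-GOP side information (in the spirit of FIG. 3(d)/(e))."""
        original_size: tuple      # (width, height) of the original pictures
        reference_origin: tuple   # (x, y) origin of the reference picture on the canvas picture
        transforms: List[np.ndarray] = field(default_factory=list)  # one 3x3 matrix per transformed picture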

Therefore, in at least one implementation, we transform all the pictures and find the size of the minimum enclosing rectangle for all the quadrilaterals. We call this the canvas picture and this is used as the size of each picture in the video we pass to the encoder. Use of a single canvas picture for an entire GOP allows us to line up all of the transformed pictures for the GOP in one large picture, and assists us in identifying existing static spatiotemporal regions in a GOP. This process is also indicated in an operation 230 of FIG. 2.
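
A sketch of that canvas computation: warp the four corners of each picture with its transformation matrix (identity for the transformation reference picture) and take the bounding rectangle of all of the resulting quadrilaterals. OpenCV and NumPy are assumed; this is one possible realization, not the only one.

    import cv2
    import numpy as np

    def canvas_rectangle(picture_shapes, transforms):
        """Minimum enclosing rectangle of the transformed picture quadrilaterals.
        picture_shapes: list of (height, width); transforms: matching list of 3x3 matrices."""
        xs, ys = [], []
        for (h, w), theta in zip(picture_shapes, transforms):
            corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
            warped = cv2.perspectiveTransform(corners, theta)
            xs.extend(warped[:, 0, 0])
            ys.extend(warped[:, 0, 1])
        x_min, y_min = int(np.floor(min(xs))), int(np.floor(min(ys)))
        x_max, y_max = int(np.ceil(max(xs))), int(np.ceil(max(ys)))
        # The canvas size is (x_max - x_min, y_max - y_min); the reference picture's
        # origin lands at (-x_min, -y_min) on the canvas.
        return (x_min, y_min), (x_max - x_min, y_max - y_min)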

We note that other implementations use transformations that do not produce quadrilaterals. Rather, such transformations produce more complex results.

Referring to FIG. 5, the simple example of FIG. 4 is continued. FIG. 5 includes a resized transformed picture 510, a resized picture 520, and a resized transformed picture 530. All three pictures 510, 520, and 530, have been resized to a canvas picture (minimum enclosing rectangle) that includes the transformations of the pictures 410 and 430, as well as the picture 420. Note that the transformations of the pictures 410 and 430 are not resized, per se. That is, the transformations of the pictures 410 and 430 retain the characteristics produced by the transformation algorithm. The resizing refers, rather, to the fact that the transformations of the pictures 410 and 430 are individually inserted into a canvas picture having a larger (enclosing) size. The larger (enclosing) size provides a new size, but does not change the transformation. Therefore, the resulting canvas-sized pictures 510 and 530 are larger than the original transformations of the pictures 410 and 430, but include the original transformations of the pictures 410 and 430.

The picture 510 includes a region 515, shown outlined with a dashed box, that corresponds to the transformation of the picture 410. The picture 520 includes a region 525, shown outlined with a dashed box, that corresponds to the picture 420. The picture 530 includes a region 535, shown outlined with a dashed box, that corresponds to the transformation of the picture 430. As can be seen, the regions 515, 525, and 535, if superimposed, will define a minimum enclosing rectangle that is the size of the pictures 510, 520, and 530.

Referring to FIGS. 6 and 7, a more general case is presented. FIG. 6 presumes a video sequence of three pictures, referred to as Picture 1 , Picture 2, and Picture 3, which are not shown. Pictures 2 and 3 are transformed to the coordinate system of Picture 1 , as indicated by FIG. 6. The transformed Picture 2 and the transformed Picture 3 are superimposed on the Picture 1 in FIG. 6. Thus, FIG. 6 defines the minimum enclosing rectangle.

The minimum enclosing rectangle of FIG. 6 is used as the size of all three constituent pictures in FIG. 7. FIG. 7 shows separately the three pictures of: a resized Picture 1 , a resized transformed Picture 2, and a resized transformed Picture 3. The use of the term "resized" carries the same meaning as discussed above with respect to FIG. 4. The three pictures of FIG. 7 are, in certain implementations, provided to an encoder for encoding.

In particular implementations, however, prior to encoding we reorder the pictures, as described in an operation 235 of FIG. 2. In such implementations, we reorder the pictures before encoding such that the transformation reference picture is also an intra-coded picture. The intra-coded picture is generally used as a prediction reference picture in prediction encoding (such as used in, for example, H.264 prediction modes). The purpose of the reordering is to ensure that the intra-coded picture, which is also a prediction reference picture, is not a smoothed picture.

The picture transformation is, in particular implementations, implemented with bilinear interpolation. Bilinear interpolation has a smoothing effect on pictures.

However, because the transformation reference picture is not transformed, the transformation reference picture is not smoothed. Recall, however, that in many implementations, the transformation reference picture is selected to be a picture from the interior of a GOP. Given that many encoders further select only the first picture in a GOP to be an intra-coded picture (and a prediction reference picture), such encoders will be using a transformed (and therefore smoothed) picture as the intra-coded picture. Such implementations will, therefore, be doing prediction-based encoding using a smoothed prediction reference picture.

Therefore, to prevent the use of smoothed pictures as intra-coded pictures and prediction reference pictures, particular implementations reorder the pictures. One such reordering moves the transformation reference picture, assumed to be the median picture in a GOP, so that it is the first picture in the GOP. Further, to prevent temporal discontinuity, the pictures following the median picture in the GOP are also reordered so that they are the second and following pictures of the GOP. Thus, the order of the pictures is retained for the median (original ordering) picture and those pictures following the median picture. Then, after the last picture of the GOP has been reordered, the remaining pictures of the original sequence, which are the first pictures in the original ordering, are reordered to be at the end of the GOP, starting from the first picture. By reordering as just described, the static spatiotemporal volume (that is, the aggregate size of the static spatiotemporal regions) for the GOP is increased for many implementations. Increasing the aggregate size of the static spatiotemporal regions generally increases compression efficiency.
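
The reordering just described (median picture first, then the pictures that follow it, then the pictures that originally preceded it) amounts to a simple rotation of the GOP. A minimal sketch, assuming the pictures are held in a Python list:

    def reorder_gop(pictures, ref_idx):
        """Move the transformation reference picture (assumed to be the median picture)
        to the front, keep the pictures that follow it in order, and append the
        pictures that originally preceded it."""
        return pictures[ref_idx:] + pictures[:ref_idx]

    # Example: a 5-picture GOP [P0, P1, P2, P3, P4] with P2 as the reference
    # becomes [P2, P3, P4, P0, P1]. The decoder-side reordering (operation 260)
    # is the inverse rotation: pictures[-ref_idx:] + pictures[:-ref_idx].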

Referring to FIG. 8, the picture reordering process of the above implementation is described graphically. FIG. 8 includes a sequence 810 of three pictures, including H1, H2, and H3. H2 is selected as a transformation reference picture, and the two pictures H1 and H3 are transformed with respect to H2 so that all three pictures H1, H2, and H3 are on the coordinate system of H2.

FIG. 8 also includes a sequence 820 of three pictures, including TH1, enlarged H2, and TH3. TH1 is a resized transformed version of H1. TH3 is a resized transformed version of H3. Enlarged H2 is a resized version of H2. The size of TH1, enlarged H2, and TH3 is the minimum enclosing rectangle of the superimposing of the following three pictures: a transformation of H1, H2, and a transformation of H3. As indicated in FIG. 8, TH1 and TH3 are smoothed by the transformation.

FIG. 8 also includes a sequence 830 of three pictures. The sequence 830 includes a reordering of the three pictures of the sequence 820. The resized version of the transformation reference picture, H2, is reordered to be the first picture in the GOP. Then, the pictures following the enlarged H2 in the sequence 820 are reordered to follow the enlarged H2 in the sequence 830. This results in TH3 becoming the second picture of the sequence 830. Then, the remaining pictures of the sequence 820, which are the first pictures of the sequence 820, are reordered. These first pictures are inserted after TH3 in the sequence 830. This results in TH1 becoming the third picture of the sequence 830.

FIG. 8 shows, with respect to the sequence 830, that the enlarged H2 picture is encoded as an intra-coded picture, referred to as an I picture in FIG. 8. Further, TH3 and TH1 are encoded as inter-coded pictures, referred to as P pictures in FIG. 8. Using typical H.264 encoders, TH3 and TH1 of the sequence 830 would generally be inter-coded using the I picture (the enlarged H2 picture) as a prediction reference picture.

Encoding is also described in FIG. 2. The process 200 of FIG. 2 includes encoding the reordered video pictures (240), and encoding transformation metadata and an original video picture size (245). The original video picture size is, in this implementation, the original size of all pictures in the GOP. Other implementations have pictures of different sizes and, therefore, transmit information indicating the original sizes of all pictures. The original picture size is used on the decoder side, in at least one implementation, as explained below.

In at least one implementation, on the decoder side, we recover a reconstruction of the input video picture sequence. Referring again to FIG. 2, the process 200 includes accessing and decoding the transformation metadata (250), accessing and decoding the original video picture size (250), and accessing and decoding the video pictures (255). In the implementation of FIG. 3, the decoded pictures include H1, T2, T3, and T4.

The process 200 includes reordering the decoded video pictures (260) to restore the original order. In the implementation of FIG. 8, reordering includes moving TH1 to the front of the GOP ahead of the enlarged H2 and TH3.

The process 200 includes retransforming the reordered video pictures (265). Retransforming refers to performing an inverse transformation. The transformation metadata of various implementations indicates the transformation. The metadata of one particular such implementation indicates the transformation by describing the transformation parameters. The metadata of another particular such implementation indicates the transformation by describing the parameters for the inverse transformation.
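
Retransformation can be sketched as warping each decoded canvas picture with the inverse of its transmitted matrix (or directly with the inverse parameters, when those are what the metadata carries). A minimal sketch, again assuming NumPy and OpenCV:

    import cv2
    import numpy as np

    def retransform(decoded_canvas, theta, canvas_size):
        """Inverse-transform a decoded canvas picture (operation 265).
        theta is the forward 3x3 transformation sent as metadata;
        canvas_size is (width, height) of the canvas picture."""
        theta_inv = np.linalg.inv(theta)   # inverse model, Theta_ji = Theta_ij^-1
        return cv2.warpPerspective(decoded_canvas, theta_inv, canvas_size)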

Note that in performing an inverse transformation, various implementations do not reconstruct the original picture exactly. For example, in many implementations the following aspects are present and contribute to not being able to reconstruct the original picture exactly: (i) the transformation algorithm includes interpolation, (ii) the transformed pictures are compressed in a lossy manner due to, for example, quantization, (iii) the compressed transformed pictures are decompressed in the lossy manner, and (iv) the decompressed transformed pictures are inverse transformed in a process that includes interpolation.

In various implementations, the transformation that is used is selected from a limited set. In these implementations, additional metadata, in the form of, for example, a flag bit(s), is provided to indicate which of the allowable (available) transformations (motion models) has been used. The decoder uses the flag bit(s) to select, or to help select, the proper inverse transformation to use. In one or more of these implementations, the available transformations (for example, translational, rotational, or perspective) have different numbers of parameters. Therefore, by including a flag to indicate the selected transformation, these implementations are often able to save bits by more frequently using the transformations that have fewer parameters, and therefore using the more complex models less frequently. Additionally, bits may be saved if a transformation is selected that produces lower residues than another available transformation. However, even if bits are not saved, these implementations provide additional flexibility and may, for certain types of content, provide subjectively better reconstructed pictures.
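
Where the model set is limited, the flag can simply index the model, and the number of parameters to read follows from that index. The mapping below is illustrative only; the document names example model types but does not fix identifiers or parameter counts.

    # Illustrative model flag: index -> (model name, assumed number of transmitted parameters)
    MOTION_MODELS = {
        0: ("translational", 2),   # (tx, ty)
        1: ("rotational", 3),      # rotation angle plus translation (assumed layout)
        2: ("projective", 8),      # a1..a3, b1..b3, c1, c2
    }

    def parameters_to_read(model_flag):
        """Return how many transformation parameters accompany the given model flag."""
        return MOTION_MODELS[model_flag][1]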

The process 200 includes resizing the retransformed video pictures (270). A simple example will now be discussed for the implementation of the resizing operation 270, as well as the retransforming operation 265.

In the implementation of FIG. 5, for example, the pictures 510 and 530 are transformed pictures, and the pictures 510, 520, and 530 are resized pictures having the size of an enclosing rectangle. In one implementation, the pictures 510, 520, and 530 of FIG. 5 are encoded and provided to a decoder. Continuing with this implementation, FIG. 9A shows decoded versions of the pictures 510, 520, and 530. The decoded versions are referred to as pictures 510', 520', and 530', respectively. The retransforming operations for the pictures 510' and 530' are simple horizontal shifts. After the retransforming operations at the decoder side, these pictures become pictures 910, 920, and 930, respectively.

Note that all of the content of the pictures 510' and 530' is retained in 910 and 930, respectively. This is analogous to the transforming operation of the pictures 410 and 430, which retained all of the content in producing the transformed pictures 510 and 530.

The pictures 910, 920, and 930 no longer have a common coordinate system. The different coordinate systems are shown by the horizontal staggering of the pictures 910, 920, and 930. The horizontal staggering of the pictures 910, 920, and 930 is implemented to horizontally align common x-coordinates from the three pictures 910, 920, and 930. This is shown by the horizontal alignment of the point (0, 0) in each of the pictures 910, 920, and 930.

Note that in one or more implementations the metadata includes (i) the transformation parameters, (ii) the origin of the reference picture, and (iii) the size of the original picture. The last two items of information are typically two integers each, and so do not typically represent significant overhead. Additionally, the transformation parameters typically provide the position of each transformation (for example, quadrilateral) on the canvas picture relative to the origin. Thus, the transformation parameters and the origin allow a determination of the exact extent and position of the transformed picture on the canvas picture.

The pictures 910, 920, and 930 are resized by extracting portions having the original picture size. The resizing thus uses the decoded original picture size value. The original picture size is indicated by the dashed lines in FIG. 9A that carve out, or extract, a portion of each of the pictures 910, 920, and 930. The extracted portions from the pictures 910, 920, and 930 are shown as regions 915, 925, and 935, respectively. The regions 915, 925, and 935 correspond to the original pictures 410, 420, and 430, respectively.

Referring to FIG. 9B, a retransformation and resizing process of another implementation is depicted. FIG. 9B provides an alternate way to conceptualize the retransformation. The implementation of FIG. 9B knows the size of the pictures that were encoded, and if the retransformation pushes content outside of that size, then that content is simply ignored. In the implementation depicted by FIG. 9B, the decoded pictures 510', 520', and 530' are retransformed into pictures 940, 920 (same as in FIG. 9A), and 960, respectively.

In contrast to the pictures 910 and 930 of FIG. 9A, not all of the content of the pictures 510' and 530' is retained in the pictures 940 and 960, respectively. However, assuming that the original picture sizes of the pictures 410, 420, and 430 are the same, there will be no loss of content from the original pictures 410, 420, and 430. As in FIG. 9A, the pictures 940, 920, and 960 no longer have a common coordinate system (for example, the tree 440 is not in the same place in all three pictures 940, 920, and 960).

The pictures 940, 920, and 960 are resized by extracting portions having the original picture size. The resizing thus uses the decoded original picture size value. The original picture size is indicated by the dashed lines in FIG. 9B that carve out, or extract, a portion of each of the pictures 940, 920, and 960. The extracted portions from the pictures 940, 920, and 960 are shown as regions 915, 925, and 935, respectively. The regions 915, 925, and 935 (as in FIG. 9A) correspond to the original pictures 410, 420, and 430, respectively.
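
The final resizing step is just the extraction of a window of the original picture size from the retransformed picture. A sketch, assuming the window's top-left corner is known (for example, from the reference-picture origin carried in the metadata, as listed with FIG. 9A) and that pictures are NumPy arrays indexed as [row, column]:

    def extract_original(retransformed, origin, original_size):
        """Carve the original-size region (such as region 915, 925, or 935) out of a
        retransformed picture. origin is (x, y); original_size is (width, height)."""
        x, y = origin
        w, h = original_size
        return retransformed[y:y + h, x:x + w]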

Note that the operations on the encoder-side of the process 200 do not mirror, in order, the operations on the decoder-side of the process 200. Specifically, the encoder- side performs transformation (225), resizing (230), and reordering (235), but the decoder-side performs reordering (260), retransformation (265), and resizing (270). So the decoder-side reverses the order by performing retransformation prior to resizing. This occurs because, in this implementation, the decoder-side does not necessarily have enough information to resize prior to retransformation. More specifically, the decoder-side does not receive metadata indicating the exact location of the transformed picture within the enclosing rectangle. Other implementations do indeed encode and transmit this information by, for example, including metadata identifying the four corners of the transformed picture, which is in general a quadrilateral. Accordingly, this implementation retransforms the entire enclosing rectangle, and then extracts (resizes) the appropriate region.

Referring to FIG. 10, there is shown a process 1000 for transforming a picture. The process 1000 includes transforming a first picture to remove at least some motion occurring between the first picture and a second picture (1010). The operation 1010 is performed, in one implementation, by the operation 225 of the process 200.

The process 1000 further includes providing the transformed first picture, and one or more parameters indicating the transformation, for encoding (1020). The operation 1020 occurs, in one implementation, prior to the encoding operations 240 and 245 of the process 200.

Referring to FIG. 11, there is shown a process 1100 for retransforming (also referred to as inverse transforming) a picture. The process 1100 includes accessing a decoded version of a transformation of a first picture (1110). The operation 1110 occurs, in one implementation, after the decoding operation 255 of the process 200.

The process 1100 includes accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and a second picture and removes at least some motion occurring between the first picture and the second picture (1120). The operation 1120 occurs, in one implementation, after the decoding operation 250 of the process 200.

The process 1100 includes inverse transforming the decoded transformation of the first picture to restore all or part of the motion (1130). The operation 1130 is performed, in one implementation, by the retransformation operation 265 of the process 200.

Note that various implementations perform decoding at the encoding side in order to provide reconstructions of the encoded data. Providing such reconstructions at the encoder allows, for example, the encoder to use the same pictures (reconstructed pictures) for prediction reference pictures that the decoder uses. Providing such reconstructions at the encoder also allows, for example, the encoder to more precisely determine the resulting distortion from various encoding modes.

Some implementations perform the retransformation operation at the encoder side. This allows the encoder to determine, for example, a distortion that is based on retransformed pictures rather than merely on transformed pictures. Such distortion measures provide better measures of end-to-end quality in various applications in which the retransformed pictures are of the most interest. Such distortion measures are particularly useful in certain applications for providing, for example, better selections of coding modes so as to increase the quality of the retransformed pictures. For example, in certain implementations, the encoder determines, based on the distortion measures, whether to encode a GOP normally or with a transformation.

Various implementations describe an apparatus for performing transformation and/or retransformation, or more generally, for performing one or more of the operations of the process 200. The structure used in these implementations can vary. Examples are provided throughout this application, and several are provided below.

- Some of these implementations include a means for transforming a first picture to remove at least some motion occurring between the first picture and a second picture. Examples of such a means, provided in various implementations, include one or more of the following, either individually or in combination: a processor and more particularly a processor programmed to perform the transforming operation, an application specific integrated circuit designed to perform the transforming operation, hardware circuit elements appropriately configured and interconnected to perform the transforming operation, or programmable logic appropriately programmed to perform the transforming operation.

- Some of these implementations include means for providing the transformed first picture and one or more parameters indicating the transformation for encoding. Examples of such a means, provided in various implementations, include one or more of the following, either individually or in combination: a register, a cache, a latch, another memory or storage device, a pin, trace, or a function return or a function call (returning information from, or supplying information to, a software function routine) being performed by a processor.

- Some of these implementations include means for receiving a decoded version of a transformation of a first picture and for accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and the second picture and removes at least some motion occurring between the first picture and the second picture. Examples of such a means, provided in various implementations, include one or more of the following, either individually or in combination: a register, a cache, a latch, another memory or storage device, a pin, a trace, or a function return or a function call (returning information from, or supplying information to, a software function routine) being performed by a processor.

- Some of these implementations include means for inverse transforming the decoded transformation of the first picture to restore all or part of the motion. Examples of such a means, provided in various implementations, include one or more of the following, either individually or in combination: a processor and more particularly a processor programmed to perform the inverse transforming operation, an application specific integrated circuit designed to perform the inverse transforming operation, hardware circuit elements appropriately configured and interconnected to perform the inverse transforming operation, or programmable logic appropriately programmed to perform the inverse transforming operation.

Referring to FIG. 12, an encoder 1200 depicts an implementation of an encoder that is used, in various implementations, to encode images such as, for example, video images or depth images. The encoder 1200 is also used, in particular implementations, to encode data, such as, for example, metadata providing information about the encoded bitstream. The encoder 1200 is implemented, in one implementation, as part of, for example, a video transmission system as described below with respect to FIG. 14. It should also be clear that the blocks of FIG. 12 provide a flow diagram of an encoding process, in addition to providing a block diagram of an encoder.

An input image sequence arrives at an adder 1201, as well as at a displacement compensation block 1220, and a displacement estimation block 1218. Note that displacement refers, for example, to either motion displacement or disparity displacement. The input image sequence is, in one implementation, a depth sequence. Another input to the adder 1201 is one of a variety of possible reference picture information items received through a switch 1223.

For example, in a first scenario a mode decision module 1224 in signal communication with the switch 1223 determines that the encoding mode should be intra-prediction with reference to a block from the same picture (for example, a depth picture) currently being encoded. In this first scenario, the adder 1201 receives its input from an intra-prediction module 1222. Alternatively, in a second scenario, the mode decision module 1224 determines that the encoding mode should be displacement compensation and estimation with reference to a picture that is different (for example, a different time, or view, or both) from the picture currently being encoded. In this second scenario, the adder 1201 receives its input from the displacement compensation module 1220.

In various implementations, the intra-prediction module 1222 provides a predetermined predictor based on one or more blocks that are neighboring blocks to a block being encoded. In various implementations, the intra-prediction module 1222 provides a predictor (a reference) by searching within the picture being encoded for the best reference block.

More specifically, several such predictor-based implementations search within a reconstruction of those portions of the current picture that have already been encoded. In some implementations, the searching is restricted to blocks that lie on the existing block boundaries. However, in other implementations, the searching is allowed to search blocks regardless of whether those blocks cross existing block boundaries. Because of the searching, such implementations are often more time-intensive and processor-intensive than merely using predetermined neighboring blocks as the references. However, such implementations typically offer the advantage of finding a better prediction of a given block.

Such implementations may lead to a best-estimate intra-prediction block.

Additionally, in various implementations, the boundaries of the reference block can lie on a sub-pixel boundary, and recovery of the reference involves an interpolation step to restore the actual block to be used as a reference during decoding. Depending on the content of the pictures, such sub-pixel interpolation implementations may improve compression efficiency compared to the use of neighboring blocks as references.

The adder 1201 provides a signal to a transform module 1202, which is configured to transform its input signal and provide the transformed signal to a quantization module 1204. The quantization module 1204 is configured to perform quantization on its received signal and output the quantized information to an entropy encoder 1205. The entropy encoder 1205 is configured to perform entropy encoding on its input signal to generate a bitstream. An inverse quantization module 1206 is configured to receive the quantized signal from quantization module 1204 and perform inverse quantization on the quantized signal. In turn, an inverse transform module 1208 is configured to receive the inverse quantized signal from the inverse quantization module 1206 and perform an inverse transform on its received signal. The output of the inverse transform module 1208 is a reconstruction of the signal that is output from the adder 1201.

An adder (more generally referred to as a combiner) 1209 adds (combines) signals received from the inverse transform module 1208 and the switch 1223 and outputs the resulting signal to the intra-prediction module 1222, and an in-loop filter 1210. The resulting signal is a reconstruction of the image sequence signal that is input to the encoder 1200.

The intra-prediction module 1222 performs intra-prediction, as discussed above, using its received signals. The in-loop filter 1210 filters the signals received from the adder 1209 and provides filtered signals to a depth reference buffer 1212. The depth reference buffer 1212 provides image information to the displacement estimation and compensation modules 1218 and 1220. The in-loop filter is, in one implementation, a deblocking filter.

Metadata may be added to the encoder 1200 as encoded metadata and combined with the output bitstream from the entropy coder 1205. Alternatively, for example, unencoded metadata may be input to the entropy coder 1205 for entropy encoding along with the quantized image sequences.

Data is also provided to the output bitstream by the mode decision module 1224. The mode decision module 1224 provides information to the bitstream that indicates the mode used to encode a given block. Such information often includes an indication of the location of the reference block. For example, in various implementations that use intra-prediction and that perform a search of the current picture to find a reference block, the mode decision module 1224 indicates the location of the reference using a disparity vector. The disparity vector information may be provided to the mode decision module 1224 by the intra-prediction module 1222. As further described below, the disparity vector information may be differentially coded using the disparity vector of a neighboring macroblock as a reference. In addition, disparity vectors for a picture may be grouped and additionally encoded to remove entropy since there is likely to be spatial similarity in disparity vectors.

Referring to FIG. 13, a decoder 1300 depicts an implementation of a decoder that may be used to decode images, such as, for example, depth images. The decoded images are provided, in one implementation, to a rendering device for producing additional views based on the depth data. The decoder 1300 is used, in other implementations, for example, to decode metadata providing information about the decoded bitstream, and/or to decode video data. In one implementation, the decoder 1300 is implemented as part of, for example, a video receiving system as described below with respect to FIG. 15. It should also be clear that the blocks of FIG. 13 provide a flow diagram of a decoding process, in addition to providing a block diagram of a decoder.

The decoder 1300 is configured to receive a bitstream using a bitstream receiver 1302. The bitstream receiver 1302 is in signal communication with a bitstream parser 1304 and provides the bitstream to the bitstream parser 1304.

The bitstream parser 1304 is configured to transmit a residue bitstream to an entropy decoder 1306, to transmit control syntax elements to a mode selection module 1316, and to transmit displacement (motion/disparity) vector information to a displacement compensation module 1326.

The displacement vector information may be, for example, motion vector information or disparity vector information. Motion vector information is typically used in inter-prediction to indicate relative motion from a previous image. Disparity vector information is typically used in either (i) inter-prediction to indicate disparity with respect to a separate image or (ii) intra-prediction to indicate disparity with respect to a portion of the same image. As is known in the art, disparity typically indicates the relative offset, or displacement, between two images. Disparity may also be used to indicate the relative offset, or displacement, between two portions of an image.
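In each of these cases, the vector is used in the same basic way: it offsets the position of the current block to locate a reference block in a reference picture (a previous picture for motion, a picture from another view for inter-view disparity, or the current picture for intra-prediction disparity). The following sketch shows that common fetch step; the border clamping and the function name are assumptions made here for brevity.

```python
import numpy as np

def fetch_reference_block(reference: np.ndarray, x: int, y: int,
                          block: int, dx: int, dy: int) -> np.ndarray:
    # Offset the current block position (x, y) by the displacement vector (dx, dy)
    # and copy the corresponding block from the reference picture, clamping the
    # offset position to the picture borders for simplicity.
    h, w = reference.shape
    rx = int(np.clip(x + dx, 0, w - block))
    ry = int(np.clip(y + dy, 0, h - block))
    return reference[ry:ry + block, rx:rx + block]

reference_picture = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
prediction = fetch_reference_block(reference_picture, x=16, y=16, block=8, dx=3, dy=-2)
print(prediction.shape)  # (8, 8)
```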

An inverse quantization module 1308 performs inverse quantization on an entropy decoded signal received from the entropy decoder 1306. In addition, an inverse transform module 1310 is configured to perform an inverse transform on an inverse quantized signal received from the inverse quantization module 1308 and to output the inverse transformed signal to an adder (also referred to as a combiner) 1312.

The adder 1312 can receive one of a variety of other signals depending on the decoding mode employed. For example, in one implementation, the mode selection module 1316 determines whether displacement compensation or intra-prediction encoding was performed by the encoder on the currently processed block by parsing and analyzing the control syntax elements. Depending on the determined mode, the mode selection module 1316 accesses and controls a switch 1317, based on the control syntax elements, so that the adder 1312 receives signals from the displacement compensation module 1326, or an intra-prediction module 1318.

Here, the intra-prediction module 1318 is configured to perform intra-prediction to decode a block using references to the same picture currently being decoded. In turn, the displacement compensation module 1326 is configured to perform displacement compensation to decode a block. The decoding uses references to a block of another previously processed picture (from a different time or view, or both, for example) that is different from the picture currently being decoded.

After receiving prediction or compensation information signals, the adder 1312 adds the prediction or compensation information signals to the inverse transformed signal for transmission to an in-loop filter 1314. The in-loop filter 1314 is, for example, a deblocking filter that filters out blocking artifacts. The adder 1312 also outputs the added signal to the intra-prediction module 1318 for use in intra-prediction.

The in-loop filter 1314 is configured to filter its input signal and output decoded pictures. Further, the in-loop filter 1314 provides the filtered signal to a depth reference buffer 1320. The depth reference buffer 1320 is configured to parse its received signal to permit and aid in displacement compensation decoding by the displacement compensation module 1326, to which the depth reference buffer 1320 provides parsed signals. Such parsed signals may be, for example, all or part of various pictures that may have been used as a reference.

Metadata may be included in a bitstream provided to the bitstream receiver 1302. The metadata may be parsed by the bitstream parser 1304, and decoded by the entropy decoder 1306. The decoded metadata may be extracted from the decoder 1300 after the entropy decoding using an output (not shown).

Referring now to FIG. 14, a video transmission system or apparatus 1400 is shown, to which various features and principles described above may be applied. The video transmission system or apparatus 1400 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The video transmission system or apparatus 1400 also, or alternatively, may be used, for example, to provide a signal for storage. The transmission may be provided over the Internet or some other network. The video transmission system or apparatus 1400 is capable of generating and delivering, for example, video content and other content such as, for example, indicators of depth including, for example, depth and/or disparity values. It should also be clear that the blocks of FIG. 14 provide a flow diagram of a video transmission process, in addition to providing a block diagram of a video transmission system or apparatus.

The video transmission system or apparatus 1400 receives input video from a processor 1401. In one implementation, the processor 1401 simply provides video images, such as the pictures 410, 420, and 430 of FIG. 4, or the pictures 510, 520, and 530 of FIG. 5, to the video transmission system or apparatus 1400. However, in another implementation, the processor 1401 alternatively, or additionally, provides depth images to the video transmission system or apparatus 1400. The processor 1401 may also provide metadata to the video transmission system or apparatus 1400, in which the metadata relates to one or more of the input images. The metadata is, in one implementation, the transformation metadata and/or original picture size of FIG. 3(e). Additionally, the processor 1401 is, in one implementation, a processor configured for performing, for example, the operations 110-130 of the process 100, the operations 205-235 of the process 200, or the process 1000.

The video transmission system or apparatus 1400 includes an encoder 1402 and a transmitter 1404 capable of transmitting the encoded signal. The encoder 1402 receives video information from the processor 1401. The video information may include, for example, video images, and/or disparity (or depth) images. The encoder 1402 generates one or more encoded signals based on the video and/or disparity information. The encoder 1402 is, in one implementation, the encoder 1200 of FIG. 12.

In various implementations, the encoder 1402 is, for example, an AVC encoder. The AVC encoder may be applied to both video and disparity information. AVC refers to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) H.264 Recommendation (referred to throughout this application as the "H.264/MPEG-4 AVC Standard" or variations thereof, such as the "AVC standard", the "H.264 standard", or simply "AVC" or "H.264").

The encoder 1402 may include sub-modules, including for example an assembly unit for receiving and assembling various pieces of information into a structured format for storage or transmission. The various pieces of information may include, for example, encoded or unencoded video, encoded or unencoded disparity (or depth) values, and encoded or unencoded elements such as, for example, motion vectors, coding mode indicators, and syntax elements. In some implementations, the encoder 1402 includes the processor 1401 and therefore performs the operations of the processor 1401.

The transmitter 1404 receives the encoded signal(s) from the encoder 1402 and transmits the encoded signal(s) in one or more output signals. The transmitter 1404 may be, for example, adapted to transmit a program signal having one or more bitstreams representing encoded pictures and/or information related thereto. Typical transmitters perform functions such as, for example, one or more of providing error-correction coding, interleaving the data in the signal, randomizing the energy in the signal, and modulating the signal onto one or more carriers using a modulator 1406. The transmitter 1404 may include, or interface with, an antenna (not shown). Further, implementations of the transmitter 1404 may be limited to the modulator 1406.

The video transmission system or apparatus 1400 is also communicatively coupled to a storage unit 1408. In one implementation, the storage unit 1408 is coupled to the encoder 1402, and the storage unit 1408 stores an encoded bitstream from the encoder 1402 and, optionally, provides the stored bitstream to the transmitter 1404. In another implementation, the storage unit 1408 is coupled to the transmitter 1404, and stores a bitstream from the transmitter 1404. The bitstream from the transmitter 1404 may include, for example, one or more encoded bitstreams that have been further processed by the transmitter 1404. The storage unit 1408 is, in different implementations, one or more of a standard DVD, a Blu-Ray disc, a hard drive, or some other storage device.

Referring now to FIG. 15, a video receiving system or apparatus 1500 is shown to which the features and principles described above may be applied. The video receiving system or apparatus 1500 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network. It should also be clear that the blocks of FIG. 15 provide a flow diagram of a video receiving process, in addition to providing a block diagram of a video receiving system or apparatus.

The video receiving system or apparatus 1500 may be, for example, a cell phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, a decoded video signal for display (display to a user, for example), for processing, or for storage. Thus, the video receiving system or apparatus 1500 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.

The video receiving system or apparatus 1500 is capable of receiving and processing video information, and the video information may include, for example, video images, and/or disparity (or depth) images. The video receiving system or apparatus 1500 includes a receiver 1502 for receiving an encoded signal, such as, for example, the signals described in the implementations of this application. The receiver 1502 may receive, for example, a signal providing an encoding of one or more of the pictures 410, 420, and 430 of FIG. 4, or of the pictures 510, 520, and 530 of FIG. 5. Alternatively, the receiver 1502 may receive a signal output from the video transmission system 1400 (for example, from the storage unit 1408 or the transmitter 1404) of FIG. 14.

The receiver 1502 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures (for example, video pictures or depth pictures). Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers using a demodulator 1504, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 1502 may include, or interface with, an antenna (not shown). Implementations of the receiver 1502 may be limited to the demodulator 1504.

The video receiving system or apparatus 1500 includes a decoder 1506. The decoder 1506 is, in one implementation, the decoder 1300 of FIG. 13.

The receiver 1502 provides a received signal to the decoder 1506. The signal provided to the decoder 1506 by the receiver 1502 may include one or more encoded bitstreams. The decoder 1506 outputs a decoded signal, such as, for example, decoded video signals including video information, or decoded depth signals including depth information. The decoder 1506 may be, for example, an AVC decoder.

The video receiving system or apparatus 1500 is also communicatively coupled to a storage unit 1507. In one implementation, the storage unit 1507 is coupled to the receiver 1502, and the receiver 1502 accesses a bitstream from the storage unit 1507 and/or stores a received bitstream to the storage unit 1507. In another implementation, the storage unit 1507 is coupled to the decoder 1506, and the decoder 1506 accesses a bitstream from the storage unit 1507 and/or stores a decoded bitstream to the storage unit 1507. The bitstream accessed from the storage unit 1507 includes, in different implementations, one or more encoded bitstreams. The storage unit 1507 is, in different implementations, one or more of a standard DVD, a Blu-Ray disc, a hard drive, or some other storage device.

The output video from the decoder 1506 is provided, in one implementation, to a processor 1508. The processor 1508 is, in one implementation, a processor configured for performing, for example, the operations 160-180 of the process 100, the operations 260-270 of the process 200, or the process 1100. In some implementations, the decoder 1506 includes the processor 1508 and therefore performs the operations of the processor 1508. In other implementations, the processor 1508 is part of a downstream device such as, for example, a set-top box or a television.

Referring again to FIG. 3(e), the encoded data of that implementation includes encodings of the indicated pictures, the transformation metadata, and the original picture size. The encoded data is provided, in various implementations, as a signal or a signal structure. The signal and/or the signal structure is stored, in various implementations, on a processor-readable medium.

More specifically, in one implementation, a signal is formatted to include information, and the signal includes metadata indicating a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture. The signal further includes a transformed first picture, wherein the transformed first picture is a transformation of the first picture using the transformation indicated by the metadata. In other implementations, the signal also includes the second picture. In yet other implementations, the signal also includes a size portion indicating a previous size of the first picture. In various implementations, the metadata is encoded metadata, the second picture is an encoded second picture, and the transformed first picture is an encoded transformed first picture.

In an implementation, a signal structure includes a metadata portion for metadata indicating a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture. The signal structure further includes a first picture portion for a transformed first picture, wherein the transformed first picture is a transformation of the first picture using the transformation indicated by the metadata. In some implementations, the signal structure further includes a second picture portion for the second picture.
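One illustrative way to picture such a signal structure is as an ordered container holding a metadata portion, a size portion, and one or more picture portions. The byte layout, field names, and use of JSON-encoded metadata in the sketch below are assumptions of this illustration, not a normative syntax of any described implementation.

```python
import json
import struct
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SignalStructure:
    transform_metadata: dict          # metadata portion: parameters of the transformation
    original_size: Tuple[int, int]    # size portion: previous (width, height) of the first picture
    transformed_first_picture: bytes  # first picture portion (transformed, possibly encoded, picture)
    second_picture: bytes = b""       # optional second picture portion

    def to_bytes(self) -> bytes:
        meta = json.dumps(self.transform_metadata).encode("utf-8")
        header = struct.pack("<IHHII", len(meta),
                             self.original_size[0], self.original_size[1],
                             len(self.transformed_first_picture), len(self.second_picture))
        return header + meta + self.transformed_first_picture + self.second_picture

signal = SignalStructure(
    transform_metadata={"model": "affine", "params": [1.0, 0.0, 4.5, 0.0, 1.0, -2.0]},
    original_size=(1920, 1080),
    transformed_first_picture=b"\x00" * 16,  # stand-in for encoded picture data
    second_picture=b"\x00" * 16,
)
print(len(signal.to_bytes()))
```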

It should be clear that the signals described above are included in various implementations described in this application. For example, a signal including (i) a transformed and encoded picture and/or (ii) encoded metadata is provided, in various implementations, as output from the encoder 1200, as output from the encoder 1402, as input to the decoder 1300, or as input to the decoder 1506. As another example, a signal including a transformed picture and/or metadata is provided, in various implementations, as output from the processor 1401, or as input to the processor 1508. In various implementations, there is a loss of quality due to the transformation and retransformation of pictures. The loss occurs because bilinear interpolation is used for the transformation and retransformation.

- To reduce these losses, various implementations use smart interpolation techniques as are known in the art.

- Another implementation casts the retransformation problem as an optimization problem in which we minimize the error between the retransformed image and the original image. This implementation analyzes the optimization problem in a least-squares framework. The implementation modifies the least-squares minimization framework to account for rectangular blurring matrices, which arise because the size of the picture changes after the transformation.

- The blurring matrix of this implementation is used to convert the whole bilinear interpolation algorithm into a matrix-vector multiplication. The vector contains all the pixels of the original image (say M). The matrix combines the pixels of the original image in some fashion and returns another vector, which represents the transformed image (say M").
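Under this matrix-vector formulation, the retransformation can be posed, for example, as the least-squares problem

$$ \hat{M} \;=\; \arg\min_{x}\; \lVert A\,x - M'' \rVert_2^2, \qquad A \in \mathbb{R}^{m \times n}, $$

where $A$ is the blurring matrix, $x$ ranges over candidate original images with $n$ pixels, and $M''$ stacks the $m$ pixels of the transformed image. When $A^{\top}A$ is invertible, the minimizer is the familiar solution $\hat{M} = (A^{\top}A)^{-1}A^{\top}M''$. This notation and closed form are an illustrative reading of the description above, given as one way of solving the problem; the rectangular shape of $A$ ($m \neq n$ in general) is what the modified framework accounts for.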

- As shown earlier, if we have an X-shift as the transformation, then the size of the image does not change. In such a case we will have a square blurring matrix. Note that each row of the blurring matrix shows us how to combine pixels in the original image to obtain a new transformed pixel value. Therefore, the number of rows in the blurring matrix is equal to the number of pixels in the transformed image.

- In this algorithm, the transformed images can have different sizes, so we use more (or fewer) rows in the blurring matrix depending on the transformation. Hence, we can have a rectangular matrix in which the number of rows is not necessarily equal to the number of columns. That is, the number of pixels in the transformed image is not necessarily the same as the number in the original image.
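The following sketch constructs such a blurring matrix for a fractional X-shift with bilinear interpolation, so that the transformation becomes a single matrix-vector multiplication. The row-major pixel ordering, border clamping, and non-negative shift are assumptions of this illustration; for transformations that change the picture size, the same construction would simply produce a different (rectangular) number of rows.

```python
import numpy as np

def xshift_blurring_matrix(width: int, height: int, shift: float) -> np.ndarray:
    # Build a matrix A such that A @ image.ravel() resamples each row of the
    # image at horizontal positions x + shift using bilinear interpolation
    # (shift >= 0 is assumed; pixels are ordered row-major, borders are clamped).
    # For a pure X-shift the pixel count is unchanged, so A is square; each row
    # of A holds the weights that combine original pixels into one new pixel.
    n = width * height
    a = np.zeros((n, n))
    frac, base = np.modf(shift)
    base = int(base)
    for y in range(height):
        for x in range(width):
            row = y * width + x
            x0 = min(max(x + base, 0), width - 1)
            x1 = min(max(x + base + 1, 0), width - 1)
            a[row, y * width + x0] += 1.0 - frac
            a[row, y * width + x1] += frac
    return a

original = np.random.rand(4, 6)                      # original image M (4 rows, 6 columns)
A = xshift_blurring_matrix(width=6, height=4, shift=0.5)
transformed = (A @ original.ravel()).reshape(4, 6)   # transformed image M''
print(A.shape, transformed.shape)                    # (24, 24) (4, 6)
```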

- Yet another implementation provides a small amount of overhead information. The overhead information aids the optimization algorithm in retransforming the image.

Various implementations have been described that relate explicitly to a GOP. However, this is not intended to limit the applicability of those or other implementations. Such implementations, and the related features, also apply to non-GOP settings. The features and implementations are applicable to any series of at least two pictures, whether part of a video sequence or not. This includes, for example, an entire movie, an entire scene, a series of GOPs, or just two pictures.

- Indeed, the pictures need not be related in a fixed temporal manner as is common with video. In particular implementations, for example, isolated pictures are used that have been extracted (sampled) from a video at uneven sampling times. Other implementations use separate still pictures of, for example, a common scene.

- Further, the features and implementations are also applicable to pictures that are not even temporally distinct. In various implementations, for example, the features and implementations are applicable to pictures that are from different scalable layers of, for example, a common picture. For example, in one implementation, a first picture is a base layer of a common picture and a second picture is an enhancement layer of the common picture. In another example, pictures from separate views at the same time are used.

- The features and implementations are also applicable to pictures that are not capturing information that the human eye typically sees (that is, for example, not capturing video or still shots). For example, pictures capturing depth, disparity, edges, exposures, or images capturing frequencies outside a normal viewing range can all be used in various implementations that are described in this application.

This application provides multiple block/flow diagrams, including the block/flow diagrams of FIGS. 1-2 and 10-15. It should be clear that the block/flow diagrams of this application present both a flow diagram describing a process, and a block diagram describing functional blocks of an apparatus. Additionally, this application provides multiple pictorial representations, including the pictorial representations of FIGS. 3-9B. It should be clear that the pictorial representations of this application present both (i) an illustration, a result, or an output, and (ii) a flow diagram describing a process.

Additionally, there are many implementations described in this application, including implementations of the block/flow diagrams of FIGS. 1-2 and 10-15, as well as the implementations depicted and described with respect to the pictorial representations of FIGS. 3-9B. Many of the operations, blocks, inputs, or outputs of these implementations are optional, even if not explicitly stated in the descriptions and discussions of these implementations. For example, in the process 100, it should be clear that, for example, encoding (140) and decoding (150) are optional. Additionally, in the process 200, it should be clear that, for example, resizing (230, 270) and reordering (235, 260) are optional. The mere recitation of a feature in a particular implementation does not indicate that the feature is mandatory for all implementations. Indeed, the opposite conclusion should generally be the default, and all features are considered optional unless such a feature is stated to be required. Even if a feature is stated to be required, that requirement is intended to apply only to that specific implementation, and other implementations are assumed to be free from such a requirement.

Various implementations may have one or more of a variety of advantages. A partial list of these advantages includes: (i) low complexity, (ii) increased compression efficiency, (iii) reducing the number or size of motion vectors, or (iv) reducing the number or size of residues.

We thus provide one or more implementations having particular features and aspects. In particular, we provide several implementations relating to transforming pictures to remove motion prior to encoding. However, variations of these implementations and additional applications are contemplated and within our disclosure, and features and aspects of described implementations may be adapted for other implementations.

Several of the implementations and features described in this application may be used in the context of the AVC Standard, and/or AVC with the MVC extension (Annex H), and/or AVC with the SVC extension (Annex G). Additionally, these implementations and features may be used in the context of another standard (existing or future), or in a context that does not involve a standard.

Reference to "one embodiment" or "an embodiment" or "one implementation" or "an implementation" of the present principles, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "in one implementation" or "in an implementation", as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the

information, determining the information, predicting the information, or estimating the information.

Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term.

Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

Various implementations refer to "images" and/or "pictures". The terms "image" and "picture" are used interchangeably throughout this document, and are intended to be broad terms. An "image" or a "picture" may be, for example, all or part of a frame or of a field. The term "video" refers to a sequence of images (or pictures). An image, or a picture, may include, for example, any of various video components or their

combinations. Such components, or their combinations, include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr), U (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives or positives of any of these components. An "image" or a "picture" may also, or alternatively, refer to various different types of content, including, for example, typical two-dimensional video, a disparity map for a 2D video picture, a depth map that corresponds to a 2D video picture, or an edge map.

Further, many implementations may refer to a "frame". However, such implementations are assumed to be equally applicable to a "picture" or "image".

A "depth map", or "disparity map", or "edge map", or similar terms are also intended to be broad terms. A map generally refers, for example, to a picture that includes a particular type of information. However, a map may include other types of information not indicated by its name. For example, a depth map typically includes depth information, but may also include other information such as, for example, video or edge information.

It is to be appreciated that the use of any of the following "/", "and/or", and "at least one of", for example, in the cases of "A/B", "A and/or B" and "at least one of A and B", is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C" and "at least one of A, B, and C" and "at least one of A, B, or C", such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.

Additionally, many implementations may be implemented in one or more of an encoder (for example, the encoder 1200 or 1402), a decoder (for example, the decoder 1300 or 1506), a post-processor (for example, the processor 1508) processing output from a decoder, or a pre-processor (for example, the processor 1401) providing input to an encoder. Further, other implementations are contemplated by this disclosure.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, depth or disparity processing, and other processing of images and related depth and/or disparity maps. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set- top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

Claims

1. A method comprising:
transforming a first picture to remove at least some motion occurring between the first picture and a second picture; and
providing the transformed first picture and one or more parameters indicating the transformation for encoding.
2. The method of claim 1 wherein the first picture occurs after the second picture in a display order.
3. The method of claim 1 wherein the motion is a motion between objects included in the first picture and in the second picture.
4. The method of claim 1 wherein the motion is a motion that characterizes multiple objects that occur in the first picture and in the second picture.
5. The method of claim 1 wherein the motion is a motion that characterizes at least a portion of background in the first picture and the second picture.
6. The method of claim 1 wherein the motion includes non-translational motion.
7. The method of claim 1 wherein the transforming occurs in a pre-encoder and the encoding occurs in an encoder.
8. The method of claim 1 further comprising determining the motion between the first picture and the second picture.
9. The method of claim 1 further comprising encoding the transformed first picture, the second picture, and the one or more parameters indicating the transformation.
10. The method of claim 1 wherein transforming the first picture comprises transforming the first picture to a coordinate system of the second picture, and the method further comprises:
determining an enclosing picture size that encloses the transformed first picture and the second picture; and
using the enclosing picture size as a size for the transformed first picture and for the second picture.
11. The method of claim 10 further comprising encoding (i) the transformed first picture having the enclosing picture size, (ii) the second picture having the enclosing picture size, (iii) the one or more parameters indicating the transformation, and (iv) one or more parameters indicating an original size of the second picture.
12. The method of claim 11 wherein encoding comprises encoding the second picture as an intra-coded picture.
13. The method of claim 12 wherein encoding the transformed first picture comprises encoding the transformed first picture using the second picture as a prediction reference picture.
14. The method of claim 13 wherein:
the first picture and the second picture are part of a Group of Pictures, the Group of Pictures including one or more additional pictures in addition to the first picture and the second picture, and
the method further comprises:
transforming the additional pictures in the Group of Pictures to a coordinate system of the second picture, based on motion between the additional pictures and the second picture;
sizing the transformed additional pictures to have the enclosing picture size; and
encoding the transformed additional pictures, wherein the encoding of one or more of the transformed additional pictures uses the second picture as a prediction reference picture.
15. The method of claim 1 wherein:
the first picture and the second picture are part of a Group of Pictures, the Group of Pictures including one or more additional pictures in addition to the first picture and the second picture, and
the method further comprises:
transforming the additional pictures in the Group of Pictures to a coordinate system of the second picture, based on motion between the additional pictures and the second picture;
encoding the second picture; and
encoding the transformed first picture and the transformed additional pictures, wherein the encoding of one or more of the transformed first picture and the transformed additional pictures uses the second picture as a prediction reference picture.
16. The method of claim 15 further comprising:
determining an enclosing picture size that encloses the transformed first picture, the second picture, and the transformed additional pictures; and
sizing the transformed first picture, the second picture, and the transformed additional pictures to have the enclosing picture size,
wherein the encoding of the transformed first picture, the second picture, and the transformed additional pictures encodes pictures having the enclosing picture size.
17. The method of claim 15 wherein the encoding of the second picture encodes the second picture as an intra-coded picture.
18. The method of claim 1 wherein the one or more parameters indicating the transformation comprise one or more parameters describing an inverse of the transformation.
19. The method of claim 1 further comprising encoding the transformed first picture using the second picture as a prediction reference picture.
20. The method of claim 19 wherein the first picture occurs after the second picture in a display order.
21. The method of claim 1 wherein transforming the first picture changes a coordinate system of the first picture.
22. The method of claim 21 wherein transforming the first picture changes the coordinate system of the first picture to a coordinate system of the second picture.
23. An apparatus comprising one or more processors collectively configured for performing at least the following operations:
transforming a first picture to remove at least some motion occurring between the first picture and a second picture; and
providing the transformed first picture and one or more parameters indicating the transformation for encoding.
24. The apparatus of claim 23 further comprising an encoder for encoding the transformed first picture, the second picture, and the one or more parameters indicating the transformation.
25. A processor readable medium having stored thereon instructions for causing one or more processors to collectively perform:
transforming a first picture to remove at least some motion occurring between the first picture and a second picture; and
providing the transformed first picture and one or more parameters indicating the transformation for encoding.
26. An apparatus comprising:
means for transforming a first picture to remove at least some motion occurring between the first picture and a second picture; and
means for providing the transformed first picture and one or more parameters indicating the transformation for encoding.
27. An apparatus comprising:
one or more processors collectively configured for transforming a first picture to remove at least some motion occurring between the first picture and a second picture;
an encoder configurable for encoding the transformed first picture and the one or more parameters indicating the transformation; and
a modulator configurable for modulating a signal with the encoded transformed first picture and the encoded one or more parameters.
28. A signal formatted to include information, the signal comprising:
metadata indicating a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture; and
a transformed first picture, wherein the transformed first picture is a transformation of the first picture using the transformation indicated by the metadata.
29. The signal of claim 28 further comprising a size portion indicating a previous size of the first picture.
30. The signal of claim 28 wherein:
the metadata is encoded metadata, and
the transformed first picture is an encoded transformed first picture.
31. A signal structure comprising:
a metadata portion for metadata indicating a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture; and
a first picture portion for a transformed first picture, wherein the transformed first picture is a transformation of the first picture using the transformation indicated by the metadata.
32. A processor readable medium having stored thereon a signal structure, the signal structure comprising:
a metadata portion for metadata indicating a transformation performed on a first picture to remove at least some motion occurring between the first picture and a second picture; and
a first picture portion for a transformed first picture, wherein the transformed first picture is a transformation of the first picture using the transformation indicated by the metadata.
33. A method comprising:
accessing a decoded version of a transformation of a first picture;
accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and a second picture and removes at least some motion occurring between the first picture and the second picture; and
inverse transforming the decoded transformation of the first picture to restore all or part of the motion.
34. The method of claim 33 wherein the first picture occurs after the second picture in a display order.
35. The method of claim 33 wherein the first picture and the second picture are part of a Group of Pictures.
36. The method of claim 33 wherein the motion is a motion that characterizes multiple objects that occur in the first picture and in the second picture.
37. The method of claim 33 wherein the motion is a motion that characterizes at least a portion of background in the first picture and the second picture.
38. The method of claim 33 wherein the motion includes non-translational motion.
39. The method of claim 33 further comprising:
accessing one or more decoded parameters indicating a previous size of the first picture; and
resizing the inverse transformed first picture to have the previous size.
40. The method of claim 33 further comprising:
decoding an encoded transformation of the first picture to produce the decoded version of the transformation of the first picture;
decoding an encoded version of the second picture to produce a decoded version of the second picture; and
decoding an encoding of the parameter to produce the decoded parameter.
41. The method of claim 40 wherein:
decoding the encoded version of the second picture comprises decoding using intra-coded picture decoding; and
decoding the encoded transformation of the first picture comprises decoding the transformation of the first picture using the decoded second picture as a prediction reference picture.
42. The method of claim 33 wherein:
the first picture and the second picture are part of a Group of Pictures, the Group of Pictures including one or more additional pictures in addition to the first picture and the second picture, and
the method further comprises:
accessing decoded versions of transformations of the additional pictures;
inverse transforming the decoded versions of the transformations of the additional pictures to restore previous coordinate systems for the additional pictures;
accessing one or more decoded parameters indicating a previous size of the second picture;
decoding an encoded version of the second picture using intra-coded picture decoding, to produce a decoded version of the second picture;
decoding an encoded transformation of the first picture using the decoded second picture as a prediction reference picture, to produce the decoded version of the transformation of the first picture; and
resizing the inverse transformed first picture, the decoded version of the second picture, and the inverse transformed additional pictures to have the previous size.
43. The method of claim 33 wherein the one or more decoded parameters indicating the transformation comprise one or more parameters describing an inverse of the transformation.
44. The method of claim 33 further comprising:
decoding an encoded version of the second picture to produce a decoded second picture; and
decoding an encoded transformation of the first picture using the decoded second picture as a prediction reference picture, to produce the decoded version of the transformation of the first picture.
45. The method of claim 44 wherein the first picture occurs after the second picture in a display order.
46. The method of claim 33 wherein the transformation of the first picture changes a coordinate system of the first picture.
47. The method of claim 46 wherein the transformation of the first picture changes the coordinate system of the first picture to a coordinate system of the second picture.
48. An apparatus comprising one or more processors collectively configured for performing at least the following operations:
accessing a decoded version of a transformation of a first picture;
accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and a second picture and removes at least some motion occurring between the first picture and the second picture; and
inverse transforming the decoded transformation of the first picture to restore all or part of the motion.
49. The apparatus of claim 48 further comprising a decoder for decoding the transformation of the first picture and the one or more decoded parameters, to produce the decoded version of the transformation of the first picture and the one or more decoded parameters.
50. A processor readable medium having stored thereon instructions for causing one or more processors to collectively perform:
accessing a decoded version of a transformation of a first picture;
accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and a second picture and removes at least some motion occurring between the first picture and the second picture; and
inverse transforming the decoded transformation of the first picture to restore all or part of the motion.
51. An apparatus comprising:
means for receiving a decoded version of a transformation of a first picture and for accessing one or more decoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and the second picture and removes at least some motion occurring between the first picture and the second picture; and
means for inverse transforming the decoded transformation of the first picture to restore all or part of the motion.
52. An apparatus comprising:
a demodulator configurable for demodulating a signal that includes an encoded version of a transformation of a first picture and one or more encoded parameters indicating the transformation, wherein the transformation is based on motion between the first picture and the second picture and removes at least some motion occurring between the first picture and the second picture;
a decoder configurable for decoding the encoded version of the transformation of the first picture and the one or more encoded parameters, to produce a decoded version of the transformation of the first picture and one or more decoded parameters; and
one or more processors collectively configured for inverse transforming the decoded version of the transformation of the first picture to restore all or part of the motion.
PCT/US2012/020888 2012-01-11 2012-01-11 Motion compensating transformation for video coding WO2013105946A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2012/020888 WO2013105946A1 (en) 2012-01-11 2012-01-11 Motion compensating transformation for video coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2012/020888 WO2013105946A1 (en) 2012-01-11 2012-01-11 Motion compensating transformation for video coding

Publications (1)

Publication Number Publication Date
WO2013105946A1 true WO2013105946A1 (en) 2013-07-18

Family

ID=45509759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/020888 WO2013105946A1 (en) 2012-01-11 2012-01-11 Motion compensating transformation for video coding

Country Status (1)

Country Link
WO (1) WO2013105946A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011090790A1 (en) * 2010-01-22 2011-07-28 Thomson Licensing Methods and apparatus for sampling-based super resolution video encoding and decoding
WO2012033962A2 (en) * 2010-09-10 2012-03-15 Thomson Licensing Methods and apparatus for encoding video signals using motion compensated example-based super-resolution for video compression

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011090790A1 (en) * 2010-01-22 2011-07-28 Thomson Licensing Methods and apparatus for sampling-based super resolution video encoding and decoding
WO2012033962A2 (en) * 2010-09-10 2012-03-15 Thomson Licensing Methods and apparatus for encoding video signals using motion compensated example-based super-resolution for video compression

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"H.263+(or H.263 version 2), VIDEO CODING FOR LOW BITRATE COMMUNICATION", 9.2.1998,, no. H.263, version 2, 9 February 1998 (1998-02-09), XP030001506, ISSN: 0000-0509 *
DIRK FARIN ET AL: "Minimizing MPEG-4 sprite coding cost using multi-sprites", VISUAL COMMUNICATIONS AND IMAGE PROCESSING; SAN JOSE, 20 January 2004 (2004-01-20), XP030081289, *
EBRAHIMI T ET AL: "MPEG-4 NATURAL VIDEO CODING-AN OVERVIEW", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 15, no. 4/05, 1 January 2000 (2000-01-01), pages 365-385, XP000961469, ISSN: 0923-5965, DOI: 10.1016/S0923-5965(99)00054-5 *
None

Similar Documents

Publication Publication Date Title
US9179153B2 (en) Refined depth map
US9565449B2 (en) Coding multiview video plus depth content
KR101617970B1 (en) Coding motion depth maps with depth range variation
CA2763887C (en) Image processing device and method
EP2512136B1 (en) Tiling in video encoding and decoding
KR101202630B1 (en) Fragmented reference in temporal compression for video coding
KR101617842B1 (en) Multi-view video coding with disparity estimation based on depth information
US10198792B2 (en) Method and devices for depth map processing
EP2774360B1 (en) Differential pulse code modulation intra prediction for high efficiency video coding
US9918108B2 (en) Image processing device and method
US8537200B2 (en) Depth map generation techniques for conversion of 2D video data to 3D video data
US9215460B2 (en) Apparatus and method of adaptive block filtering of target slice
JP2012529787A (en) Encoding 3D conversion information performed with two-dimensional video sequences (encodingofthree-dimensionalconversioninformationwithtwo-dimensionalvideosequence)
US20110038418A1 (en) Code of depth signal
US9872023B2 (en) Image processing apparatus and method
US20140009574A1 (en) Apparatus, a method and a computer program for video coding and decoding
RU2549168C1 (en) Slice header three-dimensional video extension for slice header prediction
EP2604036B1 (en) Multi-view signal codec
US9569819B2 (en) Coding of depth maps
US20120236934A1 (en) Signaling of multiview video plus depth content with a block-level 4-component structure
US20130142267A1 (en) Line memory reduction for video coding and decoding
US9602814B2 (en) Methods and apparatus for sampling-based super resolution video encoding and decoding
EP2777285B1 (en) Adaptive partition coding
US9661340B2 (en) Band separation filtering / inverse filtering for frame packing / unpacking higher resolution chroma sampling formats
US9979960B2 (en) Frame packing and unpacking between frames of chroma sampling formats with different chroma resolutions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12700766

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct app. not ent. europ. phase

Ref document number: 12700766

Country of ref document: EP

Kind code of ref document: A1