WO2005078663A1 - Improved method for motion adaptive transformation of video - Google Patents


Publication number
WO2005078663A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
frames
temporal
of the
subset
Application number
PCT/AU2005/000193
Other languages
French (fr)
Inventor
David Taubman
Nagita Mehrseresht
Original Assignee
Newsouth Innovations Pty Limited
Priority claimed from AU2004900795A external-priority patent/AU2004900795A0/en
Application filed by Newsouth Innovations Pty Limited filed Critical Newsouth Innovations Pty Limited
Publication of WO2005078663A1 publication Critical patent/WO2005078663A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H04N19/63: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates generally to a method and apparatus for efficient compression of motion video sequences, and particularly, but not exclusively, to a method and apparatus for implementing a motion adaptive transform for video, which facilitates production of fully scalable compressed representations of an original video sequence.
  • Background of the Invention Scalability refers to the ability to decompress embedded video bit-streams to a particular spatio-temporal resolution and/or a desired quality (bit-rate), by retaining relevant subsets and discarding the others. Each subset must correspond to an efficient compression of the information which it represents and the reconstructed video must have comparable quality at the desired spatio-temporal resolution, as if it had been originally compressed at that spatio-temporal resolution and bit-rate.
  • DWT scalable image and video codecs.
  • lifting factorizations exist for any two-channel FIR subband transform [I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps", 1996. [Online]. Available: http://cm.bell-labs.com/who/wim/papers/factor/ [5]], and these provide a flexible framework for implementing reversible transforms with desirable properties.
  • the transform must compensate for the motion between frames. Without motion compensation, a 3D-DWT is a separable transform, and the order of spatial and temporal DWT stages can be interchanged without altering the final subbands.
  • the resulting sequence corresponds to that obtained by applying the low-pass temporal analysis filters through the motion trajectories of the underlying motion model.
  • ghosting artifacts can arise wherever these trajectories do not describe true scene motion.
  • One way to avoid the appearance of these "ghosting" artifacts is to use a temporal transform in which the low-pass temporal analysis filters are impulsive, having only one tap, so that the low-pass temporal subbands are simply sub-sampled versions of the original video sequence.
  • UMCTF (Unconstrained Motion Compensated Temporal Filtering)
  • An alternative content-adaptive approach is described in [2]. It can be shown that MC 3D-DWT has higher energy compaction gain if the MC temporal DWT (TDWT) is performed prior to the spatial DWT (SDWT); this structure is commonly known as the "t+2D" structure.
  • t+2D spatial DWT
  • Using the "t+2D" structure may cause motion-failure artifacts in reduced spatial resolution sequences.
  • aliasing artifacts are generally unavoidable, as the wavelet analysis filters are not ideal.
  • using the "t+2D" structure causes spatial aliasing from temporally adjacent frames to appear at misaligned positions when the motion model fails [3].
  • the problem of misaligned spatial aliasing artifacts can be avoided altogether.
  • the present invention provides a method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) at least one lifting step in one spatial resolution level blends motion compensated results produced at that level with motion compensated results produced at a higher spatial resolution level.
  • step (c) is carried out in an adaptive manner, depending on a determined effectiveness of a motion model which is used to implement the motion compensation.
  • a hierarchical motion model is employed, wherein the motion compensation operators which are used to implement a temporal lifting step each utilise a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by the motion compensation operators, and the effectiveness of the motion model is determined based on the degree to which the motion model parameters used at the current resolution level and at the higher resolution level are in agreement.
  • the effectiveness of the motion model is determined based on the degree to which the motion model is deemed to be representative of the true scene motion.
  • adaptive transform structures are advantageously provided which can substantially eliminate the appearance of motion failure artefacts at reduced spatial or temporal resolutions.
  • An advantage of an embodiment of the present invention is the provision of content-adaptive methods for avoiding ghosting artefacts whilst preserving both high compression efficiency and spatial scalability.
  • the present invention provides a method for motion adaptive transformation of a sequence of video frames, where (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) the update terms associated with lifting steps which produce low-pass temporal subband frames are adaptively modified, based on the degree to which the motion model is deemed to be representative of the true scene motion.
  • the present invention provides a method for motion adaptive transformation of a sequence of video frames, wherein (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) wherein the order of the spatial transformation and temporal transformation is adapted between "t+2D" and "2D+t".
  • the present invention provides an apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to a temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) means for implementing at least one lifting step in one spatial resolution level and blending motion compensated results produced at that level with motion compensated results produced at a higher spatial resolution level.
  • the present invention provides an apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) means for adaptively modifying the update terms associated with lifting steps which produce low-pass temporal subband frames, based on a degree to which the motion model is deemed to be representative of the true scene motion.
  • the present invention provides an apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates the first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) means for adapting the order of the spatial transformation and temporal transformation between "t+2D" and "2D+t".
  • Figure 1 is a diagram illustrating "t+2D" analysis structure with two levels of spatial motion compensated temporal decomposition
  • Figure 2 is a diagram illustrating motion induced leakage between spatial resolution levels
  • Figure 3 illustrates a flexible "2D+t+2D" structure for MC3D-DWT in accordance with an embodiment of the present invention
  • Figure 4 is a diagram illustrating in-band motion compensated warping operations
  • Figure 5 illustrates adaptive temporal update lifting steps within the "2D+t+2D" MC3D-DWT structure in accordance with an embodiment of the present invention
  • Figure 6 is a diagram illustrating compensating for motion with adaptive type II leakage compensation
  • Figure 7 illustrates a recursive embodiment of the adaptive multi-resolution motion compensation operator defined by equations (14) and (15) described in the description
  • Figure 8 illustrates adaptive compensation for type II leakage
  • Embodiments of the present invention relate to improvements to the spatial-temporal wavelet transform which is utilised in a video compression system.
  • Other elements of the video compression system (e.g. quantisation and coding)
  • Much of the following description is therefore directed to the transform block only.
  • A description is given, in relation to Figure 9, of a compression/decompression system for video, incorporating a transform apparatus in accordance with an embodiment of the present invention.
  • There first follows a description of the motion compensated temporal wavelet transform with lifting, and a discussion of some of the problems associated with it. There then follows a description of various structures for motion compensated three-dimensional wavelet transforms, and various embodiments of the present invention.
  • Bottreau, "Three dimensional lifting schemes for motion compensated video compression," IEEE Int. Conf. Acoust. Speech and Signal Proc., pp. 1793-1796, 2001. [10] V. Bottreau, M. Benetiere, B. Felts and B. Pesquet-Popescu, "A fully scalable 3d subband video codec," IEEE Int. Conf. Image Proc., pp. 1017-1020, 2001. [11] L. Luo, J. Li, S. Li, Z. Zhang, and Y-Q. Zhang, "Motion compensated lifting wavelet and its application in video coding," IEEE Int. Conf. on Multimedia and Expo, pp. 481-484, 2001. [12] A.
  • Equation (1) is commonly known as the prediction step; it forms the high-pass temporal frames h_k, as the residuals left after bidirectional motion compensation of the odd indexed frames based on the even indexed frames. In regions where the motion model captures the actual motion, the energy in the high-pass frames will be close to zero. Motion model failure, however, may result in multiple edges and increased energy in the high-pass temporal frames. Equation (2) is commonly known as the update step.
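The prediction/update structure described above can be sketched as follows for a 5/3-style motion compensated lifting transform. This is a minimal illustration, not the patent's exact formulation: `warp` is a hypothetical operator mapping one frame onto another's coordinate system, and boundary handling is simplified.

```python
import numpy as np

def lift_53(frames, warp):
    """One level of motion-compensated 5/3 temporal lifting (sketch).
    `frames` is a list of equal-shaped arrays; `warp(src, j, k)` is an
    assumed operator that maps frame index j onto k's coordinates."""
    n = len(frames)
    h, l = [], []
    # Prediction step (equation (1)): high-pass residuals at odd frames.
    for k in range(n // 2):
        pred = 0.5 * (warp(frames[2 * k], 2 * k, 2 * k + 1)
                      + warp(frames[min(2 * k + 2, n - 1)], 2 * k + 2, 2 * k + 1))
        h.append(frames[2 * k + 1] - pred)
    # Update step (equation (2)): low-pass frames at even frames.
    for k in range((n + 1) // 2):
        left = warp(h[k - 1], 2 * k - 1, 2 * k) if k > 0 else 0.0
        right = warp(h[k], 2 * k + 1, 2 * k) if k < len(h) else 0.0
        l.append(frames[2 * k] + 0.25 * (left + right))
    return l, h
```

With an identity warp and a static sequence, the high-pass frames are exactly zero and the low-pass frames reproduce the input, matching the "close to zero energy when the motion model succeeds" observation above.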
  • more than one level of motion-compensated temporal filtering may be performed by subsequently applying the transform to the low-pass temporal frames.
  • compression performance also continues to improve as the number of decomposition levels grows to 4 or 5.
  • the update steps help to reduce the noise and temporal aliasing in the reduced frame-rate sequences.
  • In places where the motion model fails (e.g., scene changes and occluded/uncovered regions), the low-pass temporal analysis filter is effectively being applied along invalid motion trajectories.
  • T levels of MC TDWT are first applied to the full spatial resolution video sequence, after which each temporal subband is subjected to S levels of 2D SDWT.
  • temporal scalability is easily achieved by terminating MC TDWT synthesis at the desired temporal resolution. In doing so, all the spatio-temporal subbands corresponding to the higher frequency temporal subbands are discarded.
  • a form of spatial scalability can also be achieved by terminating the spatial synthesis process at the desired spatial resolution and applying MC temporal synthesis to the reduced spatial resolution frames. While the encoder compensates for motion using full resolution frames, the decoder imitates the motion compensation at reduced spatial resolution by appropriate scaling of the full spatial resolution motion parameters.
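The decoder-side "appropriate scaling of the full spatial resolution motion parameters" can be illustrated with a minimal sketch. Dyadic scaling by a power of two is assumed here, and `mv_field` is a hypothetical list of (dx, dy) vectors; a real codec would also have to account for the interpolation grid.

```python
def scale_motion(mv_field, levels):
    """Scale full-resolution motion vectors for temporal synthesis at a
    spatial resolution reduced by `levels` dyadic DWT levels (a sketch)."""
    factor = 2 ** levels
    return [(dx / factor, dy / factor) for (dx, dy) in mv_field]
```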
  • resolution level whether spatial or temporal
  • f_k^s refers to the subband(s) of frame f_k at spatial resolution level s.
  • f_k^t refers to frame k at temporal resolution level t.
  • f_k^{s,t} refers to frame k in the sequence of spatio-temporal subbands produced after s levels of SDWT decomposition and t levels of MC TDWT decomposition. Larger values for s or t refer to lower frequency spatial or temporal resolution levels.
  • type II leakage corresponds to the motion induced leakage from higher to lower resolution levels, as illustrated in Fig.2.
  • the best prediction of a frame f_j from frame f_k is W_{k→j}(f_k). It follows that the best prediction of a resolution level within frame f_j is found by taking the corresponding resolution level from W_{k→j}(f_k). That is, we should, ideally, use the information from all resolution levels of f_k when predicting any one resolution level of f_j. This is exactly what happens in the "t+2D" structure.
  • the decoder cannot replicate the motion compensation operation of each subband as was performed by the encoder.
  • the scalable decoder cannot compensate for type II leakage during MC TDWT synthesis at reduced spatial resolution, since the required spatial subbands do not exist.
  • These uncompensated contributions from higher spatial resolution levels of temporally adjacent frames will then appear in the reconstructed reduced spatial resolution sequence.
  • the type II leakage components can even cancel out some of the spatial aliasing which is produced naturally by the SDWT, reducing the total amount of aliasing power in the reduced spatial resolution sequence.
  • P levels of SDWT decomposition are performed on the original video frames, yielding P + 1 spatial resolution levels; T levels of MC TDWT are then applied, to each spatial resolution level, followed by a further S - P levels of SDWT, which are applied to the temporal subbands of the lowest initial spatial resolution level.
  • the original video sequence is eventually subjected to T levels of MC TDWT and S levels of SDWT, as shown in Fig.3.
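The "2D+t+2D" analysis ordering described above can be sketched as follows. The `sdwt` and `mc_tdwt` callables are assumed interfaces standing in for the actual subband operators, and the final S - P post-temporal SDWT levels are left abstract; this is an illustration of the ordering, not the patent's implementation.

```python
def analyse_2d_t_2d(frames, sdwt, mc_tdwt, P, T):
    """Sketch of '2D+t+2D' analysis: P pre-temporal SDWT levels, then T
    levels of MC TDWT applied to every spatial resolution level.
    `sdwt(frame) -> (LL, highpass)`; `mc_tdwt(frames, T) -> subbands`."""
    levels = []
    for _ in range(P):
        pairs = [sdwt(f) for f in frames]
        levels.append([hp for (_, hp) in pairs])  # high-pass level p
        frames = [ll for (ll, _) in pairs]        # carry the LL band down
    levels.append(frames)                          # coarsest LL level
    # T levels of MC temporal DWT on each of the P + 1 spatial levels.
    # (The remaining S - P spatial levels on the coarsest level are omitted.)
    return [mc_tdwt(level, T) for level in levels]
```

With P = 2 this yields three spatial resolution levels, each of which receives its own motion compensated temporal decomposition, mirroring Fig. 3.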
  • W_{k→j}(f_k) refers to a motion compensated warping operation which separately maps all the spatial resolution levels corresponding to frame f_k onto the coordinate system of frame f_j, as illustrated in Fig. 4(a).
  • f_k denotes the vector whose elements are the P + 1 resolution levels of frame f_k.
  • in W_{k→j}(f_k), the motion compensation of each spatial resolution level is performed independently of the other spatial resolution levels.
  • for f_k^{P+1} we implement motion compensation in the natural way, since these are baseband frames, scaling the motion parameters according to the spatial resolution of f_k^{P+1}.
  • the remaining (p ≤ P) spatial resolution levels f_k^p each consist of the three high-pass subbands, HL^p, LH^p and HH^p.
  • W ⁇ ⁇ . refers to a regular baseband motion compensated mapping, in which the full spatial resolution motion parameters are divided by 2 f .
  • S(L, H) denotes a single level of spatial DWT synthesis from LL subband L and the three high-pass subbands collectively represented by H.
  • L is taken as 0 since W_{k→j} maps each resolution level independently.
  • the operator A denotes a single stage of spatial subband analysis, returning only the three high-pass subbands.
  • the low resolution video sequences will be identical to those obtained by extracting reduced spatial resolutions from the original video frames.
  • Experimental results indicate that by using W the compression efficiency drops significantly, even when the number of pre-temporal spatial decomposition levels P is small. If we use information from the higher frequency spatial resolution levels during the motion compensation of lower spatial resolution levels, such information will not be available when inverting the temporal transform at a reduced spatial resolution. These uncompensated contributions from higher frequency spatial resolution levels (type-II leakage) will then appear in the reconstructed reduced spatial resolution sequences. We can, however, use information from lower frequency spatial resolution levels to compensate for type-I leakage.
  • Type-I leakage compensation may be employed for both prediction and update steps, although its benefits are more substantial in the prediction lifting steps.
  • temporal scalability can be obtained by discarding the higher frequency temporal resolution levels.
  • This essentially yields a reduced frame-rate sequence whose frames have been obtained from the original frames by a form of temporal low-pass filtering along the motion trajectories described by the motion parameters.
  • the reduced frame-rate sequences tend to have excellent visual quality, right down to very low frame rates (e.g., 1 frame/second).
  • a first aspect of the invention involves modifying the update lifting steps of equation (6) as follows:

    l_k[p] = f_2k[p] + (1/4)·( W^u_{2k-1→2k}[p] · (W_{2k-1→2k}(h_{k-1}))[p] + W^u_{2k+1→2k}[p] · (W_{2k+1→2k}(h_k))[p] )   (8)
  • Fig.5 depicts the adaptively weighted update steps, for the case P = 2.
  • the weights W^u_{·→·}[p] are selected in the range 0 to 1, based on a local estimate of the performance of the motion model.
  • the multi-resolution motion compensation operator W̃ can be used in place of W, yielding

    l_k[p,n] = f_2k[p,n] + (1/4)·( W^u_{2k-1→2k}[p,n] · (W̃_{2k-1→2k}(h_{k-1}))[p,n] + W^u_{2k+1→2k}[p,n] · (W̃_{2k+1→2k}(h_k))[p,n] )   (11)
  • ⁇ 2k [ ⁇ ⁇ ] and W u + 2i [/?,n] are selected based on the energy in the corresponding motion-compensated high-pass temporal frames, W %k ⁇ - 2* 'U d of location n .
  • local energy estimates E^h[p,n] are obtained by applying a local averaging window to the squared samples of the motion-compensated high-pass temporal frames; the weighting factors are then found from
  • Q is a non-increasing function which maps local high-pass energy values to weights in the range 0 to 1 .
  • the function Q may take many forms so long as small values of E produce values for Q(E) which are close to 1 and large values of E produce values for Q(E) which are close to 0.
  • One suitable form for the function Q is a clipped linear ramp, where δ represents the smallest high-pass energy value which has any effect on the temporal update lifting steps and γ is a gain factor, which determines the high-pass energy at which temporal update lifting steps are effectively skipped.
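A minimal sketch of such a mapping follows, assuming a clipped linear ramp; the exact functional form and the δ and γ values used here are illustrative assumptions, not taken from the patent.

```python
def Q(E, delta=1.0, gamma=0.05):
    """Non-increasing map from local high-pass energy E to an update
    weight in [0, 1]. Below delta the update is fully applied (weight 1);
    the weight then falls off linearly with gain gamma until the update
    step is effectively skipped (weight 0). Parameter values are
    illustrative only."""
    return max(0.0, min(1.0, 1.0 - gamma * (E - delta)))
```

Small residual energies (good motion modeling) keep the update step active, while large energies (likely motion failure) disable it locally, which is exactly the qualitative behaviour the bullet above requires.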
  • the temporal update lifting steps should be weighted using a locally adaptive weighting factor, which takes on values in the range 0 to 1, based on local estimates of the motion modeling accuracy.
  • the motion modeling accuracy in turn may be inferred from a measure of the local magnitude of the high-pass temporal subband frames, in the same and/or lower spatial resolutions. It is important to note that the weighting factors are not explicitly signalled in the compressed bit-stream.
  • the decoder's version of the high-pass temporal subband samples will generally be affected by quantization, in a manner which depends on the number of bits which were actually transmitted to the decoder — something a scalable encoder cannot generally know ahead of time.
  • the decoder may infer slightly different weights to those inferred by the encoder.
  • the high-pass temporal subband samples are likely to be quantized toward zero, so that errors in the weighting factors are less significant than one might at first expect.
  • experimental results confirm that the adaptive update weighting scheme can be particularly robust to quantization effects, while essentially eliminating the appearance of ghosting artifacts at reduced temporal resolutions.
  • a second aspect of the invention consists of a novel content-adaptive set of temporal prediction lifting steps.
  • a local estimate of the motion accuracy is formed based on motion compensated samples which are already available to the decoder (possibly with quantization error), and this estimate is used to weight the contributions from higher spatial frequency resolution levels.
  • Fig.6 illustrates the general form for such an adaptive motion compensated warping operation.
  • the adaptive motion compensation operation eliminates all type II leakage components, thereby avoiding the possibility of non-aligned aliasing artifacts at reduced spatial resolutions.
  • behavior between these two extremes is addressed by assigning various weights in the range 0 to 1, based on an estimated likelihood of motion failure.
  • we use W̃^u_{k→j}(f_k) to refer to a motion compensated warping operation which uses information from u (u ≤ P) higher frequency spatial resolution levels (wherever they exist) and all lower frequency spatial resolution levels, when compensating each resolution level within f_k. This is illustrated in Fig. 4(c) for the case u = 1.
  • we define W̃^u_{k→j} as follows.
  • some embodiments may choose to save computation cost by restricting the adaptive type II leakage compensation method to only use information from the next higher resolution level (i.e., u = 1).
  • the adaptive motion compensation operator can be written as
  • the adaptive temporal prediction steps are obtained by replacing W with W̃^u in equation (5). To preserve the invertibility of the transform, it is important that the decompressor be able to recover the same weights used by the compressor, at least in the case where there are no quantization errors, so that it can reproduce the operators used in equation (16). Certainly, this is possible if the weights employed by W̃^u_{2k→2k+1}(f_2k) and W̃^u_{2k+2→2k+1}(f_{2k+2}) depend only on f_2k and f_{2k+2}. In the remainder of this section, we describe methods for selecting weights which preserve the invertibility of the overall transform.
  • the dotted arrows represent type I leakage compensation, as required by W
  • dashed arrows identify the content-adaptive information contained in f_2k and f_{2k+2}.
  • the weight estimation (WE) process should be repeatable by the decoder and it should be reasonably robust to quantization error.
  • the invention uses a similar method to that described above for adaptively weighting the update steps, except that it uses the local energy derived from a motion prediction residual signal (instead of the high-pass temporal frames) to estimate the local performance of the motion model.
  • one embodiment uses the local energy in the residual between the MC versions of f_2k and f_{2k+2}. That is, we obtain the estimate by taking a local average of the residual energy field, before applying the scalar mapping operator Q. In one preferred embodiment, a 7 x 7 moving average window is employed. As for the adaptive update steps, the scalar mapping operator Q should take on values near 1 where the local energy of the prediction residual is close to 0, and values near 0 where the local energy of the prediction residual is large. Inherent to such embodiments is the assumption that motion model failure is most likely to occur wherever the motion compensated versions of the previous frame f_2k and the subsequent frame f_{2k+2} significantly disagree.
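The local averaging of residual energy can be sketched with a separable box filter. The 7 x 7 window follows the preferred embodiment mentioned above, while the `same`-size border handling is an assumption of this sketch.

```python
import numpy as np

def local_energy(residual, win=7):
    """Local average of squared residual over a win x win window,
    computed as a separable moving average (a sketch of the 7 x 7
    moving-average energy estimate; border handling is simplified)."""
    sq = residual.astype(float) ** 2
    kernel = np.ones(win) / win
    # Box filter along rows, then columns, keeping the 'same' size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, sq)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode='same'), 0, rows)
```

The resulting energy field would then be passed through the scalar map Q to obtain per-location weights.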
  • various embodiments may derive separate values for the weights W^p_{2k→2k+1}[p] and W^p_{2k+2→2k+1}[p], based on estimates of the accuracy of the corresponding individual motion fields. Specifically, such embodiments may use the local energy in the residual (W̃_{2k→2k+1}(f_2k))[p-1] - f_{2k+1}[p-1] to estimate the weights for the corresponding direction, and similarly for f_{2k+2}.
  • alternatively, the weights W^p_{2k→2k+1}[p] and W^p_{2k+2→2k+1}[p] may be based on the disparity between motion compensated versions of f_2k, f_{2k+2} and f_{2k+1}[p-1].
  • the temporal prediction lifting steps in preferred embodiments should adaptively combine the motion compensated result which can be obtained using only the same or lower spatial resolution levels, with the motion compensated result which can be obtained using higher spatial resolution levels, when forming a prediction of one frame f_j at any particular resolution level, from another frame f_k.
  • the weighting factors used to blend these different motion compensated estimates should take on values in the range 0 to 1 , based on local estimates of the motion modeling accuracy.
  • Motion scaling vs. hierarchical motion: In the foregoing description, the concept of motion parameter scaling has been invoked to describe the mapping of motion parameters from one spatial resolution to another. The detailed implications of motion parameter scaling generally depend upon the motion modeling and parametrization methods selected for the preferred embodiment. In the simplest case, a frame-wide parametric motion model might be employed, in which case it is sufficient to scale the relevant parameters of the model, in accordance with the spatial resolution at which motion warping is required. More typically, block-based or deformable mesh motion models may be employed, in which case both the block/mesh size and the motion parameters may need to be scaled to match the spatial resolution at which motion warping is required. Of particular interest are hierarchical motion models, such as Hierarchical Block Motion Compensation. In this case, a hierarchical (coarse-to-fine) family of motion models is available for each pair of frames.
  • the coarsest motion model might involve relatively large block/mesh dimensions (e.g., 64 x 64), with relatively few motion parameters, while the finest motion model might involve quite small block/mesh dimensions (e.g., 4 x 4).
  • the relationship between finer model elements (blocks/meshes) and coarser model elements is typically expressed using a parent-child metaphor. Key to the efficiency of such models is that most of the finer model elements are expected to have the same motion parameters as their parents, so that relatively few motion parameters need actually be signalled, and these can be concentrated around regions where the scene motion flow is most divergent.
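The parent-child inheritance described above can be sketched as follows. The dictionary representation and the quadtree split into four children are illustrative assumptions; only refinements that diverge from the parent would need to be signalled.

```python
def expand_motion(coarse, refinements=None):
    """Hierarchical block motion sketch: each coarse block at (by, bx)
    splits into four children that inherit the parent's motion vector,
    unless an explicit refinement is signalled for a child."""
    refinements = refinements or {}
    fine = {}
    for (by, bx), mv in coarse.items():
        for cy in (0, 1):
            for cx in (0, 1):
                child = (2 * by + cy, 2 * bx + cx)
                fine[child] = refinements.get(child, mv)  # inherit by default
    return fine
```

Since most children simply inherit, the signalling cost is concentrated in the (typically sparse) `refinements` set, matching the efficiency argument above.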
  • Hierarchical motion representations such as those mentioned above, represent one of a number, of ways in which different motion models may be provided for different spatial resolutions, where the motion models associated with lower resolutions can have a reduced signalling cost.
  • the adaptive motion compensation operators and their weights serve to blend motion-compensated results from higher resolutions, generated using their higher quality motion model, into the temporal lifting steps for lower spatial resolutions.
  • the weights should be selected so as to do this only when it is safe, from the perspective of reduced spatial resolution reconstruction.
  • preferred embodiments should take account of the divergence between the motion vectors used at resolution level p and those used at resolution level p-1, when forming the blending weights.
  • E^m_{2k+1}[p,n] is a scaled local average, in the neighbourhood of location n, of the squared error between the motion vectors used in the maps W^p_{2k→2k+1} and W^p_{2k+2→2k+1} and the corresponding vectors used in the maps W^{p-1}_{2k→2k+1} and W^{p-1}_{2k+2→2k+1}.
  • this evidence of poor motion modeling can be incorporated into the adaptive weights by including contributions from the spatial divergence of any or all of the relevant motion compensation maps into the divergence term of equation (19). 3) Motion Compensation Modes and Adaptive Prediction Weights: In many modern motion compensated video coders, the motion model includes mode parameters which are used to selectively disable motion compensation in one or more of the potential directions.
  • the adaptive prediction weights should be set in a manner which is sensitive to both motion vector divergence and motion mode divergence across resolutions.
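By way of illustration only, one hypothetical realization of divergence-sensitive blending weights (the function name, the 1/(1 + err) mapping and the `alpha` constant are our own assumptions, not taken from the text) is:

```python
# One hypothetical way (of many, as the text notes) to form an
# inter-resolution blending weight: average the squared difference between
# the motion vectors used at resolution level p and those used at level
# p-1 over a neighbourhood, then map the result so that perfect agreement
# gives 1 and large divergence gives a weight near 0. The 1/(1 + alpha*err)
# form and the `alpha` constant are illustrative assumptions.

def blending_weight(vecs_fine, vecs_coarse, neighbourhood, alpha=1.0):
    """vecs_fine / vecs_coarse map sample locations to (dx, dy) vectors;
    `neighbourhood` lists the locations averaged over."""
    err = sum((vecs_fine[n][0] - vecs_coarse[n][0]) ** 2 +
              (vecs_fine[n][1] - vecs_coarse[n][1]) ** 2
              for n in neighbourhood) / float(len(neighbourhood))
    return 1.0 / (1.0 + alpha * err)  # in (0, 1]; equals 1 when models agree

agree = blending_weight({0: (2, 0)}, {0: (2, 0)}, [0])    # models agree
diverge = blending_weight({0: (5, 4)}, {0: (0, 0)}, [0])  # models disagree
```

A weight of 1 leaves the higher resolution contribution fully blended in; a weight near 0 suppresses it wherever the coarse and fine motion models disagree.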
  • Motion Inversion Reliability and Adaptive Update Weights: In preferred embodiments of the invention, motion parameters are only explicitly estimated and communicated to the decompressor for the motion warping operations associated with temporal prediction. For temporal update steps, the explicitly signalled motion fields are inverted; however, there are conditions under which a reliable inverse cannot be found.
  • the confidence factor may be selected in a manner which is proportional to the number of samples within a particular block in the inverse motion model which are used by the explicitly signalled forward motion model and have similar implied motion parameters.
  • the confidence factors derived in this way may be further subjected to an offset, so that a minimum number of pixels must be used by the forward motion field before the confidence can exceed zero in the inverse field.
  • the confidence factors should be further subject to clipping so that they lie in the range 0 to 1.
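The three bullet points above (a proportional count, an offset, and clipping) can be sketched as follows; the linear form and the default offset of 4 samples are illustrative assumptions of ours, not values mandated by the text:

```python
# Sketch of the confidence factor described above: proportional to the
# number of samples in an inverse-model block that are used by the forward
# motion field with similar implied motion, offset so that a minimum count
# is required before confidence exceeds zero, and clipped to [0, 1]. The
# linear form and the default offset of 4 are illustrative assumptions.

def update_confidence(reliable_samples, block_area, offset=4):
    """reliable_samples: samples in the block that are both referenced by
    the explicitly signalled forward motion field and carry similar implied
    motion parameters; block_area: total samples in the block (> offset)."""
    c = (reliable_samples - offset) / float(block_area - offset)
    return max(0.0, min(1.0, c))  # clip so confidence lies in [0, 1]
```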
  • Adaptive Inter-Resolution Blending for Temporal Update Steps: Section IV-C described adaptive motion compensated warping operators.
  • at resolution level p, the adaptive warping operator selectively blends motion compensated samples formed from higher resolution source frames into the motion compensated samples formed using only the current and lower resolution levels, f_t[p], …, f_t[P+1].
  • an earlier section described an adaptive temporal update step procedure, in which the update terms generated for resolution level p depend only on the temporal high-pass subband samples found at the same and lower resolutions, as expressed in equations (10) and (11).
  • compression efficiency can be improved by replacing W2k−1→2k(hk−1) and W2k+1→2k(hk) in equation (11) with adaptive motion compensation operators W̃2k−1→2k(hk−1) and W̃2k+1→2k(hk).
  • the term "safe" is used herein to refer to motion compensated temporal lifting steps which avoid the risk of significant misaligned aliasing artifacts when the compressed video is reconstructed at a reduced spatial resolution.
  • safe blending of higher resolution information into lower resolution temporal update terms may be achieved by employing adaptive motion compensated operators having the same form as those described by equation (13), or by the recursive equations (14) and (15), except that the adaptive inter-resolution blending factors are different for the temporal update steps.
  • these update-step blending factors are based entirely on the divergence of the motion model between resolution level p and resolution level p−1.
  • the motion fields in question here are those associated with the update-step warping operators.
  • the update-step blending factors should be set close to zero wherever the confidence factors or the motion vectors associated with location n differ substantially between the motion fields used at the two resolution levels.
  • there are any number of ways to specifically realize suitable blending weights based on these principles, as should be apparent to those skilled in the art.
  • the block-to-block motion field discontinuity is experienced in the subband domain, rather than the image domain.
  • blocking artifacts do not appear in the image domain, and the spatio-temporal subbands also exhibit better energy compaction properties.
  • the motion compensated warping operators Wk→j and Vk→j depicted in Figure 4 are implemented by means of a composite linear operator, which directly produces the motion compensated subbands, HLp, LHp, HHp and LLp, for frame j from the corresponding input subbands in frame k.
  • for each shift v, the samples of each output subband are partitioned into blocks, in accordance with the motion model, and the motion compensated subband samples belonging to each block b are generated using the composite linear operator corresponding to a shift of vb, where vb is the motion vector for block b.
  • the composite linear operator associated with shift v may be derived by composing the operations of spatial DWT synthesis, spatial-domain shifting (with interpolation) by v , and spatial DWT analysis. Noting that all three of these are separable two-dimensional operators, the composite linear operator is also separable. Conventional windowing techniques may be used to reduce the operator's region of support.
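As a toy illustration of this construction (1D, Haar kernel, integer circular shift, no windowing; all of these simplifications are ours), composing DWT synthesis, spatial-domain shifting and DWT analysis yields a composite operator that maps the subbands of a frame directly to the subbands of the shifted frame:

```python
# Toy illustration (1D, Haar kernel, integer circular shift, no windowing;
# all simplifications are ours) of the composite operator: DWT synthesis,
# spatial-domain shifting, then DWT analysis, applied directly to subbands.

def haar_analysis(x):
    """One level of Haar analysis: returns (low, high) subbands."""
    low = [(x[2*i] + x[2*i+1]) / 2.0 for i in range(len(x) // 2)]
    high = [(x[2*i] - x[2*i+1]) / 2.0 for i in range(len(x) // 2)]
    return low, high

def haar_synthesis(low, high):
    """Exact inverse of haar_analysis."""
    x = []
    for l, h in zip(low, high):
        x += [l + h, l - h]
    return x

def shift(x, v):
    """Circular shift by v samples (stands in for MC warping by vector v)."""
    return x[-v:] + x[:-v]

def composite(low, high, v):
    """The composite operator: synthesis -> shift -> analysis."""
    return haar_analysis(shift(haar_synthesis(low, high), v))

frame_k = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
subbands_k = haar_analysis(frame_k)
# The composite operator maps the subbands of frame k directly to the
# subbands of the shifted frame, without the caller leaving the subband
# domain.
assert composite(*subbands_k, 3) == haar_analysis(shift(frame_k, 3))
```

With fractional shifts and longer kernels the composite operator becomes a general (windowed) linear filter rather than an exact re-analysis, but the principle is the same.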
  • the motion compensated warping operators Wk→j and Vk→j depicted in Figure 4 are implemented by first subjecting the relevant source subbands of frame k to one level of DWT synthesis, and then applying a composite linear operator to this synthesized source frame, so as to produce the motion compensated subbands HLp, LHp, HHp and LLp for frame j.
  • different composite linear operators are defined for each shift v , and the samples of each output subband are partitioned into blocks, in accordance with the motion model.
  • the motion compensated subband samples belonging to each block b are generated using the composite linear operator corresponding to a shift of vb, where vb is the motion vector for block b.
  • the composite linear operator associated with shift v is derived by composing only two operations: spatial domain shifting (with interpolation) by v ; and spatial DWT analysis.
  • the resulting composite linear operators have a smaller region of support than those obtained in the previous embodiment, so that less aggressive windowing is required to achieve a given level of computational complexity. As before, the operators are all separable.
  • the motion adaptive transformation of embodiments of the present invention may be implemented, as will be appreciated by a skilled person, by appropriate computing systems, including appropriate computer hardware and computer software. Any suitable architecture may be utilised.
  • Figure 10 illustrates in block diagram form a coder/decoder arrangement for compressing and decompressing video frames, utilising a transform process and apparatus in accordance with an embodiment of the present invention.
  • the diagram is schematic and in block form. It will be appreciated, however, that the arrangement can be implemented by the appropriate computer hardware and software.
  • the compressor section, generally designated by reference numeral 100, may be implemented by an appropriate server system, for example, while the decompressor section, generally designated by reference numeral 101, may be implemented by an appropriate computer client system, e.g. a PC with appropriate software.
  • the compressed bit stream 102 may be transmitted via any communications media 103, the Internet being one example.
  • the compressor 100 includes the transform block 104 which receives the video and implements a transform as described in this specification above. Coding is then implemented by blocks 105, 106, 107 and 108, leading to the formation of the bit stream 102.
  • the compressed bit stream 102 is transmitted via communications medium 103 to the decompressor 101.
  • Various inverse steps to compression are implemented by blocks 109, 110, 111 and 112.
  • the inverse wavelet transform as described in the present specification is implemented in block 113 in order to recover the sequence of video frames.

Abstract

Disclosed is a content-adaptive method for motion compensated 3D wavelet transformation of video. Ghosting and non-aligned aliasing artefacts arising where prior known motion models fail are addressed. The transformation continuously adapts between “t+2D” and “2D+t” structures based upon information within the compressed bit-stream. Ghosting artefacts are reduced by the transform selecting between different low-pass temporal filters based upon an estimate of the accuracy of the motion model. Motion compensated values used to produce high-pass temporal sub-band frames are adaptively attenuated in accordance with local estimates of the accuracy of the model at each of a number of spatial and temporal resolutions. The transform may achieve high compression efficiency whilst offering high quality reduced spatial and temporal resolution sequences.

Description

Improved Method for Motion Adaptive Transformation of Video
Field of the Invention
The present invention relates generally to a method and apparatus for efficient compression of motion video sequences, and particularly, but not exclusively, to a method and apparatus for implementing a motion adaptive transform for video, which facilitates production of fully scalable compressed representations of an original video sequence.
Background of the Invention
Scalability refers to the ability to decompress embedded video bit-streams to a particular spatio-temporal resolution and/or a desired quality (bit-rate), by retaining relevant subsets and discarding the others. Each subset must correspond to an efficient compression of the information which it represents, and the reconstructed video must have comparable quality at the desired spatio-temporal resolution, as if it had been originally compressed at that spatio-temporal resolution and bit-rate. Recently a large amount of effort has been invested in fully scalable video coding. Interactive multimedia, video surveillance and video delivery over heterogeneous networks and/or in error prone environments can be named as some of the applications which stand to benefit from this research. The predictive feedback paradigm inherent in traditional video compression algorithms is incompatible with the requirement of highly scalable compression. Instead, the preferable paradigm is that of feed-forward compression, in which a spatio-temporal (3D) transform is followed by embedded quantization and coding. Due to the intrinsic multi-resolution structure of discrete wavelet transforms
(DWT), they have been widely used in scalable image and video codecs. Moreover, lifting factorizations exist for any two-channel FIR subband transform [I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps", 1996. [Online]. Available: http://cm.bell-labs.com/who/wim/papers/factor/ [5]], and these provide a flexible framework for implementing reversible transforms with desirable properties. To efficiently exploit interframe redundancy, the transform must compensate for the motion between frames. Without motion compensation, a 3D-DWT is a separable transform, and the order of spatial and temporal DWT stages can be interchanged without altering the final subbands. With the introduction of motion compensation, however, this commutative property is lost, as the DWT is not shift invariant. In fact, by changing the order of the spatial and motion compensated (MC) temporal DWT operations, a family of different MC 3D-DWTs may be formed. Spatial and temporal scalability is generally accomplished by partial DWT synthesis. In this process, unwanted high frequency spatio-temporal subbands from the MC 3D-DWT are simply discarded. While it seems natural to discard higher frequency information when reconstructing at a reduced resolution, it turns out that this can result in the appearance of visually disturbing artifacts in places where the motion model fails. Reasons for this are explored in [N. Mehrseresht and D. Taubman, "Adaptively weighted update steps in motion compensated lifting based scalable video compression," IEEE Int. Conf. Image Proc., pp. 771-774, 2003. [2]] and [--, "A flexible structure for fully scalable motion compensated 3D-DWT with emphasis on the impact of spatial scalability", submitted to IEEE Trans. Image Proc., 2004. [3]]. Although advanced motion models are able to capture most of the actual scene activity, there are inevitably places (e.g., scene changes and occluded/uncovered regions) where the motion model must necessarily fail.
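The shift variance mentioned above is easy to verify numerically. In the following sketch (our own 1D Haar illustration, unrelated to any specific embodiment), the critically sampled transform commutes with even shifts of its input but not with odd ones:

```python
# Numerical check of the shift variance noted above: a critically sampled
# (Haar) DWT commutes with even shifts of its input, but an odd shift
# produces low-pass samples that are not any shift of the original ones.
# This is our own 1D illustration, not part of any embodiment.

def haar(x):
    """One level of critically sampled Haar analysis: (low, high)."""
    return ([(x[2*i] + x[2*i+1]) / 2.0 for i in range(len(x) // 2)],
            [(x[2*i] - x[2*i+1]) / 2.0 for i in range(len(x) // 2)])

def roll(x, v):
    """Circular shift of a list by v samples."""
    return x[-v:] + x[:-v] if v else x

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
low, high = haar(x)

# Even shift: the subbands are simply shifted by half as many samples.
low2, high2 = haar(roll(x, 2))
assert low2 == roll(low, 1) and high2 == roll(high, 1)

# Odd shift: the new low-pass subband is not any shift of the original.
low1, _ = haar(roll(x, 1))
assert all(low1 != roll(low, k) for k in range(len(low)))
```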
When partially synthesizing a video sequence to obtain a reduced temporal resolution, the resulting sequence corresponds to that obtained by applying the low-pass temporal analysis filters through the motion trajectories of the underlying motion model. Ghosting artifacts can arise wherever these trajectories do not describe true scene motion. One way to avoid the appearance of these "ghosting" artifacts is to use a temporal transform in which the low-pass temporal analysis filters are impulsive, having only one tap, so that the low-pass temporal subbands are simply sub-sampled versions of the original video sequence. This approach has come to be known as Unconstrained Motion Compensated Temporal Filtering (UMCTF). However, the use of such trivial filters results in significant reductions in the full resolution compression efficiency. An alternative content-adaptive approach is described in [2]. It can be shown that MC 3D-DWT has higher energy compaction gain if the
MC temporal DWT (TDWT) is performed prior to the spatial DWT (SDWT); this structure is commonly known as the "t+2D" structure. Using the "t+2D" structure, however, may cause motion-failure artifacts in reduced spatial resolution sequences. Under spatial scaling, aliasing artifacts are generally unavoidable, as the wavelet analysis filters are not ideal. Somewhat less obvious, however, is the fact that using the "t+2D" structure causes spatial aliasing from temporally adjacent frames to appear at misaligned positions when the motion model fails [3]. The problem of misaligned spatial aliasing artifacts can be avoided altogether by applying the MC TDWT to the spatial subbands generated by an initial spatial decomposition of the original video frames (in-band MC TDWT), being careful not to use any information from subbands at higher spatial frequencies. Structures of this form are commonly labelled "2D+t" [Y. Andreopoulos, M. van der Schaar, A. Munteanu, J. Barbarien, P. Schelkens, and J. Cornelis, "Complete-to-overcomplete discrete wavelet transforms for scalable video coding with MCTF," SPIE Visual Comm. Image Proc., pp. 719-731, 2003. [4]. Y. Andreopoulos, A. Munteanu, J. Barbarien, M. van der Schaar, J. Cornelis, and P. Schelkens, "In-band motion compensated temporal filtering", Signal Processing: Image Communication, to appear. [5]. T. Kimoto and Y. Miyamoto, "Multi-resolution motion compensated temporal filtering for 3D wavelet coding," ISO/IEC JTC1/SC29/WG11 M10569/S09, March 2004, Munich, Germany. [6]. G. Baud, M. Duvanel, J. Reichel, and F. Ziliani, "VisioWave scalable video CODEC proposal," ISO/IEC JTC1/SC29/WG11 M10569/S20, March 2004, Munich, Germany. [7]]. Unfortunately, however, such structures have a significant adverse effect on the compression efficiency of the transform.
Summary of the Invention
- In accordance with a first aspect the present invention provides a method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) at least one lifting step in one spatial resolution level blends motion compensated results produced at that level with motion compensated results produced at
a higher spatial resolution level. In an embodiment, step (c) is carried out in an adaptive manner, depending on a determined effectiveness of a motion model which is used to implement the motion compensation. In an embodiment, a hierarchical motion model is employed, wherein the motion compensation operators which are used to implement a temporal lifting step each utilise a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by the motion compensation operators, and the effectiveness of the motion model is determined based on the degree to which the motion models used at said current resolution and said higher resolution level are in agreement. In an embodiment, the effectiveness of the motion model is determined based on the degree to which the motion model is deemed to be representative of the true scene motion. In an embodiment of the present invention, adaptive transform structures are advantageously provided which can substantially eliminate the appearance of motion failure artefacts at reduced spatial or temporal resolutions. An advantage of an embodiment of the present invention is the provision of content-adaptive methods for avoiding ghosting artefacts whilst preserving both high compression efficiency and spatial scalability.
In accordance with a second aspect, the present invention provides a method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) the update terms associated with lifting steps which produce low-pass temporal subband frames are adaptively modified, based on the degree to which the motion model is deemed to be representative of the true scene motion. It is preferably an advantage of an embodiment of the present invention to provide content-adaptive methods for avoiding misaligned spatial aliasing artefacts, without sacrificing compression efficiency. In accordance with a third aspect, the present invention provides a method for motion adaptive transformation of a sequence of video frames, wherein (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) wherein the order of the spatial transformation and temporal transformation are adapted between t + 2D
and 2D + t. In accordance with a fourth aspect, the present invention provides an apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to a temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) means for implementing at least one lifting step in one spatial resolution level, blending motion compensated results produced at that level with motion compensated results produced at a higher spatial resolution level. In accordance with a fifth aspect, the present invention provides an apparatus for motion adaptive transformation of the sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) means for adaptively modifying the update terms associated with lifting steps which produce low-pass temporal subband frames, based on a degree to which the motion model is deemed to be representative of the true scene motion.
In accordance with a sixth aspect, the present invention provides an apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates the first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) means for adapting the order of the spatial transformation and temporal transformation between t + 2D and 2D + t.
Brief Description of the Drawings
Features and advantages of the present invention will become apparent from the following description of embodiments thereof, by way of example only, with reference to the accompanying drawings, in which: Figure 1 is a diagram illustrating the "t+2D" analysis structure with two levels of spatial motion compensated temporal decomposition; Figure 2 is a diagram illustrating motion induced leakage between spatial resolution levels; Figure 3 illustrates a flexible "2D+t+2D" structure for MC 3D-DWT in accordance with an embodiment of the present invention; Figure 4 is a diagram illustrating in-band motion compensated warping operations; Figure 5 illustrates adaptive temporal update lifting steps within the "2D+t+2D" MC 3D-DWT structure in accordance with an embodiment of the present invention; Figure 6 is a diagram illustrating compensating for motion with adaptive type II leakage compensation; Figure 7 illustrates a recursive embodiment of the adaptive multi-resolution motion compensation operator defined by equations (14) and (15) described in the description; Figure 8 illustrates adaptive compensation for type
II leakage between spatial resolution levels (dashed lines; dotted lines correspond to type I leakage compensation); Figure 9 illustrates an alternative embodiment of adaptive type II leakage compensation to that shown in Figure 8; and Figure 10 is a block diagram illustrating components of a video compression/decompression system incorporating an embodiment of the present invention.
Detailed Description of Preferred Embodiments
Preferred embodiments of the present invention will now be described.
Embodiments of the present invention relate to improvements to the spatial-temporal wavelet transform which is utilised in a video compression system. Other elements of the video compression system (e.g. quantisation and coding) may be conventional. Much of the following description is therefore directed to the transform block only. For completion, however, a description is given in relation to Figure 10 of a compression/decompression system for video, incorporating a transform apparatus in accordance with an embodiment of the present invention. In the following, a brief review is given of the motion compensated temporal wavelet transform with lifting, and a discussion of some of the problems associated with it. There follows a description of various structures for motion compensated three-dimensional wavelet transforms, and various embodiments of the present invention.
Review of the Lifting Based 5/3 MC Temporal DWT
The idea of using motion compensated temporal DWT (MC TDWT) was introduced originally by Ohm [J. Ohm, "Three-dimensional subband coding with motion compensation," IEEE Trans. Image Proc., vol. 3, no. 5, pp. 559-571, Sept. 1994. [8]] and then developed by Choi and Woods [S. Choi and J. Woods, "Motion compensated 3D subband coding of video," IEEE Trans. Image Proc., vol. 8, pp. 155-167, Feb. 1999. [9]]. A motion compensated lifting framework for TDWT is proposed in [B. Pesquet-Popescu and V. Bottreau, "Three dimensional lifting schemes for motion compensated video compression," IEEE Int. Conf. Acoust. Speech and Signal Proc., pp. 1793-1796, 2001. [10]. V. Bottreau, M. Benetiere, B. Felts and B. Pesquet-Popescu, "A fully scalable 3D subband video codec," IEEE Int. Conf. Image Proc., pp. 1017-1020, 2001. [11]. L. Luo, J. Li, S. Li, Z. Zhang, and Y.-Q. Zhang, "Motion compensated lifting wavelet and its application in video coding," IEEE Int. Conf. on Multimedia and Expo, pp. 481-484, 2001. [12]. A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," IEEE Int. Conf. Image Proc., pp. 1029-1032, 2001. [13]. A. Secker and D. Taubman, "Highly scalable video compression using a lifting-based 3D wavelet transform with deformable mesh motion compensation," IEEE Int. Conf. Image Proc., pp. 749-752, 2002. [14]. A. Secker and D. Taubman, "Lifting based invertible motion adaptive transform, LIMAT, framework for highly scalable video compression," IEEE Trans.
Image Proc., vol. 12, pp. 1530-1542, Dec. 2003. [15]]. Temporal decomposition using any desired motion model and any desired wavelet kernel with finite support is possible using the lifting framework for MC TDWT. The results reported in [13][14] indicate a superior performance with the biorthogonal 5/3 wavelet kernel, compared to the conventional Haar wavelet transform. The results of research later accomplished by
Golwelkar and Woods [A. Golwelkar and J. Woods, "Scalable video compression using longer motion compensated temporal filters," SPIE, Int. Symp. Visual Comm. and Image Proc., pp. 1406-1416, 2003. [16]] also confirm the superior performance of the MC 5/3 kernel. For the purpose of the present description, therefore, the 5/3 kernel shall be assumed. However, the invention may be applied to other temporal wavelet kernels. Let Wk1→k2(fk1) denote a motion compensated mapping of frame k1 onto the coordinate system of frame k2. Using this notation, the motion compensated lifting steps for 5/3 analysis can be expressed as

hk = f2k+1 − ½[W2k→2k+1(f2k) + W2k+2→2k+1(f2k+2)]   (1)

lk = f2k + ¼[W2k−1→2k(hk−1) + W2k+1→2k(hk)]   (2)
It can be shown [15] that motion compensating the lifting steps effectively causes the temporal subband analysis filters to be applied along the motion trajectories induced by the motion compensation operators, W. Equation (1) is commonly known as the prediction step; it forms the high-pass temporal frames hk as the residuals left after bidirectional motion compensation of the odd indexed frames based on the even indexed frames. In regions where the motion model captures the actual motion, the energy in the high-pass frames will be close to zero. Motion model failure, however, may result in multiple edges and increased energy in the high-pass temporal frames. Equation (2) is commonly known as the update step. Its interpretation is not so immediate as that of the prediction step, but it serves to ensure that frame lk corresponds to a low-pass filtering of the input frame sequence along the motion trajectories, using the transform's 5 tap low-pass analysis filter. Regardless of the motion model used for the W operators, the temporal transform can be trivially inverted by reversing the order of the lifting steps and replacing addition with subtraction as follows:
f2k = lk − ¼[W2k−1→2k(hk−1) + W2k+1→2k(hk)]   (3)

f2k+1 = hk + ½[W2k→2k+1(f2k) + W2k+2→2k+1(f2k+2)]   (4)

Reduced frame-rate sequences may be obtained by
keeping only the low-pass frames. To improve temporal scalability and exploit longer term temporal redundancy, more than one level of motion-compensated temporal filtering may be performed by subsequently applying the transform to the low-pass temporal frames. In many cases, compression performance also continues to improve as the number of decomposition levels grows to 4 or 5. When temporal filtering is effectively applied along the motion trajectories, the update steps help to reduce the noise and temporal aliasing in the reduced frame-rate sequences. However, there are places (e.g., scene changes and occluded/uncovered regions) where the motion model must necessarily fail. In these regions the low-pass temporal analysis filter is effectively being applied along invalid motion trajectories. When this happens, not only does the compression performance suffer, but the update steps also add ghosting to the low-pass temporal frames, significantly reducing their visual quality.
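For illustration, the 5/3 lifting steps (1)-(2) and their inversion can be sketched as follows, with the warping operators replaced by an identity stub and periodic extension used at the sequence boundaries (both simplifying assumptions of ours). Perfect reconstruction holds regardless of what `warp` does, since the inverse simply repeats the same steps with the signs reversed:

```python
# Minimal sketch of the 5/3 lifting steps (1)-(2) and their inversion.
# `warp` is an identity stub standing in for the W operators, and periodic
# extension is used at the sequence boundaries; both are simplifying
# assumptions of ours. Perfect reconstruction holds for any `warp`.

def warp(frame, src, dst):
    # Stand-in for W_src->dst; a real implementation would motion
    # compensate `frame` from the coordinates of src onto those of dst.
    return frame

def mc_analysis(f):
    n = len(f) // 2
    # Prediction step (1): high-pass residuals of the odd indexed frames.
    h = [f[2*k+1] - 0.5 * (warp(f[2*k], 2*k, 2*k+1) +
                           warp(f[(2*k+2) % (2*n)], 2*k+2, 2*k+1))
         for k in range(n)]
    # Update step (2): low-pass frames from the even indexed frames.
    l = [f[2*k] + 0.25 * (warp(h[(k-1) % n], 2*k-1, 2*k) +
                          warp(h[k], 2*k+1, 2*k))
         for k in range(n)]
    return l, h

def mc_synthesis(l, h):
    n = len(l)
    f = [0.0] * (2 * n)
    for k in range(n):            # undo the update step
        f[2*k] = l[k] - 0.25 * (warp(h[(k-1) % n], 2*k-1, 2*k) +
                                warp(h[k], 2*k+1, 2*k))
    for k in range(n):            # undo the prediction step
        f[2*k+1] = h[k] + 0.5 * (warp(f[2*k], 2*k, 2*k+1) +
                                 warp(f[(2*k+2) % (2*n)], 2*k+2, 2*k+1))
    return f

frames = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 9.0, 8.0]
l, h = mc_analysis(frames)
assert mc_synthesis(l, h) == frames  # perfect reconstruction
```

Here each "frame" is a single scalar for brevity; in practice the same lifting arithmetic is applied sample by sample to whole frames.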
Structures for MC 3D-DWT
A widespread strategy for building a complete 3D video transform is known as the "t+2D" structure. In this structure T levels of MC TDWT are first applied to the full spatial resolution video sequence, after which each temporal subband is subjected to S levels of 2D SDWT. Fig. 1 illustrates the "t+2D" analysis structure when T = 2 and S = 2. Using the "t+2D" structure, temporal scalability is easily achieved by terminating MC TDWT synthesis at the desired temporal resolution. In doing so, all the spatio-temporal subbands corresponding to the higher frequency temporal subbands are discarded. A form of spatial scalability can also be achieved by terminating the spatial synthesis process at the desired spatial resolution and applying MC temporal synthesis to the reduced spatial resolution frames. While the encoder compensates for motion using full resolution frames, the decoder imitates the motion compensation at reduced spatial resolution by appropriate scaling of the full spatial resolution motion parameters. In this document, we consistently use the term resolution level (whether spatial or temporal) to refer to the collection of additional subband(s) required to double the resolution available from previous levels, where the lowest resolution level contains just the low-pass (spatial or temporal) subbands. We use subscripts s to identify spatial resolution levels, so that fk,s refers to the subband(s) of frame fk at spatial resolution level s. Similarly, we use superscripts t to identify temporal resolution levels, so that fk^t refers to frame k at temporal resolution level t. Likewise, fk,s^t refers to frame k in the sequence of spatio-temporal subbands produced after s levels of SDWT decomposition and t levels of MC TDWT decomposition. Larger values for s or t refer to lower frequency spatial or temporal resolution levels.
We also use bold face to refer to the collection of all spatial resolution levels corresponding to frame fk; thus, fk is a vector of resolution levels so that fk[s] = fk,s. Shifting a frame globally changes its phase content linearly. Different wavelet subbands, however, are not disjoint in the frequency domain, as the subband filters are not perfectly band-limited. The leakage of information from other frequency subbands to the desired subband is known as aliasing. Shifting a frame globally by a constant value generally changes the aliased information in each subband, so that all the spatial subbands of the original frame fk make a contribution to each subband of the motion compensated frame, as illustrated in Fig. 2. We refer to the motion induced leakage from lower spatial resolution levels to higher resolution levels of the motion compensated frame as type I leakage (solid lines in Fig. 2). Similarly, type II leakage (dashed lines in Fig. 2) corresponds to the motion induced leakage from higher to lower resolution levels. The best prediction of a frame fj, based on frame fk, is Wk→j(fk). It follows that the best prediction of a resolution level within frame fj is found by taking the corresponding resolution level from Wk→j(fk). That is, we should, ideally, use the information from all resolution levels of fk when predicting any one resolution level of fj. This is exactly what happens in the "t+2D" structure. Under spatial scaling using the "t+2D" structure, however, the decoder cannot replicate the motion compensation operation of each subband as was performed by the encoder. We have found, therefore, that the scalable decoder cannot compensate for type II leakage during MC TDWT synthesis at reduced spatial resolution, since the required spatial subbands do not exist.
These uncompensated contributions from higher spatial resolution levels of temporally adjacent frames will then appear in the reconstructed reduced spatial resolution sequence. Subject to perfect motion modeling, we have found that the type II leakage components can even cancel out some of the spatial aliasing which is produced naturally by the SDWT, reducing the total amount of aliasing power in the reduced spatial resolution sequence. When the motion model fails, however, we have found that these aliasing components appear at misaligned spatial locations. We refer to these visually disturbing artifacts as misaligned spatial aliasing artifacts.

If we apply the MC TDWT separately within each spatial resolution level (the "2D+t" structure), the reconstructed reduced spatial resolution sequences will be the same as those produced by taking appropriate low-pass spatial subbands of each original video frame. On the other hand, energy compaction and hence compression efficiency will be substantially reduced in this case, since each resolution level in the prediction f̂_j of frame f_j is formed using only the information from the corresponding resolution level in f_k, rather than all the resolution levels in f_k. More generally, if the MC TDWT is applied in such a way as to utilize only information from the same or lower spatial resolution levels when updating the samples for any given resolution level, within each temporal lifting step, we can be assured that the reconstructed reduced spatial resolution sequences will be the same as those produced by taking appropriate low-pass spatial subbands of each original video frame; but compression performance will suffer significantly relative to the full "t+2D" structure, since the information from higher resolution levels in f_k cannot be used when predicting lower resolution levels in f_j.
A General Structure for 3D-DWT

We begin by describing a structure which allows us to choose both the order and the number of spatial and MC temporal DWT decomposition stages. In this structure, P levels of SDWT decomposition are performed on the original video frames, yielding P+1 spatial resolution levels; T levels of MC TDWT are then applied to each spatial resolution level, followed by a further S−P levels of SDWT, which are applied to the temporal subbands of the lowest initial spatial resolution level. Using this structure, the original video sequence is eventually subjected to T levels of MC TDWT and S levels of SDWT, as shown in Fig.3. The value of P determines the number of levels of SDWT analysis which are performed prior to MC TDWT analysis. Choosing P = 0 reduces the transform to the "t+2D" structure, whereas P = S yields a "2D+t" structure.

We use the notation W̄_{k→j}(f_k) to refer to a motion compensated warping operation which separately maps all the spatial resolution levels corresponding to frame f_k onto the coordinate system of frame f_j, as illustrated in Fig. 4(a). Here, f_k denotes the vector whose elements are the P+1 resolution levels of frame f_k. Using W̄_{k→j}(f_k), the motion compensation of each spatial resolution level is performed independently of the other spatial resolution levels. For the lowest spatial resolution level, f_k[P+1], we implement motion compensation in the natural way, since these are baseband frames, scaling the motion parameters according to the spatial resolution of f_k[P+1]. Each of the remaining spatial resolution levels f_k[p] (p ≤ P) consists of the three high-pass subbands, HL_p, LH_p and HH_p. We cannot directly compensate for motion within these high-pass subbands, since shifting high-pass subbands does not produce the same linear phase relationship as shifting a baseband signal.
For these, we first synthesize the three high-pass subbands into a baseband frame; this baseband frame is then motion compensated and subjected to DWT analysis in order to recover the corresponding subbands in W̄_{k→j}(f_k). These operations are expressed in the following definition for W̄_{k→j}:

(W̄_{k→j}(f_k))[p] = W^B_{k→j}(f_k[P+1]),                p = P+1
(W̄_{k→j}(f_k))[p] = A_H ∘ W^B_{k→j} ∘ S(0, f_k[p]),    1 ≤ p ≤ P

Here, W^B_{k→j} refers to a regular baseband motion compensated mapping, in which the full spatial resolution motion parameters are divided by the power of 2 appropriate to the spatial resolution at which the mapping is applied. S(L, H) denotes a single level of spatial DWT synthesis from an LL subband L and three high-pass subbands collectively represented by H; here L is taken as 0, since W̄_{k→j} maps each resolution level independently. The operator A_H denotes a single stage of spatial subband analysis, returning only the three high-pass subbands. Using W̄, we can trivially implement the motion compensated temporal transform in the "2D+t+2D" structure by replacing W with W̄ in equations (1) and (2). That is,
h_k^{t+1} = l_{2k+1}^t − (1/2)·[W̄_{2k→2k+1}(l_{2k}^t) + W̄_{2k+2→2k+1}(l_{2k+2}^t)]     (5)

l_k^{t+1} = l_{2k}^t + (1/4)·[W̄_{2k−1→2k}(h_{k−1}^{t+1}) + W̄_{2k+1→2k}(h_k^{t+1})]     (6)

Here, l_k^t and h_k^t denote the sets of all spatial resolution levels corresponding to the low-pass and high-pass temporal frames k after t levels of MC TDWT, with l_k^0 = f_k. Obviously, the temporal transform is readily inverted (synthesized) by reversing the order of the lifting steps and replacing summation with subtraction.

When P > 0, spatial scalability can be achieved in the first instance by discarding up to the first P spatial resolution levels from each of the vectors l_k^t and h_k^t. In this way, for resolution reduction factors up to 2^P, the low resolution video sequences will be identical to those obtained by extracting reduced spatial resolutions from the original video frames. Experimental results, however, indicate that by using W̄ the compression efficiency drops significantly, even when the number of pre-temporal spatial decomposition levels P is small.

If we use information from the higher frequency spatial resolution levels during the motion compensation of lower spatial resolution levels, such information will not be available when inverting the temporal transform at a reduced spatial resolution. These uncompensated contributions from higher frequency spatial resolution levels (type-II leakage) will then appear in the reconstructed reduced spatial resolution sequences. We can, however, use information from lower frequency spatial resolution levels to compensate for type-I leakage. This does not interfere with the property that spatial scalability, with resolution reduction factors up to 2^P, produces video sequences identical to those obtained by directly reducing the spatial resolution of the original video frames. We refer to this as type I scalable leakage compensation and use W^I_{k→j} to denote the corresponding operator. Specifically, we define W^I_{k→j} as follows.
(W^I_{k→j}(f_k))[p] = W^B_{k→j}(f_k[P+1]),                               p = P+1
(W^I_{k→j}(f_k))[p] = A_H ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[p]),        1 ≤ p ≤ P     (7)

Here, S(f_k[P+1], ..., f_k[p]) denotes P+1−p levels of spatial DWT synthesis from the LL subband f_k[P+1] together with the lower frequency high-pass subbands f_k[P], ..., f_k[p]. These operations are illustrated schematically in Fig.4(b). Type-I leakage compensation may be employed for both prediction and update steps, although its benefits are more substantial in the prediction lifting steps.
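The synthesize–warp–analyze procedure behind W̄_{k→j} (and, with more synthesis levels, W^I_{k→j}) can be sketched with a single-level Haar SDWT and a global integer translation standing in for the baseband mapping W^B. The Haar kernel and the shift-based warp are illustrative assumptions, not the filters or motion model of any particular codec.

```python
import numpy as np

def haar_analysis(frame):
    # Single-level 2D Haar SDWT analysis: returns (LL, (HL, LH, HH)).
    a = frame[0::2, 0::2]; b = frame[0::2, 1::2]
    c = frame[1::2, 0::2]; d = frame[1::2, 1::2]
    return ((a + b + c + d) / 2,
            ((a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2))

def haar_synthesis(LL, high):
    # Single-level 2D Haar SDWT synthesis: the operator S(LL, H).
    HL, LH, HH = high
    frame = np.empty((2 * LL.shape[0], 2 * LL.shape[1]))
    frame[0::2, 0::2] = (LL + HL + LH + HH) / 2
    frame[0::2, 1::2] = (LL - HL + LH - HH) / 2
    frame[1::2, 0::2] = (LL + HL - LH - HH) / 2
    frame[1::2, 1::2] = (LL - HL - LH + HH) / 2
    return frame

def warp_high_pass(high, shift):
    # A_H o W^B o S(0, .): synthesize the high-pass subbands over a zero
    # LL band, warp the resulting baseband frame, and re-analyze, keeping
    # only the three high-pass subbands.
    base = haar_synthesis(np.zeros_like(high[0]), high)
    return haar_analysis(np.roll(base, shift, axis=(0, 1)))[1]
```

With a zero shift the subbands are returned unchanged; with an even global shift the subbands are simply translated at half the displacement, as expected for a baseband warp seen through one level of analysis.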
Adaptive Update Steps for Ghosting Artifact Suppression

Within the structure of Fig.3, temporal scalability can be obtained by discarding the higher frequency temporal resolution levels. This essentially yields a reduced frame-rate sequence whose frames have been obtained from the original frames by a form of temporal low-pass filtering along the motion trajectories described by the motion parameters. Where the motion model successfully describes the underlying scene motion activity, the reduced frame-rate sequences tend to have excellent visual quality, right down to very low frame rates (e.g., 1 frame/second). Where the motion model fails, however, temporal ghosting artifacts are commonly observed at reduced temporal resolutions. To avoid the appearance of ghosting artifacts in the low-pass temporal frames, a first aspect of the invention involves modifying the update lifting steps of equation (6) as follows.

l_k^{t+1} = l_{2k}^t + (1/4)·[W^u_{2k−1→2k} · W̄_{2k−1→2k}(h_{k−1}^{t+1}) + W^u_{2k+1→2k} · W̄_{2k+1→2k}(h_k^{t+1})]     (8)
Here, W^u_{2k−1→2k} and W^u_{2k+1→2k} denote sets of weighting factors, one for each spatial resolution level p, which are used to dynamically weight the motion compensated high-pass temporal frame in each spatial resolution level p. That is, the adaptive motion compensation within each spatial resolution level p is given by

(W^u_{j→2k} · W̄_{j→2k}(h))[p] = W^u_{j→2k}[p] · (W̄_{j→2k}(h))[p]     (9)

For illustrative purposes, Fig.5 depicts the adaptively weighted update steps for the case P = 2.
In the preferred embodiment, the weights W^u_{j→2k}[p] are selected in the range 0 to 1, based on a local estimate of the performance of the motion model. To capture the spatially varying nature of these weights, we write W^u_{j→2k}[p, n] for the update weighting factor at location n = [n_1, n_2], so that equation (9) becomes

(W^u_{j→2k} · W̄_{j→2k}(h))[p, n] = W^u_{j→2k}[p, n] · (W̄_{j→2k}(h))[p, n]     (10)

In other embodiments, the motion compensation operator W^I_{j→2k} can be used in place of W̄_{j→2k}, yielding

(W^u_{j→2k} · W^I_{j→2k}(h))[p, n] = W^u_{j→2k}[p, n] · (W^I_{j→2k}(h))[p, n]     (11)
CH) In preferred embodiments, ," ^2k~] and Wu + 2i[/?,n] are selected based on the energy in the corresponding motion-compensated high-pass temporal frames, W%k ι - 2* 'U d of location n . In one specific embodiment, Eh' M→2k[i] are obtained by applying a 5
Figure imgf000015_0003
* kL P,»] and. W * i → «(Λ χj ,πl , reduced to lxl
Figure imgf000015_0004
case, the weighting factors are found from
Figure imgf000015_0005
where Q is a non-increasing function which maps local high-pass energy values to weights in the range 0 to 1 . The function Q may take many forms so long as small values of E produce values for Q(E) which are close fo l and large values of E produce values for Q(E) which are close to 0 . One suitable form for the function Q is
Figure imgf000015_0006
where β represents the smallest high-pass energy value which has any effect on the temporal update lifting steps, and α is a gain factor which determines the high-pass energy at which the temporal update lifting steps are effectively skipped. In the specific embodiment described above, the local energy fields are formed by squaring the motion compensated high-pass temporal samples and then applying a moving average window. The weighting factors can alternatively be obtained by applying a non-linear scalar transducer function directly to the motion compensated high-pass temporal samples and averaging the result. This is appealing from the perspective of implementation convenience.

Although some specific embodiments have been mentioned above, it will be apparent to those skilled in the art that many variations on these embodiments should produce similar results. The temporal update lifting steps should be weighted using a locally adaptive weighting factor, which takes on values in the range 0 to 1, based on local estimates of the motion modeling accuracy. The motion modeling accuracy, in turn, may be inferred from a measure of the local magnitude of the high-pass temporal subband frames, in the same and/or lower spatial resolutions p.

It is important to note that the weighting factors are not explicitly included in the compressed bit-stream. Instead, they are inferred from the available high-pass temporal subband samples. In a complete scalable coding environment, the decoder's version of the high-pass temporal subband samples will generally be affected by quantization, in a manner which depends on the number of bits actually transmitted to the decoder, something a scalable encoder cannot generally know ahead of time. As a result, the decoder may infer slightly different weights to those inferred by the encoder. However, in the places where such differences are substantial, the high-pass temporal subband samples are likely to be quantized toward zero, so that errors in the weighting factors are less significant than one might at first expect. In any event, experimental results confirm that the adaptive update weighting scheme can be particularly robust to quantization effects, while essentially eliminating the appearance of ghosting artifacts at reduced temporal resolutions.
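A pixel-level sketch of this adaptive update weighting follows, for a single spatial resolution level. It assumes the linear transducer form of Q given above and a 5×5 moving average energy window; the α and β values are placeholder choices, and `adaptive_update_term` simply weights a motion compensated high-pass frame before it enters the update step of equation (8).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_energy(x, win=5):
    # E[n]: win x win moving average of squared samples (edge padding).
    p = win // 2
    xp = np.pad(x * x, p, mode='edge')
    return sliding_window_view(xp, (win, win)).mean(axis=(2, 3))

def Q(E, alpha=0.05, beta=1.0):
    # Linear transducer: Q(E) = 1 for E <= beta, reaching 0 at E = beta + 1/alpha.
    return np.clip(1.0 - alpha * (E - beta), 0.0, 1.0)

def adaptive_update_term(mc_h, alpha=0.05, beta=1.0):
    # Weight the motion compensated high-pass frame by Q of its own local
    # energy, as in equation (10), before adding it into the update step.
    return Q(local_energy(mc_h), alpha, beta) * mc_h
```

Where the local high-pass energy stays below β, the update step is applied in full; where the motion model has failed badly (large energy), the update is effectively skipped, which is what suppresses ghosting in the low-pass temporal frames.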
Adaptive Prediction Steps for Efficient Compression without Misaligned Aliasing Artifacts

As mentioned above, uncompensated type II leakage in reduced spatial resolution sequences is visually disturbing only when the motion model fails to capture the true motion. Although motion model failure generally happens infrequently, and only in small spatial regions, the structure described above avoids using higher frequency spatial resolution levels for the motion compensation of lower resolution levels, regardless of the performance of the motion model. This has an unnecessarily detrimental effect on the compression efficiency.

To resolve the problem of misaligned spatial aliasing artifacts, without sacrificing compression efficiency, a second aspect of the invention, in one embodiment, consists of a novel content-adaptive set of temporal prediction lifting steps. The information from higher frequency spatial resolution levels is only used when motion modeling is successful, avoiding the use of this information when the motion model fails. In an embodiment of the invention, a local estimate of the motion accuracy is formed based on motion compensated samples which are already available to the decoder (possibly with quantization error), and this estimate is used to weight the contributions from higher spatial frequency resolution levels. Fig.6 illustrates the general form for such an adaptive motion compensated warping operation. When the motion model captures the scene activity, the weights should be chosen close to 1, so as to take advantage of the information in higher frequency spatial subbands (type II leakage components). On the other hand, when the motion model fails, the weights should be chosen close to 0. In this case the adaptive motion compensation operation eliminates all type II leakage components, thereby avoiding the possibility of misaligned aliasing artifacts at reduced spatial resolutions.
In practice, behavior between these two extremes is addressed by assigning various weights in the range 0 to 1, based on an estimated likelihood of motion failure. We use the notation W^{+u}_{k→j}(f_k) to refer to a motion compensated warping operation which uses information from u (u ≤ P) higher frequency spatial resolution levels (wherever they exist), together with all lower frequency spatial resolution levels, when compensating each resolution level within f_k. This is illustrated in Fig. 4(c) for the case u = 1. Specifically, we define W^{+u}_{k→j} as follows:

(W^{+u}_{k→j}(f_k))[p] = A_L^u ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[P+1−u]),          p = P+1
(W^{+u}_{k→j}(f_k))[p] = A_H ∘ A_L^u ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[p−u]),     u < p ≤ P
(W^{+u}_{k→j}(f_k))[p] = A_H ∘ A_L^{p−1} ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[1]),   1 ≤ p ≤ u

Here, A_L refers to a single stage of spatial subband analysis, returning only the LL subband. We note that when u = P, the resulting spatio-temporal subbands will be the same as if we had used the "t+2D" structure. Most of the inter-resolution information leakage occurs between adjacent resolution levels; therefore, some embodiments may choose to save computation cost by restricting the adaptive type II leakage compensation method to only use information from the next higher resolution level (i.e., u = 1). In this case, the adaptive motion compensation operator can be written as
W^a_{k→j}(f_k) = W^a_{k→j} · W^{+1}_{k→j}(f_k) + (1 − W^a_{k→j}) · W^I_{k→j}(f_k)

That is, the adaptive motion compensation within each spatial resolution level p is given by

(W^a_{k→j}(f_k))[p, n] = W^a_{k→j}[p, n] · (W^{+1}_{k→j}(f_k))[p, n] + (1 − W^a_{k→j}[p, n]) · (W^I_{k→j}(f_k))[p, n]     (13)

Here, we have included the spatial location n = [n_1, n_2], to make the spatially adaptive nature of the weighting process more explicit.

An alternate embodiment, which allows information from all higher spatial resolutions to be exploited, uses a recursive definition for the adaptive motion compensation operator. First, extend equation (7) to define an auxiliary LL subband for each motion compensated result. Specifically, define
(W^I_{k→j}(f_k))[p] = W^B_{k→j}(f_k[P+1]),                               p = P+1
(W^I_{k→j}(f_k))[p] = A_H ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[p]),        1 ≤ p ≤ P
(W^{I,L}_{k→j}(f_k))[p] = A_L ∘ W^B_{k→j} ∘ S(f_k[P+1], ..., f_k[p]),    1 ≤ p ≤ P

The following recursion then provides the adaptive motion compensated result:

(W^a_{k→j}(f_k))[1] = (W^I_{k→j}(f_k))[1]          [the highest resolution level needs no adaptation]
(W^{a,L}_{k→j}(f_k))[1] = (W^{I,L}_{k→j}(f_k))[1]  [auxiliary low-pass component]

and, for p = 2, 3, ..., P+1,

(W^a_{k→j}(f_k))[p, n] = W^a_{k→j}[p, n] · (A_H ∘ (W^{a,L}_{k→j}(f_k))[p−1])[n] + (1 − W^a_{k→j}[p, n]) · (W^I_{k→j}(f_k))[p, n]     (14)

(W^{a,L}_{k→j}(f_k))[p, n] = W^a_{k→j}[p, n] · (A_L ∘ (W^{a,L}_{k→j}(f_k))[p−1])[n] + (1 − W^a_{k→j}[p, n]) · (W^{I,L}_{k→j}(f_k))[p, n]     (15)

Although these operations may seem complex, the recursive procedure can be implemented in a straightforward way, as suggested by the accompanying figure. The figure reveals the fact that (W^a_{k→j}(f_k))[p] and (W^{a,L}_{k→j}(f_k))[p] are typically implemented jointly, by finding all four subbands of the motion compensated frame corresponding to resolution levels f_k[p] through f_k[P+1]; this is precisely the frame obtained by synthesizing the LL_p, HL_p, LH_p and HH_p subbands of f_k. The adaptive temporal prediction steps are obtained by replacing W̄ with W^a in equation (5), so that

h_k^{t+1} = l_{2k+1}^t − (1/2)·[W^a_{2k→2k+1}(l_{2k}^t) + W^a_{2k+2→2k+1}(l_{2k+2}^t)]     (16)
To preserve the invertibility of the transform, it is important that the decompressor be able to recover the same weights used by the compressor, at least in the case where there are no quantization errors, so that it can reproduce the operators W^a_{2k→2k+1} and W^a_{2k+2→2k+1} used in equation (16). Certainly, this is possible if the weights employed by W^a_{2k→2k+1}(l_{2k}^t) and W^a_{2k+2→2k+1}(l_{2k+2}^t) depend only on l_{2k}^t and l_{2k+2}^t. In the remainder of this section, we describe methods for selecting weights which preserve the invertibility of the overall transform.

Fig.7 illustrates the details of one particular embodiment of the adaptive prediction steps, for the case P = 2. In this figure, the dotted arrows represent type I leakage compensation, as required by W^I, while dashed arrows identify the content-adaptive blending of higher resolution information contained in f_{2k} and f_{2k+2}. To estimate the weights W^a_{k→j}[p, n], the performance of the motion model must be estimated at each spatial location n. To avoid the additional cost of explicitly sending these weights, the weight estimation (WE) process should be repeatable by the decoder, and it should be reasonably robust to quantization error. For this purpose, the invention uses a method similar to that described above for adaptively weighting the update steps, except that it uses the local energy derived from a motion prediction residual signal (instead of the high-pass temporal frames) to estimate the local performance of the motion model. For the method depicted in Fig.7, one embodiment uses the local energy in the residual between the motion compensated versions of f_{2k} and f_{2k+2}. That is, we form the residual

Δ_{2k+1}[p, n] = (W^{+1}_{2k→2k+1}(f_{2k}))[p, n] − (W^{+1}_{2k+2→2k+1}(f_{2k+2}))[p, n]     (17)

and obtain the weights W^a_{2k→2k+1}[p, n] = W^a_{2k+2→2k+1}[p, n] by taking a local average of the energy of this residual field and applying the scalar mapping operator Q. In one preferred embodiment, a 7×7 moving average window is employed. As for the adaptive update steps, the scalar mapping operator Q should take on values near 1 where the local energy of the prediction residual is close to 0, and values near 0 where the local energy of the prediction residual is large.

Inherent to such embodiments is the assumption that motion model failure is most likely to occur wherever the motion compensated versions of the previous frame f_{2k} and the subsequent frame f_{2k+2} significantly disagree. This assumption is somewhat pessimistic, since it penalizes both forward and backward MC operations, even if the motion model fails only in one of them. As an alternative to the above strategy, various embodiments may derive separate values for the weights W^a_{2k→2k+1}[p] and W^a_{2k+2→2k+1}[p],
based on estimates of the accuracy of the corresponding individual motion fields. Specifically, such embodiments may use the local energy in the residual (W^{+1}_{2k→2k+1}(f_{2k}))[p−1] − f_{2k+1}[p−1] to estimate the weights W^a_{2k→2k+1}[p]. Similarly, (W^{+1}_{2k+2→2k+1}(f_{2k+2}))[p−1] − f_{2k+1}[p−1] may be used to estimate W^a_{2k+2→2k+1}[p]. This method is suitable for any lifting-based temporal transform, including the conventional Haar wavelet kernel, since it does not rely on comparing the results produced by both forward and backward motion compensation operations. However, in this case, lower spatial frequency information from f_{2k+1} cannot be used to improve the robustness of the estimator. Specifically, during temporal synthesis, the prediction step must be inverted by working downwards from the highest frequency spatial resolution level, since f_{2k+1}[p−1] is needed to synthesize f_{2k+1}[p], as indicated in the corresponding figure. Note that in this case the adaptive motion compensation operators W^a_{2k→2k+1}(f_{2k}) and W^a_{2k+2→2k+1}(f_{2k+2}) do not depend solely on f_{2k} and f_{2k+2}, but also on f_{2k+1}. Nevertheless, the adaptive prediction step of equation (16) can still be inverted, so long as we proceed downwards from high to low spatial resolution levels.

Variations on the above theme should be obvious to those skilled in the art. The key thread running through the embodiments of interest is that the weights W^a_{2k→2k+1}[p] and W^a_{2k+2→2k+1}[p] may be based on the disparity between motion compensated versions of f_{2k}, f_{2k+2} and f_{2k+1}[p−1]. One interesting embodiment selects identical weights for both W^a_{2k→2k+1}[p] and W^a_{2k+2→2k+1}[p], based on a local average of the energy in

h_k[p−1] = f_{2k+1}[p−1] − f̂_{2k+1}[p−1]     (18)

where f̂_{2k+1} denotes the motion compensated prediction formed in the prediction lifting step.
This is illustrated in Figure 9. The advantage of this embodiment is that the weights are computed directly from the high-pass spatio-temporal subband samples produced at the next higher spatial resolution level. This avoids the need to compute any additional motion compensated results. Also, since the samples of h_k[p−1] are directly subjected to quantization and coding, an embodiment may choose to derive weights using only selected bit-planes from an embedded representation of these samples, so as to maximize the robustness of the scheme to quantization errors.

Although some specific embodiments have been mentioned above, it will be apparent to those skilled in the art that many variations on these embodiments should produce similar results. The temporal prediction lifting steps in preferred embodiments should adaptively combine the motion compensated result which can be obtained using only the same or lower spatial resolution levels with the motion compensated result which can be obtained using higher spatial resolution levels, when forming a prediction of one frame f_j, at any particular resolution level, from another frame f_k. The weighting factors used to blend these different motion compensated estimates should take on values in the range 0 to 1, based on local estimates of the motion modeling accuracy.
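The blending just summarized can be sketched as follows, for one spatial resolution level. The field `h_finer` stands for the finer-level high-pass residual h_k[p−1] of equation (18); the 7×7 window matches the embodiment described above, while the linear form of Q and its parameters are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_avg(x, win=7):
    # win x win moving average with edge padding.
    p = win // 2
    xp = np.pad(x, p, mode='edge')
    return sliding_window_view(xp, (win, win)).mean(axis=(2, 3))

def Q(E, alpha=0.05, beta=1.0):
    # Non-increasing transducer mapping residual energy to weights in [0, 1].
    return np.clip(1.0 - alpha * (E - beta), 0.0, 1.0)

def adaptive_prediction(mc_hi, mc_lo, h_finer):
    # Equation (13)-style blend: use the motion compensated result that
    # draws on higher resolution levels (mc_hi) where the finer-level
    # prediction residual h_finer is small, and fall back to the safe
    # lower-resolution result (mc_lo) where it is large.
    w = Q(box_avg(h_finer * h_finer))
    return w * mc_hi + (1.0 - w) * mc_lo
```

Because the weights are derived from samples the decoder also holds, the same blend can be reproduced at the decompressor, preserving invertibility.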
Adaptation Strategies for use with Multi-Resolution Motion Representations

1) Motion scaling vs. hierarchical motion: In the foregoing description, the concept of motion parameter scaling has been invoked to describe the mapping of motion parameters from one spatial resolution to another. The detailed implications of motion parameter scaling generally depend upon the motion modeling and parametrization methods selected for the preferred embodiment. In the simplest case, a frame-wide parametric motion model might be employed, in which case it is sufficient to scale the relevant parameters of the model in accordance with the spatial resolution at which motion warping is required. More typically, block-based or deformable mesh motion models may be employed, in which case both the block/mesh size and the motion parameters may need to be scaled to match the spatial resolution at which motion warping is required. Of particular interest are hierarchical motion models, such as Hierarchical
Block Motion Compensation (HBMC). In this case, a hierarchical (coarse-to-fine) family of motion models is available for each pair of frames. The coarsest motion model might involve relatively large block/mesh dimensions (e.g., 64×64), with relatively few motion parameters, while the finest motion model might involve quite small block/mesh dimensions (e.g., 4×4). The relationship between finer model elements (blocks/meshes) and coarser model elements is typically expressed using a parent-child metaphor. Key to the efficiency of such models is that most of the finer model elements are expected to have the same motion parameters as their parents, so that relatively few motion parameters need actually be signalled, and these can be concentrated around regions where the scene motion flow is most divergent. Such motion modeling and signalling schemes are well-known to those skilled in the art.

In the case where a hierarchical motion model is available, the process of scaling the motion field to match the spatial resolution at which motion warping is required can be accomplished by selecting an appropriate scale from the hierarchical model. One advantage of this approach is that the motion compensated lifting steps associated with lower spatial resolution components involve only a subset of the motion parameters from the complete hierarchical model. This, in turn, means that not all of the motion parameter information need be sent to the decoder when reconstructing the video sequence at a reduced spatial resolution.

2) Incorporating Motion Divergence into the Adaptive Prediction Weights:
Hierarchical motion representations, such as those mentioned above, represent one of a number of ways in which different motion models may be provided for different spatial resolutions, where the motion models associated with lower resolutions can have a reduced signalling cost. In this context, the adaptive motion compensation operators W^a_{k→j} serve to blend motion compensated results from higher resolutions, generated using their higher quality motion models, into the temporal lifting steps for lower spatial resolutions. Again, the weights W^a_{k→j}[p] should be selected so as to do this only when it is safe, from the perspective of reduced spatial resolution reconstruction. To do this, preferred embodiments should take account of the divergence between the motion vectors used at resolution level p and those used at resolution level p−1, when forming the blending weights W^a_{2k→2k+1}[p, n] and W^a_{2k+2→2k+1}[p, n]. One simple and effective way to do this is to set

W^a_{2k→2k+1}[p, n] = W^a_{2k+2→2k+1}[p, n] = Q(E_{2k+1}[p, n] + D_{2k+1}[p, n])     (19)
Here, E_{2k+1}[p, n] is a local average, in the neighbourhood of location n, of the LH, HL and HH subband energies from h_k[p−1] (see equation (18)); D_{2k+1}[p, n] is a scaled local average, in the neighbourhood of location n, of the squared error between the motion vectors used in the maps W^{p−1}_{2k→2k+1} and W^{p−1}_{2k+2→2k+1} and the corresponding vectors used in the maps W^p_{2k→2k+1} and W^p_{2k+2→2k+1}. The recursive formulation of W^a_{k→j}(f_k), according to equations (14) and (15), allows high quality motion compensated results to be drawn from the highest quality motion fields at the highest spatial resolution, even when performing the temporal lifting steps for much lower spatial resolutions, wherever the quality of the motion field is good (small E_{2k+1}[p, n]) and it is consistent across the resolutions in question (small D_{2k+1}[p, n]).

In addition to motion vector divergence between resolutions, the local spatial divergence of a block-based motion model can also provide valuable information regarding the reliability of the motion field. In particular, the motion model can generally be considered less reliable in regions where the motion parameters differ significantly between neighbouring blocks or mesh elements. In various embodiments, this evidence of poor motion modeling can be incorporated into the adaptive weights
W^a_{2k→2k+1}[p, n] and W^a_{2k+2→2k+1}[p, n], by including contributions from the spatial divergence of any or all of W^p_{2k→2k+1}, W^{p−1}_{2k→2k+1}, W^p_{2k+2→2k+1} and W^{p−1}_{2k+2→2k+1} in the D_{2k+1}[p, n] term of equation (19).

3) Motion Compensation Modes and Adaptive Prediction Weights: In many modern motion compensated video coders, the motion model includes mode parameters which are used to selectively disable motion compensation in one or more of the potential directions. This is usually done on a block basis, so that some blocks may use only W^p_{2k→2k+1}(f_{2k}), while others may use only W^p_{2k+2→2k+1}(f_{2k+2}), when forming the high-pass temporal subband h_k. In this case, the contribution from the motion compensated samples which are actually used should be doubled. Still other blocks might use an intra-mode, wherein neither W^p_{2k→2k+1}(f_{2k}) nor W^p_{2k+2→2k+1}(f_{2k+2}) is used to produce h_k. Where separate motion fields are available at different spatial resolutions p, it is important to accommodate the effects of different mode decisions within the adaptive prediction weights
W^a_{2k→2k+1}[p, n] and W^a_{2k+2→2k+1}[p, n]. It is also preferable to employ embodiments of the invention which estimate motion reliability from the high-pass subband samples h_k[p−1], as discussed in connection with equation (18) and embodied in equation (19). This is because h_k[p−1] is the motion compensated residual signal, which provides a usable model of the motion modeling accuracy regardless of the selected motion compensated prediction mode. By contrast, the motion compensated disparity estimate of equation (17) is usable only in the event that full bi-directional motion is employed.

In preferred embodiments of the invention, the motion divergence term D_{2k+1}[p, n] of equation (19) is adjusted to include the effect of motion mode divergence. For example, D_{2k+1}[p, n] should be augmented if only one of the two motion compensation operators is used at resolution level p (i.e., only one of W^p_{2k→2k+1} and W^p_{2k+2→2k+1} is used), yet both W^{p−1}_{2k→2k+1} and W^{p−1}_{2k+2→2k+1} are used at the higher resolution level p−1, or vice-versa, in the neighbourhood of location n. If unidirectional motion compensated prediction is used in both resolution levels p−1 and p, but with different directions, D_{2k+1}[p, n] should be elevated to the point where the weights become 0, since there are no motion vectors in common between the resolutions from which to estimate motion divergence.

If an intra-mode is selected in some region within some spatial resolution level p, the HH, LH and HL subband samples of f_{2k+1}[p] are being quantized and coded directly, since the lack of any motion compensation means that h_k[p] = f_{2k+1}[p]. This need not be the case for all spatial resolutions. In fact, it commonly happens that the lower resolutions can be well predicted, while higher resolution information is essentially unpredictable, using the motion model at hand. In view of this, very substantial artifacts can be produced under resolution scaling, unless the adaptive prediction weights are set to W^a_{2k→2k+1}[p, n] = W^a_{2k+2→2k+1}[p, n] = 0 wherever an intra-mode (or any prediction mode with a similar effect) is used at resolution level p or p−1. In summary, the adaptive prediction weights used in the definitions of W^a_{2k→2k+1}(f_{2k}) and W^a_{2k+2→2k+1}(f_{2k+2}) should be set in a manner which is sensitive to both motion vector divergence and motion mode divergence across resolutions.

4) Motion Inversion Reliability and Adaptive Update Weights: In preferred embodiments of the invention, motion parameters are only explicitly estimated and communicated to the decompressor for the motion warping operations associated with temporal prediction. For the temporal update steps, the explicitly signalled motion fields are inverted; however, there are conditions under which a reliable inverse cannot be found. For example, with block-based motion models, it can easily happen that a forward motion warping operation W_{k→j} does not use any of the sample values within some block in frame f_k when producing a motion compensated estimate of frame f_j. In this case, no information is available from which to reliably deduce the inverse mapping W_{j→k} in the neighbourhood of such blocks. This situation is easily accommodated by assigning an update weight of
W^u_{j→2k}[p, n] = 0 to such regions, during adaptive update step processing. More generally, an additional weighting term may be introduced into equation (10), taking values in the range 0 to 1, to reflect the confidence which can be placed in the motion model inversion process at different locations. Since the motion fields used at different resolution levels may be different, these confidence values depend upon p. In one particular embodiment, the confidence factor may be selected in a manner which is proportional to the number of samples within a particular block in the inverse motion model which are used by the explicitly signalled forward motion model and have similar implied motion parameters. The confidence factors derived in this way may be further subjected to an offset, so that a minimum number of pixels must be used by the forward motion field before the confidence can exceed zero in the inverse field. The confidence factors should be further subject to clipping, so that they lie in the range 0 to 1.
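One way to realize the block confidence factors just described is sketched below, under an illustrative block-translation model: for each block of the target (inverse-field) frame, count how many of its samples are actually referenced by the explicitly signalled forward field, subtract a minimum-hit offset, and clip to [0, 1]. The block size, offset and normalization are placeholder choices, and the similarity test on implied motion parameters is omitted for brevity.

```python
import numpy as np

def inverse_motion_confidence(fwd_mv, shape, block=4, offset=4):
    # fwd_mv[by, bx] holds an integer (dy, dx) for each block of the
    # source frame; shape is the (H, W) frame size.
    H, W = shape
    hits = np.zeros((H, W))
    for by in range(fwd_mv.shape[0]):
        for bx in range(fwd_mv.shape[1]):
            dy, dx = int(fwd_mv[by, bx, 0]), int(fwd_mv[by, bx, 1])
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    ty, tx = y + dy, x + dx
                    if 0 <= ty < H and 0 <= tx < W:
                        hits[ty, tx] += 1      # target sample referenced
    # Per-block hit counts in the target frame, then offset, normalize
    # and clip so the confidence factors lie in [0, 1].
    counts = hits.reshape(H // block, block, W // block, block).sum(axis=(1, 3))
    return np.clip((counts - offset) / (block * block - offset), 0.0, 1.0)
```

Blocks whose samples are never referenced by the forward field (e.g., regions the motion maps entirely away from) receive zero confidence, so their update weights are suppressed.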
Adaptive Inter-Resolution Blending for Temporal Update Steps Section IV-C described adaptive motion compensated warping operators W̃^p_{k→j}. At each resolution level p, the adaptive warping operator selectively blends motion compensated samples formed from higher resolution source frames into the motion compensated samples formed using only the current and lower resolution levels, f_k[P], ..., f_k[p+1]. By carefully selecting the blending weights, high compression efficiency can be maintained while avoiding the appearance of mis-aligned spatial aliasing artefacts when the video is reconstructed at reduced spatial resolution. As a further measure to avoid the appearance of mis-aligned aliasing artefacts at reduced resolution, Section described an adaptive temporal update step procedure, in which the update terms generated for resolution level p depend only on the temporal high-pass subband samples found at the same and lower resolutions, as expressed in equations (10) and (11). As with the prediction steps, though to a lesser extent, compression efficiency can be improved by replacing W_{2k-1→2k}(h_{k-1}) and W_{2k+1→2k}(h_k) in equation (11) with adaptive motion compensation operators W̃^p_{2k-1→2k}(h_{k-1}) and W̃^p_{2k+1→2k}(h_k), capable of using higher frequency spatial subbands from h_{k-1} and h_k, wherever it is safe to do so. Again, we use the term safe to refer to motion compensated temporal lifting steps which avoid the risk of significant mis-aligned aliasing artifacts when the compressed video is reconstructed at a reduced spatial resolution. According to an embodiment of the invention, safe blending of higher resolution information into lower resolution temporal update terms may be achieved by employing adaptive motion compensated operators having the same form as those described by equation (13), or by the recursive equations (14) and (15), except that the adaptive inter-resolution blending factors are different; call them W̄^u_{2k-1→2k}[p,n] and W̄^u_{2k+1→2k}[p,n] here. In the preferred embodiment of this aspect of the invention, W̄^u_{2k-1→2k}[p,n] and W̄^u_{2k+1→2k}[p,n] are based entirely on the divergence of the motion model between resolution level p and resolution level p-1. The motion fields in question here are those associated with W^p_{2k-1→2k} and W^p_{2k+1→2k} (resolution level p) and W^{p-1}_{2k-1→2k} and W^{p-1}_{2k+1→2k} (resolution level p-1). As mentioned above, these are typically obtained by inverting the motion parameters used in the corresponding prediction lifting steps, and this inversion process can produce unreliable results. In the preceding section, we associated a confidence factor, in the range 0 to 1, with each block in a block-based representation of the inverted motion operators, W^p_{2k-1→2k} and W^p_{2k+1→2k}. In preferred embodiments of the invention, W̄^u_{2k-1→2k}[p,n] should be chosen close to zero wherever either the confidence factors or the motion vectors associated with location n differ substantially between W^p_{2k-1→2k} and W^{p-1}_{2k-1→2k}. Similarly, W̄^u_{2k+1→2k}[p,n] should be set close to zero wherever the confidence factors or the motion vectors associated with location n differ substantially between W^p_{2k+1→2k} and W^{p-1}_{2k+1→2k}. There are any number of ways to specifically realize suitable blending weights based on these principles, as should be apparent to those skilled in the art.
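Purely by way of example, one simple realization of such blending weights, driven by inter-level motion vector divergence and confidence divergence, might take the following form. The linear ramps and the two thresholds are illustrative assumptions; the invention does not prescribe any particular functional form.

```python
import numpy as np

def update_blend_weights(vec_p, vec_hi, conf_p, conf_hi,
                         vec_thresh=1.0, conf_thresh=0.5):
    """Illustrative blending factors for adaptive temporal update steps.

    vec_p / vec_hi: (By, Bx, 2) inverted motion vectors at resolution level p
    and at the adjacent higher resolution level; conf_p / conf_hi: matching
    per-block confidence factors in [0, 1].  The weight is driven towards
    zero wherever either the vectors or the confidences disagree between the
    two levels, as the preferred embodiment requires.
    """
    vec_div = np.linalg.norm(vec_p - vec_hi, axis=-1)    # motion divergence
    conf_div = np.abs(conf_p - conf_hi)                  # confidence divergence
    # Clip each factor separately so strong divergence of either kind
    # forces the product to zero (never to a spurious positive value).
    a = np.clip(1.0 - vec_div / vec_thresh, 0.0, 1.0)
    b = np.clip(1.0 - conf_div / conf_thresh, 0.0, 1.0)
    return a * b
```

Identical motion fields and confidences at the two levels yield a weight of 1 (full blending of higher resolution information), while substantial disagreement in either quantity yields 0.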
In-band Motion Compensation In the foregoing development, motion compensation of high-pass subband samples has been described in terms of the concatenation of a spatial synthesis operation to map the high-pass subbands within a resolution level into a baseband frame, followed by motion compensation and then spatial analysis to map the motion compensated result back into the subband domain. While many embodiments may perform the processing in this way, it should be evident to those skilled in the art that the three steps can be collapsed into a single linear operator. This type of processing is advocated in [sp-Anreopoulos-VCP][Anreopouloschelkensournal], for example. This so-called in-band processing has one important advantage for block-based motion models. Specifically, the block-to-block motion field discontinuity is experienced in the subband domain, rather than the image domain. As a result, blocking artifacts do not appear in the image domain, and the spatio-temporal subbands also exhibit better energy compaction properties. In one particular embodiment of in-band motion compensation, the motion compensated warping operators W_{k→j} and W̃_{k→j} depicted in Figure are implemented by means of a composite linear operator, which directly produces the motion compensated subbands, HL^p, LH^p, HH^p and LL^p, for frame j from the corresponding input subbands in frame k. In particular, one such operator is defined for each shift v; the samples of each output subband are partitioned into blocks, in accordance with the motion model, and the motion compensated subband samples belonging to each block b are generated using the composite linear operator corresponding to a shift of v_b, where v_b is the motion vector for block b. In this embodiment, the composite linear operator associated with shift v may be derived by composing the operations of spatial DWT synthesis, spatial-domain shifting (with interpolation) by v, and spatial DWT analysis.
Noting that all three of these are separable two-dimensional operators, the composite linear operator is also separable. Conventional windowing techniques may be used to reduce the operator's region of support. In a preferred embodiment of in-band motion compensation, the motion compensated warping operators W_{k→j} and W̃_{k→j} depicted in Figure are implemented by first subjecting the relevant source subbands of frame k to one level of DWT synthesis, and then applying a composite linear operator to this synthesized source frame, so as to produce the motion compensated subbands HL^p, LH^p, HH^p and LL^p for frame j. Again, different composite linear operators are defined for each shift v, and the samples of each output subband are partitioned into blocks, in accordance with the motion model. The motion compensated subband samples belonging to each block b are generated using the composite linear operator corresponding to a shift of v_b, where v_b is the motion vector for block b. In this embodiment, the composite linear operator associated with shift v is derived by composing only two operations: spatial-domain shifting (with interpolation) by v; and spatial DWT analysis. The resulting composite linear operators have a smaller region of support than those obtained in the previous embodiment, so that less aggressive windowing is required to achieve a given level of computational complexity. As before, the operators are all separable. The motion adaptive transformation of embodiments of the present invention may be implemented, as will be appreciated by a skilled person, by appropriate computing systems, including appropriate computer hardware and computer software. Any suitable architecture may be utilised. A client/server architecture is convenient for compression and transmission of compressed video (server) and decompression and display of received video (client). Other architectures may be utilised.
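The composition of shifting and spatial analysis into a single linear operator may be illustrated, in one dimension and with a Haar analysis filter and integer circular shifts for simplicity, as follows. The helper names are illustrative; a practical embodiment would use separable two-dimensional operators, interpolated (sub-pixel) shifts and windowed supports as described above.

```python
import numpy as np

def haar_analysis(x):
    """One level of 1-D Haar analysis: returns (low-pass, high-pass) subbands."""
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return lo, hi

def shift_matrix(n, v):
    """Integer circular shift by v expressed as an n-by-n linear operator."""
    return np.roll(np.eye(n), v, axis=0)

def composite_operator(n, v):
    """Compose shift-by-v with Haar analysis into a single linear operator,
    as in the second in-band embodiment: rows 0..n/2-1 of the result give
    the shifted low-pass subband, rows n/2..n-1 the shifted high-pass one."""
    A = np.zeros((n, n))
    for j in range(n):                    # build the operator column by column
        e = np.zeros(n); e[j] = 1.0
        lo, hi = haar_analysis(shift_matrix(n, v) @ e)
        A[:n // 2, j], A[n // 2:, j] = lo, hi
    return A

# The composite operator reproduces shift-then-analysis in a single step.
x = np.arange(8, dtype=float)
lo, hi = haar_analysis(np.roll(x, 3))
y = composite_operator(8, 3) @ x
assert np.allclose(y[:4], lo) and np.allclose(y[4:], hi)
```

Because both constituent operations are linear, the composite matrix applies them in one pass; windowing, as noted above, would then truncate its support to control complexity.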
Figure 10 illustrates in block diagram form a coder/decoder arrangement for compressing and decompressing video frames, utilising a transform process and apparatus in accordance with an embodiment of the present invention. The diagram is schematic and in block form. It will be appreciated, however, that the arrangement can be implemented by the appropriate computer hardware and software. The compressor section, generally designated by reference numeral 100, may be implemented by an appropriate server system, for example, while the decompressor section, generally designated by reference numeral 101, may be implemented by an appropriate client computer system, e.g. a PC with appropriate software. The compressed bit stream 102 may be transmitted via any communications medium 103, the Internet being one example. The compressor 100 includes the transform block 104, which receives the video and implements a transform as described in this specification above. Coding is then implemented by blocks 105, 106, 107 and 108, leading to the formation of the bit stream 102. The compressed bit stream 102 is transmitted via communications medium 103 to the decompressor 101. Various steps inverse to compression are implemented by blocks 109, 110, 111 and 112. The inverse wavelet transform as described in the present specification is implemented in block 113 in order to recover the sequence of video frames. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
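For illustration, the exact invertibility of motion compensated lifting steps, which the decompressor relies upon, may be sketched with a Haar-like predict/update pair as follows. The same warping operator is applied in both steps here for simplicity, whereas, as described above, the update step would normally employ an inverted motion field; the names and lifting weights are illustrative assumptions.

```python
import numpy as np

def forward_lifting(even, odd, warp, alpha=1.0, beta=0.5):
    """Haar-like motion compensated lifting: predict the odd frames from
    motion compensated even frames, then update the even frames.
    'warp' stands in for a motion compensated warping operator W."""
    h = odd - alpha * warp(even)     # prediction step -> high-pass frame
    l = even + beta * warp(h)        # update step -> low-pass frame
    return l, h

def inverse_lifting(l, h, warp, alpha=1.0, beta=0.5):
    """Inverse transform: undo each lifting step in the opposite order,
    as recited in the inversion method.  Reconstruction is exact for any
    warping operator, because each step is subtracted back out."""
    even = l - beta * warp(h)
    odd = h + alpha * warp(even)
    return even, odd
```

Whatever (possibly non-invertible) warping is used, applying the inverted steps in reverse order recovers the original frames exactly, which is the property exploited by block 113 above.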

Claims

CLAIMS:
1. A method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) at least one lifting step in one spatial resolution level blends motion compensated results produced at that level with motion compensated results produced at a higher spatial resolution level.
2. A method in accordance with claim 1 , wherein step (c) is carried out in an adaptive manner, depending on a determined effectiveness of a motion model which is used to implement the motion compensation.
3. A method in accordance with claim 2, wherein a hierarchical motion model is employed, wherein the motion compensation operators which are used to implement a temporal lifting step each utilise a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by the motion compensation operators, and the effectiveness of the motion model is determined based on the degree to which the motion models used at said current resolution and said higher resolution level are in agreement.
4. A method according to claim 3, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, the difference between their respective motion vectors.
5. A method according to claim 3 or claim 4, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, explicitly signalled mode flags.
6. A method according to claims 3, 4 or 5, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, any differences between their respective motion modes.
7. A method according to any one of claims 4, 5 or 6, wherein numerical measures of said motion vector divergences or motion mode differences are accumulated over a small sliding window, the result being used to drive the adaptive blending of higher resolution motion compensated results into those formed at said current resolution during temporal lifting steps.
8. A method according to any one of claims 3 to 7, in which the blending of said higher resolution motion compensated results into those formed at said current resolution is also influenced, within at least one of the lifting steps, by the degree to which the motion model associated with said current resolution is deemed to be representative of the true scene motion.
9. A method in accordance with claim 2, wherein the effectiveness of the motion model is determined based on the degree to which the motion model is deemed to be representative of the true scene motion.
10. A method according to claim 8 or claim 9, in which the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the difference between the motion compensated version of said higher resolution level data, including any or all resolution levels up to said higher resolution level, and the higher resolution level data associated with the frame which is being updated by the lifting step in question.
11. A method according to claim 8, 9 or 10, in which adaptive lifting steps involve two motion compensated frames, and the degree to which the motion model is deemed to be representative of the true scene motion is determined by, or partially influenced by, the difference between these two motion compensated frames, including any or all resolution levels up to said higher resolution level.
12. A method according to any one of claims 8 to 11, in which the degree to which the motion model is deemed to be representative of the true scene is determined by, or partially influenced by, the updated result produced by the same lifting step at a higher resolution level.
13. A method according to claim 10, 11 or 12, in which squared or absolute differences between the samples of said motion compensated frames and/or said higher resolution updated results are accumulated over a small sliding window, the result being used to drive the adaptive utilization of higher spatial resolution information during temporal lifting steps.
14. A method according to any one of claims 10 to 13, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the degree of local motion divergence signalled by the motion model or motion models.
15. A method according to any one of claims 10 to 14, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, explicitly signalled motion mode flags.
16. A method according to any one of claims 9 to 15, in which hierarchical motion models are employed, and the motion compensation operators which are used to implement a temporal lifting step each utilize a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by said motion compensation operators.
17. A method according to any one of the preceding claims, in which the multi-resolution spatial transform is a discrete wavelet transform.
18. A method according to any one of the preceding claims, in which information from lower spatial resolutions is exploited to improve motion compensated prediction in one or more of the temporal lifting steps.
19. A method according to any one of the preceding claims, in which the motion compensation within one or more of the temporal lifting steps is performed by first synthesizing the spatial resolution level's samples into a single baseband frame, motion compensating the baseband frame, and then applying the spatial transform to recover the motion compensated samples at the relevant spatial resolution.
20. A method according to any one of the preceding claims, in which the motion compensation within one or more of the temporal lifting steps is performed directly in the subband domain, by applying operators whose function replicates or approximates that described in claim 19.
21. A method according to any one of the preceding claims, wherein the first subset of said frames comprises odd indexed frames and the second subset of said frames comprises even indexed frames.
22. A method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) the update terms associated with lifting steps which produce low-pass temporal subband frames are adaptively modified, based on the degree to which the motion model is deemed to be representative of the true scene motion.
23. A method according to claim 22, wherein the first subset of said frames comprises odd indexed frames and the second subset of said frames comprises even indexed frames.
24. A method according to claim 22 or 23, combined with the methods described in any of the preceding claims.
25. A method according to any one of claims 22 to 24, in which the adaptively weighted lifting steps which produce low-pass temporal subband frames involve one or more motion compensated high-pass temporal subband frames.
26. A method according to claim 25, in which an energy estimate of said motion compensated high-pass temporal subband frames is formed within a sliding window, based on any or all of the resolution levels up to that associated with the adaptive update lifting step in question, and the result of the sliding window energy analysis is used to drive, or partially drive, the adaptive utilization of high-pass temporal subband data in the adaptive lifting step.
27. A method according to claim 26, in which said energy estimate is found by accumulating squared high-pass temporal subband samples within the window.
28. A method according to claim 26 or claim 27, in which said energy estimate is found by accumulating absolute values of high-pass temporal subband samples within the window.
29. A method according to claim 25, in which the motion compensated high-pass temporal subband samples contribute to said low-pass temporal subband frames through a non-linear transducer function, whose role is to attenuate the contribution associated with high-pass temporal subband samples with relatively large magnitudes.
30. A method according to any one of claims 22 to 29, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the degree of local motion divergence signalled by the motion model.
31. A method according to any one of claims 22 to 29, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, explicitly signalled motion mode flags.
32. A method for inverting the transformation produced by any of the methods in claims 1 to 31, so as to recover a sequence of video frames, involving the steps of: (a) inverting each individual motion compensated lifting step; (b) applying said inverted motion compensated lifting steps in the opposite order to that in which the corresponding lifting steps were performed in the forward transform; and (c) inverting the multi-resolution spatial transform.
33. A method for motion adaptive transformation of a sequence of video frames, wherein: (a) each video frame is subjected to a multi-resolution spatial transform; (b) each spatial resolution level is subjected to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) wherein the order of the spatial transformation and temporal transformation is adapted between t+2D and 2D+t.
34. A method in accordance with claim 33, wherein the adaptation between t+2D and 2D+t is carried out depending on information available within the compressed bit stream.
35. A method of recovering a sequence of video frames produced by a compression method including the transformation of claims 33 and 34, including the steps of inverting the transformation.
36. An apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to a temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates the second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset; and (c) means for implementing at least one lifting step in one spatial resolution level and blending motion compensated results produced at that level with motion compensated results produced at a higher spatial resolution level.
37. An apparatus in accordance with claim 36, wherein the means for implementing the lifting step with blending is arranged to carry out the step in an adaptive manner, depending on a determined effectiveness of a motion model which is used to implement the motion compensation.
38. An apparatus in accordance with claim 36, wherein a hierarchical motion model is employed, wherein the motion compensation operators which are used to implement a temporal lifting step each utilise a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by the motion compensation operators, and determination means is arranged to determine the effectiveness of the motion model based on the degree to which the motion models used at said current resolution and said higher resolution level are in agreement.
39. An apparatus in accordance with claim 38, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, the difference between their respective motion vectors.
40. An apparatus according to claim 38 or claim 39, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, explicitly signalled mode flags.
41. An apparatus according to claim 38, 39 or 40, wherein the degree to which the motion models used at said current resolution level and said higher resolution level are in agreement is determined by, or partially influenced by, any differences between their respective motion modes.
42. An apparatus in accordance with any one of claims 39, 40 or 41, wherein numerical measures of said motion vector divergences or motion mode differences are accumulated over a small sliding window, the result being used to drive the adaptive blending of higher resolution motion compensated results into those formed at said current resolution during temporal lifting steps.
43. An apparatus in accordance with any one of claims 38 to 42, in which the blending of said higher resolution motion compensated results into those formed at said current resolution is also influenced, within at least one of the lifting steps, by the degree to which the motion model associated with said current resolution is deemed to be representative of the true scene motion.
44. An apparatus in accordance with claim 37, wherein determination means is arranged to determine the effectiveness of the motion model based on the degree to which the motion model is deemed to be representative of the true scene motion.
45. An apparatus in accordance with claim 43 or claim 44, in which the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the difference between the motion compensated version of said higher resolution level data, including any or all resolution levels up to said higher resolution level, and the higher resolution level data associated with the frame which is being updated by the lifting step in question.
46. An apparatus in accordance with claims 43, 44 or 45, in which adaptive lifting steps involve two motion compensated frames, and the degree to which the motion model is deemed to be representative of the true scene motion is determined by, or partially influenced by, the difference between these two motion compensated frames, including any or all resolution levels up to said higher resolution level.
47. An apparatus in accordance with any one of claims 43 to 46, in which the degree to which the motion model is deemed to be representative of the true scene is determined by, or partially influenced by, the updated result produced by the same lifting step at a higher resolution level.
48. An apparatus in accordance with claims 45, 46, or 47, in which accumulation means are provided to accumulate squared or absolute differences between the samples of said motion compensated frames and/or higher resolution updated results over a small sliding window, the result being used to drive the adaptive utilisation of higher spatial resolution information during temporal lifting steps.
49. An apparatus in accordance with any one of claims 45 to 48, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the degree of local motion divergence signalled by the motion model or motion models.
50. An apparatus in accordance with any one of claims 43 to 49, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, explicitly signalled motion mode flags.
51. An apparatus in accordance with any one of claims 44 to 50, in which hierarchical motion models are employed, and the motion compensation operators which are used to implement a temporal lifting step each utilize a subset of the hierarchical motion model parameters, depending on the spatial resolution of the data which is operated on by said motion compensation operators.
52. An apparatus in accordance with any one of claims 36 to 51, in which the multi-resolution spatial transform is a discrete wavelet transform.
53. An apparatus in accordance with any one of claims 36 to 52, in which information from lower spatial resolutions is exploited to improve motion compensated prediction in one or more of the temporal lifting steps.
54. An apparatus in accordance with any one of claims 36 to 53, in which the motion compensation within one or more of the temporal lifting steps is performed by first synthesizing the spatial resolution level's samples into a single baseband frame, motion compensating the baseband frame, and then applying the spatial transform to recover the motion compensated samples at the relevant spatial resolution.
55. An apparatus in accordance with any one of claims 36 to 54, in which the motion compensation within one or more of the temporal lifting steps is performed directly in the subband domain, by applying operators whose function replicates or approximates that described in claim 54.
56. An apparatus in accordance with any one of claims 36 to 55, wherein the first subset of said frames comprises odd indexed frames and the second subset of said frames comprises even indexed frames.
57. An apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using at least two motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in the first subset, so as to produce high-pass and low-pass temporal subband frames; and (c) means for adaptively modifying the update terms associated with lifting steps which produce low-pass temporal subband frames, based on a degree to which the motion model is deemed to be representative of the true scene motion.
58. An apparatus in accordance with claim 57, wherein the first subset of said frames comprises odd indexed frames and the second subset of said frames comprises even indexed frames.
59. An apparatus in accordance with claim 57 or claim 58, in which the adaptively weighted lifting steps which produce low-pass temporal subband frames involve one or more motion compensated high-pass temporal subband frames.
60. An apparatus in accordance with claim 59, further including means for forming an energy estimate of said motion compensated high-pass temporal subband frames within a sliding window, based on any or all of the resolution levels up to that associated with the adaptive update lifting step in question, and wherein the result of the sliding window energy analysis is used to drive, or partially drive, the adaptive utilisation of high-pass temporal subband data in the adaptive lifting step.
61. An apparatus in accordance with claim 60, in which said energy estimate is found by accumulating squared high-pass temporal subband samples within the window.
62. An apparatus in accordance with claim 60 or 61, in which said energy estimate is found by accumulating absolute values of high-pass temporal subband samples within the window.
63. An apparatus in accordance with claim 59, in which the motion compensated high-pass temporal subband samples contribute to said low-pass temporal subband frames through a non-linear transducer function, whose role is to attenuate the contribution associated with high-pass temporal subband samples with relatively large magnitudes.
64. An apparatus in accordance with any one of claims 57 to 63 , wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, the degree of local motion divergence signalled by the motion model.
65. An apparatus in accordance with any one of claims 57 to 64, wherein the degree to which the motion model is deemed to be representative of true scene motion is determined by, or partially influenced by, explicitly signalled motion mode flags.
66. An apparatus for motion adaptive transformation of a sequence of video frames, including: (a) means for subjecting each video frame to a multi-resolution spatial transform; (b) means for subjecting each spatial resolution level to temporal transformation, using one or more motion compensated lifting steps, in which at least one lifting step in one spatial resolution level either updates a first subset of said frames using a weighted combination of motion compensated versions of a second subset of said frames, or updates said second subset of frames using a weighted combination of motion compensated versions of the frames in said first subset; and (c) means for adapting the order of the spatial transformation and temporal transformation between t+2D and 2D+t.
67. An apparatus in accordance with claim 66, wherein the adaptation between t+2D and 2D+t is carried out depending on information available within the compressed bit stream.
68. An apparatus for inverting the transformation produced by the apparatus of any one of claims 36 to 67, the apparatus comprising means for inverting the transformation, so as to recover a sequence of video frames.
70. A method for scalable compression of a video signal, including a method for motion adaptive transformation in accordance with any one of claims 1 to 35.
71. An apparatus for scalable compression of a video signal, including an apparatus for implementing a motion adaptive transformation in accordance with any one of claims 36 to 68.
72. A computer programme including instructions for controlling a computer to implement a method in accordance with any one of claims 1 to 35.
73. A computer readable medium including a computer programme in accordance with claim 72.
PCT/AU2005/000193 2004-02-17 2005-02-17 Improved method for motion adaptive transformation of video WO2005078663A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
AU2004900795 2004-02-17
AU2004900795A AU2004900795A0 (en) 2004-02-17 Method for efficient content-adaptive motion compensated 3D wavelet transform, with enhanced spatial and temporal scalability
AU2004903584A AU2004903584A0 (en) 2004-06-30 Method for efficient content-adaptive motion compensated 3D wavelet transform, with enhanced spatial and temporal scalability
AU2004903584 2004-06-30
AU2004905905 2004-10-14
AU2004905905A AU2004905905A0 (en) 2004-10-14 Method for efficient content-adaptive motion compensated 3D wavelet transform, with enhanced spatial and temporal scalability

Publications (1)

Publication Number Publication Date
WO2005078663A1 true WO2005078663A1 (en) 2005-08-25

Family

ID=34864685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2005/000193 WO2005078663A1 (en) 2004-02-17 2005-02-17 Improved method for motion adaptive transformation of video

Country Status (1)

Country Link
WO (1) WO2005078663A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9106811B2 (en) 2011-07-21 2015-08-11 Imax Corporation Generalized normalization for image display
US9384535B2 (en) 2008-06-13 2016-07-05 Imax Corporation Methods and systems for reducing or eliminating perceived ghosting in displayed stereoscopic images
US9554132B2 (en) 2011-05-31 2017-01-24 Dolby Laboratories Licensing Corporation Video compression implementing resolution tradeoffs and optimization
US9667964B2 (en) 2011-09-29 2017-05-30 Dolby Laboratories Licensing Corporation Reduced complexity motion compensated temporal processing
US10063830B2 (en) 2011-11-30 2018-08-28 Thomson Licensing Dtv Antighosting method using binocular suppression
US10223810B2 (en) 2016-05-28 2019-03-05 Microsoft Technology Licensing, Llc Region-adaptive hierarchical transform and entropy coding for point cloud compression, and corresponding decompression
US10694210B2 (en) 2016-05-28 2020-06-23 Microsoft Technology Licensing, Llc Scalable point cloud compression with transform, and corresponding decompression
US11297346B2 (en) 2016-05-28 2022-04-05 Microsoft Technology Licensing, Llc Motion-compensated compression of dynamic voxelized point clouds

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030026341A1 (en) * 2001-07-24 2003-02-06 Sharp Laboratories Of America, Inc. Resolution-scalable video compression
WO2003055224A1 (en) * 2001-12-20 2003-07-03 Koninklijke Philips Electronics N.V. Video encoding and decoding method and device

Similar Documents

Publication Publication Date Title
Andreopoulos et al. In-band motion compensated temporal filtering
Secker et al. Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression
KR100308627B1 (en) Low bit rate encoder using overlapping block motion compensation and zerotree wavelet coding
JP4896458B2 (en) Embedded base layer codec for 3D subband coding
JP4989048B2 (en) Embedded base layer codec for 3D subband coding
JP4318918B2 (en) Video scalable compression method and apparatus
WO2005078663A1 (en) Improved method for motion adaptive transformation of video
US20060013310A1 (en) Temporal decomposition and inverse temporal decomposition methods for video encoding and decoding and video encoder and decoder
Mehrseresht et al. An efficient content-adaptive motion-compensated 3-D DWT with enhanced spatial and temporal scalability
KR101225160B1 (en) Methood and device for encoding a video image sequence into frequency subband coefficients of different spatial resolutions
EP1766998A1 (en) Scalable video coding method and apparatus using base-layer
US7023923B2 (en) Motion compensated temporal filtering based on multiple reference frames for wavelet based coding
KR20050028019A (en) Wavelet based coding using motion compensated filtering based on both single and multiple reference frames
WO2000072602A1 (en) Multi-dimensional data compression
KR20040106417A (en) Scalable wavelet based coding using motion compensated temporal filtering based on multiple reference frames
JP2007028034A (en) Scalable encoding method and device, scalable decoding method and device and these program and their recording media
Mehrseresht et al. Adaptively weighted update steps in motion compensated lifting based scalable video compression
KR20040077777A (en) Drift-free video encoding and decoding method, and corresponding devices
van der Schaar et al. Unconstrained motion compensated temporal filtering (UMCTF) framework for wavelet video coding
US20130058403A1 (en) Moving picture encoding method, moving picture decoding method, moving picture encoding device, moving picture decoding device, and computer program
Yang et al. Scalable wavelet video coding using aliasing-reduced hierarchical motion compensation
Mohsenian et al. Edge-based subband VQ techniques for images and video
WO2005055613A1 (en) Moving picture encoding method and device, and moving picture decoding method and device
Mehrseresht et al. Spatially continuous orientation adaptive discrete packet wavelet decomposition for image compression
KR20040106418A (en) Motion compensated temporal filtering based on multiple reference frames for wavelet coding

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase