WO2004036919A1 - Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering - Google Patents

Publication number
WO2004036919A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
motion compensated
band
signal
temporal filtering
Prior art date
Application number
PCT/IB2003/004452
Other languages
French (fr)
Inventor
Jong Chul Ye
Mihaela Van Der Schaar
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Priority to EP03808830A (EP1554887A1)
Priority to AU2003264804A (AU2003264804A1)
Priority to US10/531,195 (US20060008000A1)
Priority to JP2005501325A (JP2006503518A)
Publication of WO2004036919A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/1883 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit relating to sub-band structure, e.g. hierarchical level, directional tree, e.g. low-high [LH], high-low [HL], high-high [HH]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/547 Motion estimation performed in a transform domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/635 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by filter definition or implementation details
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates to video compression, and more particularly to overcomplete wavelet video coding using adaptive motion compensated temporal filtering.
  • Wavelet-based scalable video coding schemes permit great flexibility in terms of the different scalability types allowed. Hence, they are especially useful for video transmission over heterogeneous wireless and wired networks, to various devices with different capabilities.
  • overcomplete wavelet (OW): the spatial wavelet transform for each frame is performed first, followed by exploitation of interframe redundancy by predicting the wavelet coefficient values, or by defining temporal contexts in entropy coding.
  • interframe wavelet video coding: wavelet filtering is performed along the temporal axis, followed by a 2D spatial wavelet transform.
  • MCTF: motion compensated temporal filtering
  • SDMCTF: spatial domain MCTF
  • the quality of the matches provided by the motion estimation algorithm inherently limits SDMCTF video coding schemes.
  • some interframe wavelet-coded sequences appear slightly blurred, because imperfect motion estimation moves frame details into the temporal high frequency sub-bands and, from there, into the spatial high frequency sub-bands.
  • wavelet filtering is used to spatially decompose each of the video frames into multiple sub-bands, and temporal correlation for each sub-band is removed using motion estimation.
  • the conventional OW framework suffers from drift, which results in performance loss in SNR scalability. Furthermore, only a limited range of temporal scalability can be achieved using B frames.
  • the present invention is directed to a method and device for coding video.
  • a video signal is spatially decomposed into at least two signals of different frequency sub-bands.
  • An individualized motion compensated temporal filtering scheme is applied to each sub-band signal. Texture coding is then applied to each of the motion compensated temporally filtered subband signals.
  • a signal including at least two encoded motion compensated temporally filtered, different frequency sub-band signals of a video signal is decoded.
  • Inverse motion compensated temporal filtering is independently applied to each of the decoded at least two sub-band signals.
  • the at least two sub-band signals are spatially recomposed and the video signal is reconstructed from at least one of the at least two spatially recomposed sub-band signals.
  • FIG. 1 is a block diagram of a 3-D overcomplete wavelet video encoder according to an exemplary embodiment of the present invention, which may be used for performing the IBMCTF method of the present invention.
  • FIG. 2 is a block diagram of an adaptive higher order interpolation filter used in the present invention.
  • FIG. 3 illustrates the generation of an extended reference frame for motion estimation from overcomplete expansion of wavelet coefficients according to the present invention.
  • FIG. 4A illustrates a decomposition scheme for conventional MCTF that generates blurred images.
  • FIG. 4B illustrates a decomposition scheme used in the present invention.
  • FIG. 5 is a block diagram of a 3-D overcomplete wavelet video decoder according to an exemplary embodiment of the present invention.
  • FIG. 6 shows an over-complete wavelet expansion using a LBS algorithm for two level decompositions.
  • FIG. 7 is a video of a 2-level overcomplete wavelet transform obtained using the LBS method.
  • FIG. 8 illustrates the interleaving scheme of the present invention for a 1-D case of a one level decomposition.
  • FIG. 9 shows the overcomplete wavelet coefficients of the first frame of the video of FIG. 7 after performing the interleaving process of the present invention.
  • FIG. 10 is a wavelet block formed by the LBS algorithm.
  • FIG. 11 shows a Table that illustrates the MAD in wavelet domain for temporal high sub-band frames.
  • FIGS. 12-17 plot the rate distortion performance of the IBMCTF video coding scheme of the present invention and SDMCTF for several test sequences for integer and 1/8-pel accurate motion estimation.
  • FIG. 18 is an exemplary embodiment of a system which may be used for implementing the principles of the present invention.
  • the present invention is a fully scalable three-dimensional (3-D) overcomplete wavelet video coding scheme that utilizes a novel inband motion compensated temporal filtering (IBMCTF) method.
  • IBMCTF: inband motion compensated temporal filtering
  • the IBMCTF method of the present invention overcomes the drawbacks of previous IBMCTF coding methods, and demonstrates coding efficiency comparable to or better than conventional interframe wavelet coding methods that utilize spatial domain motion compensated temporal filtering.
  • FIG. 1 is a block diagram of a 3-D overcomplete wavelet video encoder according to an exemplary embodiment of the present invention, which may be used for performing the IBMCTF method of the present invention.
  • the video encoder 100 includes a 3-D wavelet transform unit 110 that spatially decomposes each video frame of an input video into any desired number of multiple sub-bands 1, 2,... and N using a conventional 3-D overcomplete wavelet filtering process.
  • the video encoder 100 further includes a partitioning unit 120a, 120b, 120c for each sub-band generated by the wavelet transform unit 110.
  • Each partitioning unit 120a, 120b, 120c divides the wavelet coefficients of its associated sub-band into groups of frames (GOFs) for encoding as a group.
  • GOFs: groups of frames
  • the video encoder 100 also includes a motion compensated temporal filtering (MCTF) unit 130a, 130b, 130c for each sub-band, that contains a motion estimator 131a, 131b, 131c and a temporal filter 132a, 132b, 132c.
  • Each MCTF 130a, 130b, 130c separately removes temporal correlation or redundancy from the GOFs of each sub-band using a motion compensated temporal filtering (MCTF) process.
  • the use of a discrete MCTF unit for each sub-band allows the motion compensated temporal filtering process to be tailored for each sub-band independently of the other sub-bands.
  • the temporal filtering process selected for a particular sub-band may be based on different criteria.
  • the encoder additionally includes a texture encoder 140a, 140b, 140c for each sub-band that allows the residual signal and motion information (motion vectors) generated by the MCTF units 130a, 130b, 130c for each sub-band to be independently texture coded using any optimized texture coding process.
  • the texture coded residual signals and motion information are then combined into a single bitstream by a multiplexer 150.
  • Another embodiment of texture coding is a global transform of a full size residual frame, which is applied after all the residual signals and motion information generated by the MCTF units 130a, 130b, 130c for each sub-band are combined to generate the full size residual frame.
  • each motion compensated filtering unit 130a, 130b, 130c utilizes an adaptive higher order interpolation filter 200, as shown in FIG. 2, to maximize the performance of the motion estimator 131a, 131b, 131c.
  • the interpolation filter 200 of the invention includes a low band shifting (LBS) unit 210 that performs low band shifting, an interleaving unit 220 that performs overcomplete wavelet coefficient interleaving, and an interpolation unit 230.
  • LBS: low band shifting
  • the LBS process is implemented in the LBS unit 210 with one or more known LBS algorithms that efficiently generate an overcomplete representation of the original wavelet coefficients, which is now shift invariant.
  • LBS advantageously generates the overcomplete expansion of the original wavelet coefficients at the encoder and decoder using one or more similar LBS algorithms; therefore, no additional information needs to be encoded and transmitted as compared to conventional interframe wavelet coding schemes.
  • the interleaving process performed by the interleaving unit 220, combines the different phase information provided by the overcomplete wavelet coefficients to generate an extended reference frame. Accordingly, there is no need to encode the phase information separately as in previous IBMCTF based video coding methods. Due to the interleaving process of the present invention, the phase information is coded inherently as part of the higher accuracy motion vectors.
  • From the extended reference frame, the interpolation unit 230 generates fractional pels, such as 1/2, 1/4, 1/8, or 1/16 pels, which are used by the motion estimator 131a, 131b, 131c for motion estimation. Interpolation may be implemented with a conventional one-dimensional interpolation filter. In order to maximize the performance of the motion estimation and MCTF, independently optimized interpolation filters with different numbers of taps can be used for each sub-band.
  • FIG. 3 illustrates the generation of an extended reference frame for motion estimation from overcomplete expansion of wavelet coefficients according to the present invention.
  • three other phases of wavelet coefficients are generated from original wavelet coefficients 310 by shifting the lower sub-band by the amounts (1,0), (0,1) and (1,1). Then, four phases of wavelet coefficients 310, 320, 330, 340 are interleaved to generate an extended reference frame 350.
  • the IBMCTF based 3-D overcomplete wavelet video coding method of the present invention provides improved spatial scalability performance as compared with known spatial domain motion compensated temporal filtering (SDMCTF) based video coding methods. This is because the temporal filtering is performed per sub-band (resolution) and hence, loss of information from the finer resolution sub-bands does not incur any drift in the temporal direction.
  • a bi-directional temporal filtering technique can be used for low resolution sub-bands, while a forward temporal filtering technique can be used for higher resolution sub-bands.
  • the temporal filtering technique can be selected based on minimizing a distortion or a complexity measure (e.g. the low resolution sub-bands have fewer pixels and hence bi-directional and multiple reference temporal filtering can be employed, while for the high resolution sub-bands that have a larger number of pixels, only forward estimation is performed).
  • a discrete partitioning unit 120a, 120b, 120c for each sub-band allows the GOFs to be adaptively determined per sub-band.
  • the LL-sub-bands might have a very large GOF, while the H-sub-bands can use limited GOFs.
  • the GOF sizes can be varied based on the sequence characteristics, complexity or resiliency requirements.
  • the decomposition scheme for conventional MCTF as shown in FIG. 4A, generates blurred images.
  • the use of different temporal decomposition levels and GOF sizes allows the 3-D wavelet scalable video coding scheme of the present invention to overcome such drawbacks. As shown in FIG.
  • GOF sizes for LL, LH (HL), and HH may be 8, 4, and 2 frames, respectively, which allow maximum decomposition levels of 3, 2, and 1, respectively. This way the higher spatial frequency sub-bands are omitted from longer-term temporal filtering.
  • the number of temporal decomposition levels for the various sub-bands can be determined either based on content, or to reduce a specific distortion metric or simply based on the desired temporal scalability in each resolution.
  • discrete texture coding unit 140a, 140b, 140c for each sub-band allows adaptive texture coding of the various spatial sub-bands.
  • wavelet or DCT-based texture coding schemes may be used.
  • intra-coded blocks can be advantageously inserted anywhere within the GOF to deal efficiently with covering and uncovering situations.
  • "adaptive intra-refresh" concepts from MPEG-4/H.26L can be easily employed to provide improved resiliency, and different refresh rates can be used for the various sub-bands to obtain different resiliencies. This is especially beneficial since the lower resolution sub-bands can be used for concealing the higher resolution sub-bands and hence, their resiliency is more important.
  • Another advantage of the present invention relates to the complexity scalability of the decoder. If there are many decoders with different computation power and displays, the same scalable bitstream can be used to support all those decoders through SNR/spatial/ temporal scalability.
  • the scalable bitstream generated by the encoder of the invention can be decoded by a low-complexity decoder that decodes only the low resolution spatial and temporal decomposition levels, which incurs only a small computational burden.
  • the scalable bitstream generated by the encoder of the invention can also be decoded with a decoder having sophisticated decoding power that can decode the whole bit stream to achieve the full spatial and temporal resolution.
  • FIG. 5 is a block diagram of a 3-D overcomplete wavelet video decoder according to an exemplary embodiment of the present invention.
  • the decoder may be used for decoding the bitstream produced by the encoder of the present invention.
  • the video decoder 400 may include a demultiplexer 410 that processes the bitstream to separate the encoded wavelet coefficients from the motion information.
  • a first texture decoder 420 texture decodes the wavelet coefficients, according to the inverse of the texture coding technique performed on the encoding side, into their separate sub-bands 1, 2,... and N.
  • the wavelet coefficients of a sub-band produced by the first texture decoder 420 correspond to each GOF of that sub-band.
  • a motion vector decoder 430 decodes the motion information for each sub-band according to the inverse of the texture coding technique performed on the encoding side.
  • inverse MCTF is applied by MCTF units 440a, 440b, 440c for each sub-band independently and an inverse wavelet transform unit 450 spatially recomposes each sub-band to reconstruct the low, medium, and high level images.
  • the low-band-shifting block reads the recomposed sub-band images to assemble a full size image and then the low band shifted wavelet decomposition is applied to provide the extended reference frames for the inverse MCTF units 440a, 440b, 440c.
  • a video reconstruction unit may use one of the sub-bands to generate the low resolution video, or use two sub-bands to generate a medium resolution video, or use all of the sub-bands to generate a high resolution, full quality video.
  • LBS Low Band Shifting Method
  • the decimation process performed in a wavelet transform generates wavelet coefficients that are no longer shift-invariant. Hence, translation motion in the spatial domain cannot be accurately estimated from the wavelet coefficients, which in turn produces a significant loss in coding efficiency.
  • the LBS algorithms utilized in the present invention provide a method for overcoming the shift-variant property of the wavelet transform.
  • the original and shifted signals are decomposed into low-sub-band and high-sub-band signals.
  • the low-sub-band signal is further decomposed in the same way as for the first level.
  • FIG. 6 shows an over-complete wavelet expansion using a LBS algorithm for two level decompositions.
  • the one dimensional (1-D) formulation can be easily expanded to wavelet decompositions having multiple levels and also to two-dimensional (2-D) image signals.
  • the pair (m,n) indicates that the wavelet coefficients within that sub-band were generated by shifts of m-pixels in the x-direction and n-pixels in the y-direction, respectively.
  • the LBS algorithm generates a full-set of wavelet coefficients for all the possible shifts of the input sub-band.
  • the representation accurately conveys any shift in spatial domain.
  • the different shifted wavelet coefficients corresponding to the same decomposition level at a specific spatial location are referred to as "cross-phase" wavelet coefficients.
  • FIG. 7 is a video of a 2-level overcomplete wavelet transform obtained using the LBS method. Note that for an n-level decomposition, the overcomplete wavelet representation requires a storage space that is 3n+1 times that of the original image.
  • the novel interleaving scheme of the present invention stores the overcomplete wavelet coefficients differently from the arrangement depicted in FIGS. 6 and 7. As shown in FIG. 8, which illustrates the interleaving scheme of the present invention for a 1-D case of a one level decomposition, the shifted coefficients are interleaved such that the new coordinates in the overcomplete domain correspond to the associated shift in the original spatial domain.
  • FIG. 9 shows the overcomplete wavelet coefficients of the first frame of the video of FIG. 7 after performing the interleaving process of the present invention.
  • the interleaved low sub-band signal is a low-pass filtered version of the original frame using the overcomplete wavelet low-pass filter.
  • the interleaving process of the present invention enables the IBMCTF method of the present invention to provide sub-pixel accuracy motion estimation and compensation.
  • Previously proposed IBMCTF schemes cannot provide optimal sub-pixel accuracy motion estimation and compensation, because they do not take into consideration cross-phase dependencies between neighbouring wavelet coefficients.
  • the interleaving process allows the IBMCTF method of the invention to use hierarchical variable size block matching, backward motion compensation, and adaptive insertion of intra blocks.
  • every coefficient at a given scale can be related to a set of coefficients of the same orientation at finer scales.
  • this relationship is exploited by representing the coefficients as a data structure called a wavelet tree.
  • the coefficients of each wavelet tree rooted in the lowest sub-band are rearranged to form a wavelet block, as shown in FIG. 10.
  • the purpose of the wavelet block is to provide a direct association between the wavelet coefficients and what they represent spatially in the image.
  • Related coefficients at all scales and orientations are included in each block.
  • block-based motion estimation usually divides an image into small blocks and then, for each block of the current frame, finds the block of the reference frame that minimizes the mean absolute difference (MAD).
  • the motion estimation of the LBS algorithm finds the motion vector (dx, dy) that generates the minimum MAD between the current wavelet block and the reference wavelet block.
  • the MAD of the k-th wavelet block in FIG. 10 is computed as follows:
  • the i-th level HL sub-band of the reference frame is represented by LBS_HL_ref^(i)(m, n; x, y), where (m,n) denotes the number of shifts in the x- and y-directions in the spatial domain and (x,y) is the location of the sub-band signal.
  • the optimization criterion for the motion estimation is now finding the optimal (dx,dy) which minimizes this MAD. Note that in the original LBS algorithm, for non-integer values of (dx,dy), it is not possible to compute the MAD using the above formula. More specifically, the MAD in conventional IBMCTF video coding schemes is based solely on the same-phase wavelet coefficients and the resulting sub-pixel accuracy motion estimation and compensation is not optimal.
  • the interleaving process enables the MAD calculation to be performed similarly as in SDMCTF video coding schemes, even for the sub-pixel accuracy. More specifically, the MAD for the displacement vector (dx,dy) for the IBMCTF method of the present invention is computed as follows:
  • LBS_HL_ref^(i)(x, y) denotes the extended HL sub-band of the reference frame using the interleaving process of the present invention.
  • the IBMCTF video coding scheme of the present invention provides more efficient and indeed optimal sub-pixel motion estimation compared to the existing IBMCTF coding schemes.
  • the IBMCTF video coding scheme of the present invention with the wavelet block structure does not incur any motion vector overhead because the number of motion vectors to be coded is the same as that of SDMCTF. Since the motion estimation is closely aligned with the residual coding, a more sophisticated motion estimation criterion (such as the entropy of the residual signal) may be used to improve the coding performance.
  • FIG. 11 shows a Table that illustrates the MAD in wavelet domain for temporal high sub-band frames. The MAD values are averaged over the first 50 frames of temporal high sub-bands. For the SDMCTF cases, the corresponding MAD values in wavelet domains are computed after the wavelet transform of the residual signal. Note that the MAD for the IBMCTF is always smaller than for SDMCTF, which indicates the possible coding gain of the IBMCTF video coding scheme of the present invention over SDMCTF.
  • FIGS. 12-17 plot the rate distortion performance of the IBMCTF video coding scheme of the present invention and SDMCTF for several test sequences for integer and 1/8-pel accurate motion estimation.
  • the inband structure for MCTF was computed with a two level spatial decomposition performed by a Daubechies 9/7 filter, and four levels of decomposition were used for the temporal direction.
  • the texture coding was performed with an EZBC algorithm described in the article entitled, Invertible Three-Dimensional Analysis/Synthesis System For Video Coding With Half-Pixel Accurate Motion Compensation, by S.T. Hsiang et al., VCIP 1999, SPIE Vol. 3653, pp. 537-546.
  • the sub-pixel motion estimation using 1/8 pel greatly improves the coding performance of the IBMCTF.
  • the overall coding performance of the IBMCTF and SDMCTF is comparable.
  • some sequences such as “Coastguard”, “Silent” and “Stefan” exhibit a performance gain of up to 0.5dB, while for the "Mobile” sequence a 0.3dB performance degradation can be observed.
  • the IBMCTF algorithm of the present invention is free of the blocking artefacts of motion estimation, since the motion estimation and filtering are done in each sub-band and the motion boundaries are filtered out by the wavelet recomposition filter.
  • FIG. 18 is an exemplary embodiment of a system 500 which may be used for implementing the principles of the present invention.
  • the system 500 may represent a television, a set-top box, a desktop, laptop or palmtop computer, a personal digital assistant (PDA), a video/image storage device such as a video cassette recorder (VCR), a digital video recorder (DVR), a TiVO device, etc., as well as portions or combinations of these and other devices.
  • the system 500 includes one or more video/image sources 501, one or more input/output devices 502, a processor 503 and a memory 504.
  • the video/image source(s) 501 may represent, e.g., a television receiver, a VCR or other video/image storage device.
  • the source(s) 501 may alternatively represent one or more network connections for receiving video from a server or servers over, e.g., a global computer communications network such as the Internet, a wide area network, a metropolitan area network, a local area network, a terrestrial broadcast system, a cable network, a satellite network, a wireless network, or a telephone network, as well as portions or combinations of these and other types of networks.
  • the input/output devices 502, processor 503 and memory 504 may communicate over a communication medium 505.
  • the communication medium 505 may represent, e.g., a bus, a communication network, one or more internal connections of a circuit, circuit card or other device, as well as portions and combinations of these and other communication media.
  • Input video data from the source(s) 501 is processed in accordance with one or more software programs stored in memory 504 and executed by processor 503 in order to generate output video/images supplied to a display device 506.
  • the coding and decoding principles of the present invention may be implemented by computer readable code executed by the system.
  • the code may be stored in the memory 504 or read/downloaded from a memory medium such as a CD-ROM or floppy disk.
  • hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention.
  • the functional elements shown in FIGS. 1, 2, and 5 may also be implemented as discrete hardware elements.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and device for coding video where a video signal is spatially decomposed into at least two signals of different frequency sub-bands, an individualized motion compensated temporal filtering scheme is applied to each sub-band signal adaptively according to signal contents, and texture coding is applied to each of the motion compensated temporally filtered subband signals adaptively according to the signal content.

Description

FULLY SCALABLE 3-D OVERCOMPLETE WAVELET VIDEO CODING USING ADAPTIVE MOTION COMPENSATED TEMPORAL FILTERING
This application claims the benefit under 35 USC 119(e) of U.S. provisional application serial no. 60/418,961, filed on October 16, 2002, which is incorporated herein by reference.
The present invention relates to video compression, and more particularly to overcomplete wavelet video coding using adaptive motion compensated temporal filtering.
Current video coding algorithms are mainly based on hybrid-coding schemes with motion compensated predictive coding. In such hybrid schemes, temporal redundancy is reduced using motion compensation and spatial redundancy is reduced by transform coding the residue of motion compensation. These hybrid-coding schemes, however, are prone to error propagation and lack flexibility in terms of providing a truly scalable bitstream, i.e., the ability to decompress to different quality, resolution, and frame-rate layers from the same compressed bitstream.
In contrast, 3D sub-band/wavelet coding can provide a very flexible scalable bitstream and higher error resilience. Wavelet-based scalable video coding schemes permit great flexibility in terms of the different scalability types allowed. Hence, they are especially useful for video transmission over heterogeneous wireless and wired networks, to various devices with different capabilities.
Currently, there are two wavelet-based video coding schemes: overcomplete wavelet and interframe wavelet. In overcomplete (OW) wavelet video coding, the spatial wavelet transform for each frame is performed first, followed by exploitation of interframe redundancy by predicting the wavelet coefficient values, or by defining temporal contexts in entropy coding. In interframe wavelet video coding, wavelet filtering is performed along the temporal axis followed by a 2D spatial wavelet transform.
Present interframe wavelet video coding schemes use motion compensated temporal filtering (MCTF) to reduce the temporal redundancy. MCTF is performed in the temporal direction of motion before spatial decomposition is performed. Such video coding schemes are referred to herein as spatial domain MCTF (SDMCTF). However, the quality of the matches provided by the motion estimation algorithm inherently limits SDMCTF video coding schemes. For example, some interframe wavelet-coded sequences appear slightly blurred, because imperfect motion estimation causes movement of frame details into the temporal high frequency sub-bands, and from there, to spatial high frequency sub-bands. These artifacts lead to degraded visual performance for unquantized and spatially scaled sequences. Further tests have indicated that decreasing the number of temporal decomposition levels can reduce the artifacts.
In present OW video coding schemes, wavelet filtering is used to spatially decompose each of the video frames into multiple sub-bands, and temporal correlation for each sub-band is removed using motion estimation.
There have been many attempts to predict the wavelet coefficients by motion compensation in the wavelet domain. However, motion compensation in the wavelet domain is highly dependent on the alignment of the signal and the discrete grid chosen for the analysis. There exist very large differences between the wavelet coefficients of the original image and the one-pixel-shifted image. This shift-variant property happens frequently around the image edges, so motion compensation of the wavelet coefficients can be difficult.
Existing OW video coding schemes overcome the inefficiency of motion estimation in wavelet domain by utilizing the odd-phase wavelet coefficients in the prediction as well. A convenient method of obtaining the odd phase coefficients is to perform band shifting. Since the decoded previous frame is also available at the decoder, prediction from overcomplete expansion does not require any additional overhead. Moreover, the computational complexity of searching both optimal phase and motion vectors in wavelet domain is comparable to that of conventional motion estimation in spatial domain with fractional pel accuracy.
However, due to the motion estimation and compensation, the conventional OW framework suffers from drift, which results in performance loss in SNR scalability. Furthermore, only a limited range of temporal scalability can be achieved using B frames.
Accordingly, a wavelet-based video-coding scheme with improved SNR and temporal scalability is needed.
The present invention is directed to a method and device for coding video. According to a first aspect of the present invention, a video signal is spatially decomposed into at least two signals of different frequency sub-bands. An individualized motion compensated temporal filtering scheme is applied to each sub-band signal. Texture coding is then applied to each of the motion compensated temporally filtered sub-band signals. According to a second aspect of the invention, a signal including at least two encoded motion compensated temporally filtered, different frequency sub-band signals of a video signal, is decoded. Inverse motion compensated temporal filtering is independently applied to each of the decoded at least two sub-band signals. The at least two sub-band signals are spatially recomposed and the video signal is reconstructed from at least one of the at least two spatially recomposed sub-band signals.
FIG. 1 is a block diagram of a 3-D overcomplete wavelet video encoder according to an exemplary embodiment of the present invention, which may be used for performing the IBMCTF method of the present invention.
FIG. 2 is a block diagram of an adaptive higher order interpolation filter used in the present invention.
FIG. 3 illustrates the generation of an extended reference frame for motion estimation from overcomplete expansion of wavelet coefficients according to the present invention.
FIG. 4A illustrates a decomposition scheme for conventional MCTF that generates blurred images.
FIG. 4B illustrates a decomposition scheme used in the present invention.
FIG. 5 is a block diagram of a 3-D overcomplete wavelet video decoder according to an exemplary embodiment of the present invention.
FIG. 6 shows an over-complete wavelet expansion using a LBS algorithm for two level decompositions.
FIG. 7 is a video of a 2-level overcomplete wavelet transform obtained using the LBS method.
FIG. 8 illustrates the interleaving scheme of the present invention for a 1-D case of a one level decomposition.
FIG. 9 shows the overcomplete wavelet coefficients of the first frame of the video of FIG. 7 after performing the interleaving process of the present invention.
FIG. 10 is a wavelet block formed by the LBS algorithm.
FIG. 11 shows a Table that illustrates the MAD in wavelet domain for temporal high sub-band frames.
FIGS. 12-17 plot the rate distortion performance of the IBMCTF video coding scheme of the present invention and SDMCTF for several test sequences for integer and 1/8-pel accurate motion estimation.
FIG. 18 is an exemplary embodiment of a system which may be used for implementing the principles of the present invention.
The present invention is a fully scalable three-dimensional (3-D) overcomplete wavelet video coding scheme that utilizes a novel inband motion compensated temporal filtering (IBMCTF) method. The IBMCTF method of the present invention overcomes the drawbacks of previous IBMCTF coding methods, and demonstrates coding efficiency comparable to or better than conventional interframe wavelet coding methods that utilize spatial domain motion compensated temporal filtering.
FIG. 1 is a block diagram of a 3-D overcomplete wavelet video encoder according to an exemplary embodiment of the present invention, which may be used for performing the IBMCTF method of the present invention. The video encoder 100 includes a 3-D wavelet transform unit 110 that spatially decomposes each video frame of an input video into any desired number of multiple sub-bands 1, 2,... and N using a conventional 3-D overcomplete wavelet filtering process.
The video encoder 100 further includes a partitioning unit 120a, 120b, 120c for each sub-band generated by the wavelet transform unit 110. Each partitioning unit 120a, 120b, 120c divides the wavelet coefficients of its associated sub-band into groups of frames (GOFs) for encoding as a group.
The video encoder 100 also includes a motion compensated temporal filtering (MCTF) unit 130a, 130b, 130c for each sub-band, that contains a motion estimator 131a, 131b, 131c and a temporal filter 132a, 132b, 132c. Each MCTF 130a, 130b, 130c separately removes temporal correlation or redundancy from the GOFs of each sub-band using a motion compensated temporal filtering (MCTF) process. In accordance with the present invention, the use of a discrete MCTF unit for each sub-band allows the motion compensated temporal filtering process to be tailored for each sub-band independently of the other sub-bands. In addition, the temporal filtering process selected for a particular sub-band may be based on different criteria.
The encoder additionally includes a texture encoder 140a, 140b, 140c for each sub-band that allows the residual signal and motion information (motion vectors) generated by the MCTF units 130a, 130b, 130c for each sub-band to be independently texture coded using any optimized texture coding process. The texture coded residual signals and motion information are then combined into a single bitstream by a multiplexer 150. Another embodiment of texture coding is a global transform of a full size residual frame, which is applied after all the residual signals and motion information generated by the MCTF units 130a, 130b, 130c for each sub-band are combined to generate the full size residual frame.
As one of ordinary skill in the art will appreciate, the critically sampled wavelet decomposition in known IBMCTF methods is only periodically shift-invariant. Therefore, performing motion estimation and compensation in the wavelet domain is inefficient and may incur a coding penalty. To address this problem, each motion compensated filtering unit 130a, 130b, 130c utilizes an adaptive higher order interpolation filter 200, as shown in FIG. 2, to maximize the performance of the motion estimator 131a, 131b, 131c. The interpolation filter 200 of the invention includes a low band shifting (LBS) unit 210 that performs low band shifting, an interleaving unit 220 that performs overcomplete wavelet coefficient interleaving, and an interpolation unit 230. The LBS process is implemented in the LBS unit 210 with one or more known LBS algorithms that efficiently generate an overcomplete representation of the original wavelet coefficients, which is now shift invariant. LBS advantageously generates the overcomplete expansion of the original wavelet coefficients at the encoder and decoder using one or more similar LBS algorithms; therefore, no additional information needs to be encoded and transmitted as compared to conventional interframe wavelet coding schemes.
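As a rough illustration of temporal filtering applied separately to the GOF of one sub-band, the following Python sketch runs a multi-level temporal decomposition on a list of frames. It is a sketch under stated assumptions: motion compensation is omitted (zero motion is assumed), the filter is an unnormalized Haar lifting pair chosen here for brevity, and the function names are hypothetical. The patent does not prescribe a particular temporal filter.

```python
def temporal_haar_level(frames):
    """One temporal level: pair up consecutive frames of a sub-band.

    Each frame is a flat list of wavelet coefficients. Zero motion is
    assumed, so co-located coefficients are filtered directly.
    """
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        high = [q - p for p, q in zip(a, b)]        # temporal high band (residual)
        low = [p + h / 2 for p, h in zip(a, high)]  # temporal low band (pair average)
        lows.append(low)
        highs.append(high)
    return lows, highs


def temporal_decompose(gof, levels):
    """Repeat the pairwise filtering on the low band `levels` times."""
    highs_per_level = []
    lows = gof
    for _ in range(levels):
        lows, highs = temporal_haar_level(lows)
        highs_per_level.append(highs)
    return lows, highs_per_level
```

Because each sub-band has its own GOF, `temporal_decompose` could be called with a different `levels` value for each sub-band.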
The interleaving process, performed by the interleaving unit 220, combines the different phase information provided by the overcomplete wavelet coefficients to generate an extended reference frame. Accordingly, there is no need to encode the phase information separately as in previous IBMCTF based video coding methods. Due to the interleaving process of the present invention, the phase information is coded inherently as part of the higher accuracy motion vectors.
From the extended reference frame, the interpolation unit 230 generates fractional-pel samples, such as 1/2, 1/4, 1/8 or 1/16 pel, which are used by the motion estimator 131a, 131b, 131c for motion estimation. Interpolation may be implemented with a conventional one-dimensional interpolation filter. In order to maximize the performance of the motion estimation and MCTF, independently optimized interpolation filters with different numbers of taps can be used for each sub-band. FIG. 3 illustrates the generation of an extended reference frame for motion estimation from overcomplete expansion of wavelet coefficients according to the present invention. In order to achieve a higher order interpolation for motion estimation in the HH sub-band overcomplete expansion 300, for example, three other phases of wavelet coefficients are generated from original wavelet coefficients 310 by shifting the lower sub-band by the amounts (1,0), (0,1) and (1,1). Then, the four phases of wavelet coefficients 310, 320, 330, 340 are interleaved to generate an extended reference frame 350.
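The interleaving of four phases into an extended reference frame can be sketched in Python as follows. This is a minimal sketch, assuming that the coefficient at position (x, y) of the phase generated by shift (m, n) lands at (2x + m, 2y + n) in the extended frame. The function name and the nested-list layout are illustrative and not taken from the patent.

```python
def interleave_phases(phases):
    """Interleave four one-level phases into an extended reference frame.

    phases maps a shift (m, n) in {(0,0), (1,0), (0,1), (1,1)} to a 2-D
    list of coefficients (rows indexed by y, columns by x).
    Assumption: phase (m, n) at (x, y) is stored at (2x + m, 2y + n).
    """
    height = len(phases[(0, 0)])
    width = len(phases[(0, 0)][0])
    extended = [[0] * (2 * width) for _ in range(2 * height)]
    for (m, n), band in phases.items():
        for y in range(height):
            for x in range(width):
                extended[2 * y + n][2 * x + m] = band[y][x]
    return extended
```

The extended frame has twice the width and height of each phase, so an ordinary spatial-domain interpolation filter can then be run over it.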
The IBMCTF based 3-D overcomplete wavelet video coding method of the present invention provides improved spatial scalability performance as compared with known spatial domain motion compensated temporal filtering (SDMCTF) based video coding methods. This is because the temporal filtering is performed per sub-band (resolution) and hence, loss of information from the finer resolution sub-bands does not incur any drift in the temporal direction.
As mentioned earlier, the use of a discrete MCTF unit 130a, 130b, 130c for each sub-band allows different temporal filtering techniques to be used at the various resolutions. For example, in one embodiment, a bi-directional temporal filtering technique can be used for low resolution sub-bands, while a forward temporal filtering technique can be used for higher resolution sub-bands. The temporal filtering technique can be selected based on minimizing a distortion or a complexity measure (e.g. the low resolution sub-bands have fewer pixels and hence bi-directional and multiple reference temporal filtering can be employed, while for the high resolution sub-bands that have a larger number of pixels, only forward estimation is performed). Such a flexible choice of temporal filtering options moves the invention away from the strict 1D+2D decomposition schemes performed by MCTF, to a more general 3-D decomposition scheme with spatial size reduction throughout the temporal levels, where the higher spatial frequency sub-bands are omitted from longer-term temporal filtering.
The use of a discrete partitioning unit 120a, 120b, 120c for each sub-band allows the GOFs to be adaptively determined per sub-band. For instance, the LL-sub-bands might have a very large GOF, while the H-sub-bands can use limited GOFs. The GOF sizes can be varied based on the sequence characteristics, complexity or resiliency requirements. As mentioned earlier, the decomposition scheme for conventional MCTF, as shown in FIG. 4A, generates blurred images. However, the use of different temporal decomposition levels and GOF sizes allows the 3-D wavelet scalable video coding scheme of the present invention to overcome such drawbacks. As shown in FIG. 4B, GOF sizes for LL, LH (HL), and HH may be 8, 4, and 2 frames, respectively, which allow maximum decomposition levels of 3, 2, and 1 respectively. This way the higher spatial frequency sub-bands are omitted from longer-term temporal filtering. The number of temporal decomposition levels for the various sub-bands can be determined either based on content, or to reduce a specific distortion metric or simply based on the desired temporal scalability in each resolution. For instance, if 30, 15 and 7.5 Hz frame-rates are desired at CIF (352x288) size resolution, and only 30 and 15 at SD (704x576) size resolution, then for the LL spatial-sub-band, three levels of temporal decomposition are used, while only two levels of temporal decomposition can be applied for LH, HL, and the HH sub-bands.
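The relation between GOF size and maximum temporal decomposition depth quoted for FIG. 4B (8, 4 and 2 frames giving 3, 2 and 1 levels) follows from halving the number of low-band frames at each level. A small Python sketch, assuming dyadic GOF sizes and using a hypothetical helper name:

```python
def max_temporal_levels(gof_size):
    """Maximum number of dyadic temporal levels for a GOF of this size.

    Assumes the GOF size is a power of two, since each level halves
    the number of temporal low-band frames.
    """
    if gof_size < 1 or gof_size & (gof_size - 1):
        raise ValueError("GOF size must be a positive power of two")
    return gof_size.bit_length() - 1


# GOF sizes per spatial sub-band as given for FIG. 4B.
gof_sizes = {"LL": 8, "LH": 4, "HL": 4, "HH": 2}
levels = {band: max_temporal_levels(size) for band, size in gof_sizes.items()}
```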
As also mentioned earlier, the use of discrete texture coding unit 140a, 140b, 140c for each sub-band allows adaptive texture coding of the various spatial sub-bands. For example, wavelet or DCT-based texture coding schemes may be used. If DCT-based texture coding is used, intra-coded blocks can be advantageously inserted anywhere within the GOF to deal efficiently with covering and uncovering situations. Also, "adaptive intra-refresh" concepts from MPEG-4/H.26L can be easily employed to provide improved resiliency, and different refresh rates can be used for the various sub-bands to obtain different resiliencies. This is especially beneficial since the lower resolution sub-bands can be used for concealing the higher resolution sub-bands and hence, their resiliency is more important.
Another advantage of the present invention relates to the complexity scalability of the decoder. If there are many decoders with different computation power and displays, the same scalable bitstream can be used to support all those decoders through SNR/spatial/temporal scalability. For example, the scalable bitstream generated by the encoder of the invention can be decoded with a low complexity decoder that decodes only the low resolution spatial and temporal decomposition levels, which incurs only a small computational burden. Similarly, the scalable bitstream generated by the encoder of the invention can also be decoded with a decoder having sophisticated decoding power that can decode the whole bitstream to achieve the full spatial and temporal resolution.
FIG. 5 is a block diagram of a 3-D overcomplete wavelet video decoder according to an exemplary embodiment of the present invention. The decoder may be used for decoding the bitstream produced by the encoder of the present invention. The video decoder 400 may include a demultiplexer 410 that processes the bitstream to separate the encoded wavelet coefficients from the motion information.
A first texture decoder 420 texture decodes the wavelet coefficients, according to the inverse of the texture coding technique performed on the encoding side, into their separate sub-bands 1, 2,... and N. The wavelet coefficients of a sub-band produced by the first texture decoder 420 correspond to each GOF of that sub-band. A motion vector decoder 430 decodes the motion information for each sub-band according to the inverse of the texture coding technique performed on the encoding side. Using the decoded motion vectors and residual texture information, inverse MCTF is applied by MCTF units 440a, 440b, 440c for each sub-band independently and an inverse wavelet transform unit 450 spatially recomposes each sub-band to reconstruct the low, medium, and high level images. The low-band-shifting block reads the recomposed sub-band images to assemble a full size image and then the low band shifted wavelet decomposition is applied to provide the extended reference frames for the inverse MCTF units 440a, 440b, 440c. Depending on the display resolution, a video reconstruction unit (not shown) may use one of the sub-bands to generate the low resolution video, or use two sub-bands to generate a medium resolution video, or use all of the sub-bands to generate a high resolution, full quality video.
The various processes utilized in the video scheme of the present invention will now be described in greater detail below.
MOTION ESTIMATION AND COMPENSATION
IN THE OVERCOMPLETE WAVELET DOMAIN
1. Low Band Shifting Method (LBS)
The decimation process performed in a wavelet transform generates wavelet coefficients that are no longer shift-invariant. Hence, translation motion in the spatial domain cannot be accurately estimated from the wavelet coefficients, which in turn produces a significant loss in coding efficiency. The LBS algorithms utilized in the present invention provide a method for overcoming the shift-variant property of the wavelet transform. At a first level, the original and shifted signals are decomposed into low-sub-band and high-sub-band signals. Subsequently, the low-sub-band signal is further decomposed in the same way as for the first level.
FIG. 6 shows an over-complete wavelet expansion using a LBS algorithm for two level decompositions. The one dimensional (1-D) formulation can be easily expanded to wavelet decompositions having multiple levels and also to two-dimensional (2-D) image signals. The pair (m,n) indicates that the wavelet coefficients within that sub-band were generated by shifts of m-pixels in the x-direction and n-pixels in the y-direction, respectively. The LBS algorithm generates a full-set of wavelet coefficients for all the possible shifts of the input sub-band. Hence, the representation accurately conveys any shift in spatial domain. As will be discussed further on, the different shifted wavelet coefficients corresponding to the same decomposition level at a specific spatial location are referred to as "cross-phase" wavelet coefficients.
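A one-level, 1-D version of low band shifting can be sketched in Python as follows. The sketch makes several assumptions that are not in the patent: an unnormalized Haar filter pair stands in for the wavelet filter, the signal is extended circularly, and the function names are hypothetical. It shows that a one-sample shift changes the decimated coefficients, and that the shifted coefficients are already present as the second phase of the overcomplete set.

```python
def haar_analysis(signal):
    """Decimated one-level analysis (unnormalized Haar, even-length input)."""
    pairs = range(len(signal) // 2)
    low = [(signal[2 * k] + signal[2 * k + 1]) / 2 for k in pairs]
    high = [(signal[2 * k] - signal[2 * k + 1]) / 2 for k in pairs]
    return low, high


def rotate(signal, shift):
    """Circular shift, used here as a simple boundary assumption."""
    return signal[shift:] + signal[:shift]


def lbs_one_level(signal):
    """Decompose the original (shift 0) and the one-sample-shifted signal."""
    return {shift: haar_analysis(rotate(signal, shift)) for shift in (0, 1)}


signal = [0, 0, 4, 4, 0, 0, 0, 0]
phases = lbs_one_level(signal)
# phases[0][1] is all zeros, while phases[1][1] captures the edges:
# the decimated transform alone is shift-variant.
```

For a second level, the same function would be applied to the low band of each phase, which is the recursion the text describes.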
FIG. 7 is a video of a 2-level overcomplete wavelet transform obtained using the LBS method. Note that for an n-level decomposition, the overcomplete wavelet representation requires a storage space that is 3n+1 times larger than that of the original image.
2. Interleaving of Wavelet Coefficients
The novel interleaving scheme of the present invention stores the overcomplete wavelet coefficients differently from that depicted in FIGS. 6 and 7. As shown in FIG. 8, which illustrates the interleaving scheme of the present invention for a 1-D case of a one level decomposition, the interleaving of the shifted coefficients is performed such that the new coordinates in the overcomplete domain correspond to the associated shift in the original spatial domain.
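The storage factor can be checked with a short count, assuming a W x H image and a 2-D dyadic decomposition in which level i keeps all 4^i shift phases of each sub-band:

$$
\underbrace{\sum_{i=1}^{n} 3 \cdot 4^{i} \cdot \frac{W}{2^{i}} \cdot \frac{H}{2^{i}}}_{\text{HL, LH, HH at levels } 1,\dots,n}
\;+\;
\underbrace{4^{n} \cdot \frac{W}{2^{n}} \cdot \frac{H}{2^{n}}}_{\text{LL at level } n}
\;=\; 3n\,WH + WH \;=\; (3n+1)\,WH .
$$

For the two-level case of FIG. 7 this gives seven times the storage of the original frame.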
The interleaving scheme can be used recursively at each decomposition level and can be directly extended for 2-D signals. FIG. 9 shows the overcomplete wavelet coefficients of the first frame of the video of FIG. 7 after performing the interleaving process of the present invention. As can be seen from FIG. 9, the interleaved low sub-band signal is a low-pass filtered version of the original frame using the overcomplete wavelet low-pass filter. The interleaving process of the present invention enables the IBMCTF method of the present invention to provide sub-pixel accuracy motion estimation and compensation. Previously proposed IBMCTF schemes cannot provide optimal sub-pixel accuracy motion estimation and compensation, because they do not take into consideration cross-phase dependencies between neighbouring wavelet coefficients. Furthermore, the interleaving process allows the IBMCTF method of the invention to use hierarchical variable size block matching, backward motion compensation, and adaptive insertion of intra blocks.
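The 1-D, one-level interleaving can be sketched in Python as follows, again assuming an unnormalized Haar high-pass filter and circular extension (both are stand-ins, not the patent's filter), with hypothetical function names. After interleaving, the decimated coefficients of a signal shifted by d samples are found by reading the extended band at positions 2x + d, which is the sense in which the new coordinates correspond to the shift in the spatial domain.

```python
def haar_high(signal):
    """Decimated high band (unnormalized Haar, even-length input)."""
    return [(signal[2 * k] - signal[2 * k + 1]) / 2 for k in range(len(signal) // 2)]


def rotate(signal, shift):
    """Circular shift, a simplifying boundary assumption."""
    return signal[shift:] + signal[:shift]


def interleave_1d(phase0, phase1):
    """Place phase m, coefficient j at extended position 2 * j + m."""
    extended = []
    for even, odd in zip(phase0, phase1):
        extended += [even, odd]
    return extended


def extended_high(signal):
    """Extended high band built from the shift-0 and shift-1 phases."""
    return interleave_1d(haar_high(signal), haar_high(rotate(signal, 1)))


signal = [3, 1, 4, 1, 5, 9, 2, 6]
ext = extended_high(signal)
n = len(ext)
# Coefficients of the signal shifted by d, read directly from the extended band.
shifted_views = {d: [ext[(2 * x + d) % n] for x in range(n // 2)] for d in range(n)}
```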
Generation of Wavelet Block
As is well known in the art, in a wavelet decomposition, every coefficient at a given scale, with the exception of those in the highest frequency sub-bands, can be related to a set of coefficients of the same orientation at finer scales. In many wavelet coders, this relationship is exploited by representing the coefficients as a data structure called a wavelet tree. In the LBS algorithm, the coefficients of each wavelet tree rooted in the lowest sub-band are rearranged to form a wavelet block, as shown in FIG. 10. The purpose of the wavelet block is to provide a direct association between the wavelet coefficients and what they represent spatially in the image. Related coefficients at all scales and orientations are included in each block.
Structure of Motion Estimation
In the spatial domain, block-based motion estimation usually divides an image into small blocks and then finds the block of the reference frame that minimizes the mean absolute difference (MAD) to each block of the current frame. The motion estimation of the LBS algorithm finds the motion vector (dx, dy) that generates the minimum MAD between the current wavelet block and the reference wavelet block. As an example, if an input image is decomposed up to the third level (i.e. the input image can be decomposed to a total of ten sub-bands), and the displacement vector is (dx,dy), then the MAD of the k-th wavelet block in FIG. 10 is computed as follows:
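$$
\begin{aligned}
\mathrm{MAD}_k(dx,dy) = \sum_{i=1}^{3}\sum_{x_i}\sum_{y_i}\Big\{
&\big|HL_i^{cur}(x_i,y_i) - HL_i^{ref}\big(dx\%2^i,\ dy\%2^i;\ x_i+\lfloor dx/2^i\rfloor,\ y_i+\lfloor dy/2^i\rfloor\big)\big| \\
+\ &\big|LH_i^{cur}(x_i,y_i) - LH_i^{ref}\big(dx\%2^i,\ dy\%2^i;\ x_i+\lfloor dx/2^i\rfloor,\ y_i+\lfloor dy/2^i\rfloor\big)\big| \\
+\ &\big|HH_i^{cur}(x_i,y_i) - HH_i^{ref}\big(dx\%2^i,\ dy\%2^i;\ x_i+\lfloor dx/2^i\rfloor,\ y_i+\lfloor dy/2^i\rfloor\big)\big|\Big\} \\
+ \sum_{x_3}\sum_{y_3}
&\big|LL_3^{cur}(x_3,y_3) - LL_3^{ref}\big(dx\%2^3,\ dy\%2^3;\ x_3+\lfloor dx/2^3\rfloor,\ y_3+\lfloor dy/2^3\rfloor\big)\big|
\end{aligned}
$$
where the inner sums run over the positions (x_i, y_i) of the level-i coefficients of the k-th wavelet block, starting at (x_{i,k}, y_{i,k}) with x_{i,k} = ⌊x_{0,k}/2^i⌋ and y_{i,k} = ⌊y_{0,k}/2^i⌋; (x_{0,k}, y_{0,k}) denotes the initial position of the k-th wavelet block in the spatial domain, as shown in FIG. 10; the superscripts cur and ref denote the current and reference frames; and ⌊x⌋ denotes the largest integer not bigger than x. Here, for example, the i-th level HL sub-band of the reference frame is represented by HL_i^{ref}(m, n; x, y), where (m,n) denotes the shifts in the x- and y-directions in the spatial domain and (x,y) is the location within the sub-band signal. The optimization criterion for the motion estimation is now to find the (dx,dy) that minimizes this MAD. Note that in the original LBS algorithm, the MAD cannot be computed with the above formula for non-integer values of (dx,dy). More specifically, the MAD in conventional IBMCTF video coding schemes is based solely on the same-phase wavelet coefficients and the resulting sub-pixel accuracy motion estimation and compensation is not optimal.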
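The index arithmetic in this formula (phase selected by dx % 2^i, position offset by the floor of dx / 2^i) can be sketched in Python as follows. The helper name is hypothetical. Python's `%` and `//` operators already follow the floor convention used here, so negative displacements are handled consistently.

```python
def phase_and_offset(dx, dy, level):
    """Split an integer displacement into (phase, offset) for one level.

    phase  = (dx % 2^level, dy % 2^level) selects which shifted sub-band to read.
    offset = (floor(dx / 2^level), floor(dy / 2^level)) moves within that sub-band.
    """
    scale = 1 << level
    return (dx % scale, dy % scale), (dx // scale, dy // scale)


# Lookup parameters for a three-level decomposition and displacement (5, -3).
lookups = {level: phase_and_offset(5, -3, level) for level in (1, 2, 3)}
```

In every case `scale * offset + phase` reproduces the original displacement, which is why only integer displacements can be expressed this way.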
However, in the IBMCTF method of the present invention, the interleaving process enables the MAD calculation to be performed similarly as in SDMCTF video coding schemes, even for the sub-pixel accuracy. More specifically, the MAD for the displacement vector (dx,dy) for the IBMCTF method of the present invention is computed as follows:
$$
\begin{aligned}
\mathrm{MAD}_k(dx,dy) = \sum_{i=1}^{3}\sum_{x_i}\sum_{y_i}\Big\{
&\big|HL_i^{cur}(x_i,y_i) - LBS\_HL_i^{ref}\big(2^i x_i + dx,\ 2^i y_i + dy\big)\big| \\
+\ &\big|LH_i^{cur}(x_i,y_i) - LBS\_LH_i^{ref}\big(2^i x_i + dx,\ 2^i y_i + dy\big)\big| \\
+\ &\big|HH_i^{cur}(x_i,y_i) - LBS\_HH_i^{ref}\big(2^i x_i + dx,\ 2^i y_i + dy\big)\big|\Big\} \\
+ \sum_{x_3}\sum_{y_3}
&\big|LL_3^{cur}(x_3,y_3) - LBS\_LL_3^{ref}\big(2^3 x_3 + dx,\ 2^3 y_3 + dy\big)\big|
\end{aligned}
$$
where the sums run over the same coefficient positions of the k-th wavelet block as before and, for example, LBS_HL_i^{ref}(x,y) denotes the extended HL sub-band of the reference frame obtained using the interleaving process of the present invention. Note that even if (dx,dy) are non-integer values, the same interpolation technique used for SDMCTF can be easily used for each extended sub-band to generate the MAD for the non-integer displacement. Therefore, the IBMCTF video coding scheme of the present invention provides more efficient and indeed optimal sub-pixel motion estimation compared to the existing IBMCTF coding schemes. Also, the IBMCTF video coding scheme of the present invention with the wavelet block structure does not incur any motion vector overhead because the number of motion vectors to be coded is the same as that of SDMCTF. Since the motion estimation is closely aligned with the residual coding, a more sophisticated motion estimation criterion (such as the entropy of the residual signal) may be used to improve the coding performance.
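For integer displacements, reading the interleaved (extended) sub-band at 2^i x + d gives the same coefficient as the phase-and-offset lookup of the original LBS formulation, because 2^i x + d = 2^i (x + floor(d / 2^i)) + d % 2^i. The 1-D Python sketch below checks this for one sub-band. It is a sketch under assumptions: the names are hypothetical, the absolute differences are accumulated without normalization, and bounds checking is left out.

```python
def interleave(phases):
    """Interleave phases so that phases[m][j] lands at len(phases) * j + m."""
    step = len(phases)
    extended = [0] * (step * len(phases[0]))
    for m, band in enumerate(phases):
        for j, coeff in enumerate(band):
            extended[step * j + m] = coeff
    return extended


def abs_diff_extended(cur, extended, x0, size, d, level):
    """Accumulate |cur[x] - extended[2^level * x + d]| over one block."""
    scale = 1 << level
    return sum(abs(cur[x] - extended[scale * x + d]) for x in range(x0, x0 + size))


def abs_diff_phase(cur, phases, x0, size, d, level):
    """Same term via phase selection (d % 2^level) and offset (d // 2^level)."""
    scale = 1 << level
    return sum(
        abs(cur[x] - phases[d % scale][x + d // scale]) for x in range(x0, x0 + size)
    )


ref_phases = [[0, 2, 4, 6], [1, 3, 5, 7]]  # level-1 phases of a reference band
ref_extended = interleave(ref_phases)       # [0, 1, 2, 3, 4, 5, 6, 7]
cur_band = [0, 0, 5, 7]
```

Only the extended form carries over to fractional d, since an interpolation filter can be applied to `ref_extended` the way it would be applied to a spatial-domain reference frame.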
SIMULATION RESULTS
In order to verify that motion estimation and motion compensation in accordance with the present invention in the overcomplete wavelet domain yields lower residual energy in the wavelet domain, we use a one level temporal decomposition and compute the MAD for both IBMCTF and SDMCTF. Note that in interframe wavelet coding, the MAD is computed in the spatial domain, but what actually needs to be minimized is the residual energy in the wavelet domain. FIG. 11 shows a Table that illustrates the MAD in the wavelet domain for temporal high sub-band frames. The MAD values are averaged over the first 50 frames of temporal high sub-bands. For the SDMCTF cases, the corresponding MAD values in the wavelet domain are computed after the wavelet transform of the residual signal. Note that the MAD for the IBMCTF is always smaller than for SDMCTF, which indicates the possible coding gain of the IBMCTF video coding scheme of the present invention over SDMCTF.
FIGS. 12-17 plot the rate distortion performance of the IBMCTF video coding scheme of the present invention and SDMCTF for several test sequences for integer and 1/8-pel accurate motion estimation. The inband structure for MCTF was computed with a two level spatial decomposition performed by a Daubechies 9/7 filter, and four levels of decomposition were used for the temporal direction. The texture coding was performed with an EZBC algorithm described in the article entitled, Invertible Three-Dimensional Analysis/Synthesis System For Video Coding With Half-Pixel Accurate Motion Compensation, by S.T. Hsiang et al., VCIP 1999, SPIE Vol. 3653, pp. 537-546. Similar to SDMCTF, the sub-pixel motion estimation using 1/8 pel greatly improves the coding performance of the IBMCTF. The overall coding performance of the IBMCTF and SDMCTF is comparable. However, some sequences such as "Coastguard", "Silent" and "Stefan" exhibit a performance gain of up to 0.5dB, while for the "Mobile" sequence a 0.3dB performance degradation can be observed. Visually, the IBMCTF algorithm of the present invention is free of the blocking artefacts of motion estimation, since the motion estimation and filtering are done in each sub-band and the motion boundaries are filtered out by the wavelet recomposition filter.
FIG. 18 is an exemplary embodiment of a system 500 which may be used for implementing the principles of the present invention. The system 500 may represent a television, a set-top box, a desktop, laptop or palmtop computer, a personal digital assistant (PDA), a video/image storage device such as a video cassette recorder (VCR), a digital video recorder (DVR), a TiVO device, etc., as well as portions or combinations of these and other devices. The system 500 includes one or more video/image sources 501, one or more input/output devices 502, a processor 503 and a memory 504. The video/image source(s) 501 may represent, e.g., a television receiver, a VCR or other video/image storage device. The source(s) 501 may alternatively represent one or more network connections for receiving video from a server or servers over, e.g., a global computer communications network such as the Internet, a wide area network, a metropolitan area network, a local area network, a terrestrial broadcast system, a cable network, a satellite network, a wireless network, or a telephone network, as well as portions or combinations of these and other types of networks.
The input/output devices 502, processor 503 and memory 504 may communicate over a communication medium 505. The communication medium 505 may represent, e.g., a bus, a communication network, one or more internal connections of a circuit, circuit card or other device, as well as portions and combinations of these and other communication media. Input video data from the source(s) 501 is processed in accordance with one or more software programs stored in memory 504 and executed by processor 503 in order to generate output video/images supplied to a display device 506.
In a preferred embodiment, the coding and decoding principles of the present invention may be implemented by computer readable code executed by the system. The code may be stored in the memory 504 or read/downloaded from a memory medium such as a CD-ROM or floppy disk. In other embodiments, hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention. For example, the functional elements shown in FIGS. 1, 2, and 5 may also be implemented as discrete hardware elements.
While the present invention has been described above in terms of specific embodiments, it is to be understood that the invention is not intended to be confined or limited to the embodiments disclosed herein. For example, other transforms besides DCT can be employed, including but not limited to wavelets or matching-pursuits. These and all other such modifications and changes are considered to be within the scope of the appended claims.

Claims

1. A method of encoding video, the method comprising the steps of: providing a video signal; spatially decomposing (110) the video signal into at least two signals of different frequency sub-bands; applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal; and texture coding (140a, 140b, 140c) each of the motion compensated temporally filtered subband signals.
2. The method according to claim 1, wherein the spatially decomposing step (110) is performed by wavelet filtering.
3. The method according to claim 1, wherein the video signal defines a plurality of frames, the spatially decomposing step (110) including spatially decomposing each of the frames of the video signal into the at least two signals of different frequency sub-bands.
4. The method according to claim 1, wherein prior to the step (130a, 130b, 130c) of applying a motion compensated temporal filtering scheme (130a, 130b, 130c), further comprising the step of breaking each of the sub-band signals into a signal representing a group of temporal frames having a certain content.
5. The method according to claim 4, wherein the individualized motion compensated temporal filtering scheme (130a, 130b, 130c) applied to each sub-band signal is individualized according to the content of the group of frames.
6. The method according to claim 1, wherein prior to the step of applying a motion compensated temporal filtering scheme, further comprising the step of breaking each of the sub-band signals into a signal representing a group of frames (120a, 120b, 120c), the number of the frames in at least one of the group of frames signals being adaptively determined.
7. The method according to claim 1, wherein the individualized motion compensated temporal filtering scheme (130a, 130b, 130c) applied to each sub-band signal is individualized according to a spatial resolution of the sub-band signal.
8. The method according to claim 1, wherein the step of applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal is performed by using variable accuracy motion estimation, which is dependent of signal contents.
9. The method according to claim 1, wherein the individualized motion compensated temporal filtering scheme (130a, 130b, 130c) applied to each sub-band signal is individualized according to a temporal correlation of the sub-band signal.
10. The method according to claim 1, wherein the step of applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal is performed by using an individualized interpolation filter (200) for maximizing motion estimation performance.
11. The method according to claim 1, wherein the individualized motion compensated temporal filtering scheme (130a, 130b, 130c) applied to each sub-band signal is individualized according to a characteristic of the sub-band signal.
12. The method according to claim 1, wherein the step of applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each bandwidth signal is performed by using a temporal filter selected from the group consisting of multidirectional temporal filters and unidirectional temporal filters.
13. The method according to claim 1, wherein the step of applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal includes the steps of: shifting (210) the sub-band signal, which is from a phase of wavelet coefficients generated in the spatially decomposing step, at least three times to generate three additional phases of wavelet coefficients; interleaving (220) the four phases of wavelet coefficients to produce an extended reference frame; and estimating motion (131a, 131b, 131c) using the extended reference frame.
14. The method according to claim 13, wherein the spatial decomposing step (110) is performed to provide a plurality of decomposition levels, each decomposition level comprising a different frequency sub-band and wherein the step of applying the individualized motion compensated temporal filtering scheme (130a, 130b, 130c), by performing the shifting (210), interleaving (220) and estimating steps (131a, 131b, 131c), is recursively applied for each decomposition level.
15. The method according to claim 1, wherein the step of applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal includes the steps of: shifting (210) the sub-band signal, which is from a phase of wavelet coefficients generated in the spatially decomposing step, at least three times to generate three additional phases of wavelet coefficients; combining (220) the four phases of wavelet coefficients to produce an extended reference frame; generating a fractional pel (230) from the extended frame; and estimating motion (131a, 131b, 131c) according to the fractional pel.
16. The method according to claim 14, wherein the spatial decomposing step (110) is performed to provide a plurality of decomposition levels, each decomposition level comprising a different frequency sub-band and wherein the step of applying the individualized motion compensated temporal filtering scheme (130a, 130b, 130c), by performing the shifting (210), combining (220), generating (230) and estimating steps (131a, 131b, 131c), is recursively applied for each decomposition level.
17. A memory medium for encoding video, the memory medium comprising: code for spatially decomposing (110) a video signal into at least two signals of different frequency sub-bands; code for applying an individualized motion compensated temporal filtering scheme (130a, 130b, 130c) to each sub-band signal; and code for texture coding (140a, 140b, 140c) each of the motion compensated temporally filtered subband signals.
18. A device for encoding video, the device comprising: a wavelet transform unit (110) for spatially decomposing a video signal into at least two signals of different frequency sub-bands; a motion compensated temporal filtering unit (130a, 130b, 130c) for each of the at least two sub-band signals, each motion compensated temporal filtering unit applying an individualized motion compensated temporal filtering scheme to its associated sub-band signal; and a texture coding unit (140a, 140b, 140c) for each of the at least two sub-band signals, each texture coding unit texture coding its associated motion compensated temporally filtered subband signal.
19. The device according to claim 18, further comprising a partitioning unit (120a, 120b, 120c) for each of the sub-band signals, each partitioning unit breaking its associated sub-band signal into a signal representing a group of temporal frames having a certain content.
20. The device according to claim 18, wherein each motion compensated temporal filtering unit (130a, 130b, 130c) includes: a low band shifting unit (210) for shifting its associated sub-band signal, which is from a phase of wavelet coefficients, at least three times to generate three additional phases of wavelet coefficients; and an interleaving unit (220) for interleaving the four phases of wavelet coefficients to produce an extended reference frame.
21. The device according to claim 20, wherein each motion compensated temporal filtering unit (130a, 130b, 130c) further includes an interpolating unit (230) for generating a fractional pel from the extended frame.
22. The device according to claim 21, wherein each motion compensated temporal filtering unit (130a, 130b, 130c) further includes a motion estimation unit (131a, 131b, 131c) for estimating motion according to the fractional pel.
23. A method of decoding video, the method comprising the steps of: decoding (420) a signal including at least two encoded motion compensated temporally filtered, different frequency sub-band signals of a video signal; independently applying inverse motion compensated temporal filtering (440a, 440b, 440c) to each of the decoded at least two sub-band signals; spatially recomposing (450) the at least two sub-band signals; and reconstructing the video signal from at least one of the at least two spatially recomposed sub-band signals.
24. The method according to claim 23, wherein the video signal is reconstructed from all of the at least two spatially recomposed sub-band signals.
25. A memory medium for decoding video, the memory medium comprising: code for decoding a signal (420) including at least two encoded motion compensated temporally filtered, different frequency sub-band signals of a video signal; code for independently applying inverse motion compensated temporal filtering (440a, 440b, 440c) to each of the decoded at least two sub-band signals; code for spatially recomposing (450) the at least two sub-band signals; and code for reconstructing the video signal from at least one of the at least two spatially recomposed sub-band signals.
26. A device for decoding video, the device comprising: a texture decoding unit (420) for decoding a signal including at least two encoded motion compensated temporally filtered, different frequency sub-band signals of a video signal; an inverse motion compensated temporal filtering unit (440a, 440b, 440c) for each of the at least two sub-band signals, each inverse motion compensated temporal filtering unit independently applying inverse motion compensated temporal filtering to its associated decoded at least two sub-band signal; an inverse wavelet transform unit (450) for spatially recomposing the at least two sub-band signals; and a video reconstructing unit for reconstructing the video signal from at least one of the at least two spatially recomposed sub-band signals.
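The shifting and interleaving steps recited in claims 13 and 20 can be illustrated with the following minimal Python sketch, which builds an extended reference frame for the low-low band of a single decomposition level. It is an illustration only, under several assumptions not taken from the claims: a Haar low-pass filter stands in for the wavelet filter, the shifts are circular, frame dimensions are even, and all function names are hypothetical.

```python
def shift(frame, dy, dx):
    """Circularly shift a 2-D frame by dy rows and dx columns.

    Assumption: periodic boundary extension, chosen only for brevity.
    """
    h, w = len(frame), len(frame[0])
    return [[frame[(y + dy) % h][(x + dx) % w] for x in range(w)]
            for y in range(h)]


def haar_ll(frame):
    """Critically sampled LL band of a one-level 2-D Haar transform."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 2.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def extended_reference(frame):
    """Interleave the four phases of LL coefficients.

    Phase (0, 0) is the ordinary LL band; the three additional phases come
    from shifting the input by (0, 1), (1, 0) and (1, 1) before filtering.
    The result has twice the LL band's resolution in each dimension.
    """
    h, w = len(frame), len(frame[0])
    ext = [[0.0] * w for _ in range(h)]
    for dy in (0, 1):
        for dx in (0, 1):
            phase = haar_ll(shift(frame, dy, dx))
            for y, row in enumerate(phase):
                for x, value in enumerate(row):
                    ext[2 * y + dy][2 * x + dx] = value
    return ext


# Toy 4x4 frame; the extended reference frame is also 4x4,
# whereas the critically sampled LL band is only 2x2.
frame = [[float(4 * y + x) for x in range(4)] for y in range(4)]
ext = extended_reference(frame)
```

In this sketch the interleaved result coincides with an undecimated (overcomplete) low-pass filtering of the frame, and the samples at even positions are exactly the critically sampled LL band. A motion search over such an extended frame can therefore consider displacements that the decimated sub-band alone cannot represent.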
PCT/IB2003/004452 2002-10-16 2003-10-08 Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering WO2004036919A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP03808830A EP1554887A1 (en) 2002-10-16 2003-10-08 Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering
AU2003264804A AU2003264804A1 (en) 2002-10-16 2003-10-08 Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering
US10/531,195 US20060008000A1 (en) 2002-10-16 2003-10-08 Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering
JP2005501325A JP2006503518A (en) 2002-10-16 2003-10-08 Highly scalable 3D overcomplete wavelet video coding

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US41896102P 2002-10-16 2002-10-16
US60/418,961 2002-10-16
US48379603P 2003-06-30 2003-06-30
US60/483,796 2003-06-30

Publications (1)

Publication Number Publication Date
WO2004036919A1 (en) 2004-04-29

Family

ID=32110202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2003/004452 WO2004036919A1 (en) 2002-10-16 2003-10-08 Fully scalable 3-d overcomplete wavelet video coding using adaptive motion compensated temporal filtering

Country Status (6)

Country Link
US (1) US20060008000A1 (en)
EP (1) EP1554887A1 (en)
JP (1) JP2006503518A (en)
KR (1) KR20050052532A (en)
AU (1) AU2003264804A1 (en)
WO (1) WO2004036919A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1642463A1 (en) * 2003-06-30 2006-04-05 Koninklijke Philips Electronics N.V. Video coding in an overcomplete wavelet domain
US20060153466A1 (en) * 2003-06-30 2006-07-13 Ye Jong C System and method for video processing using overcomplete wavelet coding and circular prediction mapping
WO2005029846A1 (en) * 2003-09-23 2005-03-31 Koninklijke Philips Electronics, N.V. Video de -noising algorithm using inband motion-compensated temporal filtering
KR100643269B1 (en) * 2004-01-13 2006-11-10 삼성전자주식회사 Video/Image coding method enabling Region-of-Interest
FR2867328A1 (en) * 2004-03-02 2005-09-09 Thomson Licensing Sa Image sequence decoding method for e.g. videotelephony field, involves determining motion resolution and filter based on spatial and temporal resolution of sources and decoding rate, or on level of temporal decomposition of images
US20050201468A1 (en) * 2004-03-11 2005-09-15 National Chiao Tung University Method and apparatus for interframe wavelet video coding
TWI255138B (en) * 2005-03-08 2006-05-11 Novatek Microelectronics Corp Method and apparatus for noise reduction of video signals
US8755440B2 (en) 2005-09-27 2014-06-17 Qualcomm Incorporated Interpolation techniques in wavelet transform multimedia coding
KR100791453B1 (en) * 2005-10-07 2008-01-03 성균관대학교산학협력단 Multi-view Video Encoding and Decoding Method and apparatus Using Motion Compensated Temporal Filtering
WO2008079508A1 (en) * 2006-12-22 2008-07-03 Motorola, Inc. Method and system for adaptive coding of a video
EP2160716A1 (en) * 2007-06-08 2010-03-10 Thomson Licensing Method and apparatus for multi-lattice sparsity-based filtering
EP2099176A1 (en) * 2007-12-18 2009-09-09 Nokia Corporation Method and device for adapting a buffer of a terminal and communication system comprising such device
US8619861B2 (en) * 2008-02-26 2013-12-31 Microsoft Corporation Texture sensitive temporal filter based on motion estimation
US20090328093A1 (en) * 2008-06-30 2009-12-31 At&T Intellectual Property I, L.P. Multimedia Content Filtering
US20110149037A1 (en) * 2008-08-26 2011-06-23 Koninklijke Philips Electronics N.V. Method and system for encoding a 3D video signal, encoder for encoding a 3-D video signal, encoded 3D video signal, method and system for decoding a 3D video signal, decoder for decoding a 3D video signal.
FR2954035B1 (en) * 2009-12-11 2012-01-20 Thales Sa METHOD OF ESTIMATING VIDEO QUALITY AT ANY RESOLUTION
CN111083489B (en) 2018-10-22 2024-05-14 北京字节跳动网络技术有限公司 Multiple iteration motion vector refinement
WO2020098643A1 (en) 2018-11-12 2020-05-22 Beijing Bytedance Network Technology Co., Ltd. Simplification of combined inter-intra prediction
CN117319644A (en) 2018-11-20 2023-12-29 北京字节跳动网络技术有限公司 Partial position based difference calculation
WO2020177756A1 (en) 2019-03-06 2020-09-10 Beijing Bytedance Network Technology Co., Ltd. Size dependent inter coding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5560003A (en) * 1992-12-21 1996-09-24 Iowa State University Research Foundation, Inc. System and hardware module for incremental real time garbage collection and memory management
US6065020A (en) * 1998-05-27 2000-05-16 Microsoft Corporation Dynamic adjustment of garbage collection

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
D.S. TURAGA AND M. VAN DER SCHAAR: "Unconstrained motion compensated temporal filtering", ISO/IEC JTC1/SC29/WG11 MPEG2002/M8388, 6 May 2002 (2002-05-06) - 10 May 2002 (2002-05-10), Fairfax, US, pages 1 - 15, XP002268488 *
HSIANG S T ET AL: "Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 16, no. 8, May 2001 (2001-05-01), pages 705 - 724, XP004249801, ISSN: 0923-5965 *
LI XIN ET AL: "Efficient motion field representation in the wavelet domain for video compression", INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP'02);ROCHESTER, NY, UNITED STATES SEP 22-25 2002, vol. 3, 2002, IEEE Int Conf Image Process;IEEE International Conference on Image Processing 2002, pages III/257 - III/260, XP002268097 *
PEISONG CHEN AND JOHN W. WOODS: "Comparison of MC-EZBC and H.26L TML 8 on Digital Cinema Test Sequences", ISO/IEC JTC1/SC29/WG11 MPEG2002/M8130, 11 March 2002 (2002-03-11) - 15 March 2002 (2002-03-15), Jeju Island, Korea, pages 1 - 6, XP002268096 *
THOMAS RUSERT AND MATHIAS WIEN: "Exploration Experiments on Spatial and Temporal Scalability in Interframe Wavelet Coding", ISO/IEC JTC1/SC29/WG11 MPEG2002/M8650, 22 July 2002 (2002-07-22) - 26 July 2002 (2002-07-26), Klagenfurt, Austria, pages 1 - 7, XP002268098 *
VAN DER AUWERA G ET AL: "Scalable wavelet video-coding with in-band prediction - The bottom-up overcomplete discrete wavelet transform", PROCEEDINGS 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP 2002. ROCHESTER, NY, SEPT. 22 - 25, 2002, INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, NEW YORK, NY: IEEE, US, vol. 3 OF 3, 22 September 2002 (2002-09-22), pages 725 - 728, XP010607820, ISBN: 0-7803-7622-6 *
Y. ANDREOPOULOS, A. MUNTEANU, G. VAN DER AUWERA, P. SCHELKENS AND JAN CORNELIS: "Wavelet-Based Fully-Scalable Video Coding with In-Band Prediction", 3RD IEEE BENELUX SIGNAL PROCESSING SYMPOSIUM (SPS-2002), 21 March 2002 (2002-03-21) - 22 March 2002 (2002-03-22), Leuven, Belgium, pages S02-1 - S02-4, XP002268099 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008507170A (en) * 2004-07-13 2008-03-06 フランス テレコム エス アー Method and apparatus for encoding video image array
JP2006060792A (en) * 2004-07-13 2006-03-02 Microsoft Corp Embedded base layer codec for 3d sub-band encoding
WO2006109135A2 (en) * 2005-04-11 2006-10-19 Nokia Corporation Method and apparatus for update step in video coding based on motion compensated temporal filtering
WO2006109135A3 (en) * 2005-04-11 2007-01-25 Nokia Corp Method and apparatus for update step in video coding based on motion compensated temporal filtering
CN101199121B (en) * 2005-06-17 2012-03-21 Dts(英属维尔京群岛)有限公司 Scalable compressed audio bit stream and codec using a hierarchical filterbank and multichannel joint coding
WO2007000657A1 (en) * 2005-06-29 2007-01-04 Nokia Corporation Method and apparatus for update step in video coding using motion compensated temporal filtering
US9319729B2 (en) 2006-01-06 2016-04-19 Microsoft Technology Licensing, Llc Resampling and picture resizing operations for multi-resolution video coding and decoding
US8244071B2 (en) 2006-11-27 2012-08-14 Microsoft Corporation Non-dyadic spatial scalable wavelet transform
US8467460B2 (en) 2006-12-28 2013-06-18 Nippon Telegraph And Telephone Corporation Video processing method and apparatus, video processing program, and storage medium which stores the program
US8953673B2 (en) 2008-02-29 2015-02-10 Microsoft Corporation Scalable video coding and decoding with sample bit depth and chroma high-pass residual layers
US8964854B2 (en) 2008-03-21 2015-02-24 Microsoft Corporation Motion-compensated prediction of inter-layer residuals
US9571856B2 (en) 2008-08-25 2017-02-14 Microsoft Technology Licensing, Llc Conversion operations in scalable video encoding and decoding
US10250905B2 (en) 2008-08-25 2019-04-02 Microsoft Technology Licensing, Llc Conversion operations in scalable video encoding and decoding

Also Published As

Publication number Publication date
EP1554887A1 (en) 2005-07-20
US20060008000A1 (en) 2006-01-12
AU2003264804A1 (en) 2004-05-04
KR20050052532A (en) 2005-06-02
JP2006503518A (en) 2006-01-26

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2003808830

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2006008000

Country of ref document: US

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 10531195

Country of ref document: US

Ref document number: 1020057006325

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 20038A15199

Country of ref document: CN

Ref document number: 2005501325

Country of ref document: JP

WWP Wipo information: published in national office

Ref document number: 1020057006325

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2003808830

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 10531195

Country of ref document: US

WWW Wipo information: withdrawn in national office

Ref document number: 2003808830

Country of ref document: EP