WO2005081531A1 - Three-dimensional video scalable video encoding method - Google Patents

Three-dimensional video scalable video encoding method

Info

Publication number
WO2005081531A1
WO2005081531A1 (PCT/IB2005/000104)
Authority
WO
WIPO (PCT)
Prior art keywords
frames
band
wavelet
temporal sub
frequency filtered
Application number
PCT/IB2005/000104
Other languages
French (fr)
Inventor
Ihor Kirenko
Original Assignee
Koninklijke Philips Electronics N.V.
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2005081531A1 publication Critical patent/WO2005081531A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H04N19/63: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • In the case of high-intensity motion within a video sequence, inhomogeneous motion vector fields occur, such that certain pixels or entire areas are not referenced by any motion vector. These positions are related to newly uncovered areas and are referred to as unconnected.
  • The term unconnected pixels also applies to areas where motion trajectories converge or merge, which happens, for example, when areas are being covered. In other words, unconnected pixels are pixels for which the motion estimation step does not find unique motion vectors.
  • If a video sequence comprises high-intensity motion, the low-frequency filtered frames LL of the temporal sub-band T2 derived from the first GOF1 and second GOF2 groups of four frames are different.
  • In that case, motion estimation will not be efficient, and the temporal filtering step will not pack the temporal information into the resulting low-frequency filtered frame (a lot of information will be left in the high-frequency filtered frame).
  • Conversely, if the motion in the video sequence is slow and regular (i.e. more pixels are uniquely connected by motion vectors and the motion estimation step is thus more efficient), the low-frequency filtered frames of the temporal sub-band T2 are very similar.
  • the encoding method comprises a one-level four-stage wavelet spatial filtering step applied to the low-frequency filtered frame LLL and to the high-frequency filtered frame LLH of the temporal sub-band T3.
  • Said filtering step is based on a wavelet transform such as, for example, the one described in "Image coding using wavelet transform", by M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, IEEE Trans. Image Processing, vol. 1, pp. 205-220, Apr. 1992.
  • Said spatial filtering step is adapted to generate 4 spatial sub-bands of a first decomposition level, corresponding to a spatially filtered low-low frame, a spatially filtered low-high frame, a spatially filtered high-low frame, and a spatially filtered high-high frame.
  • Each spatially filtered frame is sub-sampled by a factor 2 both in a horizontal and in a vertical direction.
  • Said spatial filtering is applied several times in a pyramidal manner up to the coarsest spatial decomposition level, i.e. the smallest spatial resolution needed.
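The pyramidal spatial decomposition described above can be sketched with a one-level 2D split into the four spatial sub-bands, each sub-sampled by 2 horizontally and vertically, where the LL band alone is decomposed further. This sketch uses Haar filters as a simple stand-in for the Antonini et al. wavelets cited in the text, and its sub-band naming (row lowpass / column highpass as LH) is one of several conventions; all function names are illustrative.

```python
def haar_1d(seq):
    # One Haar split: pairwise averages (low band) and differences (high band).
    low = [(seq[2 * i] + seq[2 * i + 1]) / 2 for i in range(len(seq) // 2)]
    high = [(seq[2 * i] - seq[2 * i + 1]) / 2 for i in range(len(seq) // 2)]
    return low, high

def haar_2d(img):
    # Filter rows, then columns, yielding the four spatial sub-bands,
    # each sub-sampled by a factor 2 in both directions.
    row_pairs = [haar_1d(row) for row in img]
    lows = [p[0] for p in row_pairs]
    highs = [p[1] for p in row_pairs]

    def col_split(mat):
        col_pairs = [haar_1d(list(col)) for col in zip(*mat)]
        low = [list(r) for r in zip(*[p[0] for p in col_pairs])]
        high = [list(r) for r in zip(*[p[1] for p in col_pairs])]
        return low, high

    ll, lh = col_split(lows)
    hl, hh = col_split(highs)
    return ll, lh, hl, hh

def spatial_pyramid(img, levels):
    # Only the LL band is decomposed further, pyramidally, down to the
    # coarsest spatial resolution needed.
    detail_bands = []
    ll = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        detail_bands.append((lh, hl, hh))
    return ll, detail_bands
```

Each level quarters the LL band's size, which is what makes the representation spatially scalable: a decoder can stop at any level of the pyramid.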
  • the encoding method in accordance with the invention comprises a step of quantizing and entropy coding the wavelet coefficients of the filtered frames of the temporal sub-band T3.
  • This coding step is based, for example, on embedded zero-tree block coding EZBC, according to a principle known to a person skilled in the art.
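EZBC itself is an elaborate embedded coder; as a bare-bones stand-in for the quantization that precedes the entropy coding mentioned above, a uniform scalar quantizer can be sketched as follows. The step size and the truncation rule are illustrative choices of this sketch, not the patent's.

```python
def quantize(coeffs, step):
    # Uniform scalar quantization of wavelet coefficients;
    # int() truncates toward zero (a deadzone-like choice).
    return [int(c / step) for c in coeffs]

def dequantize(indices, step):
    # Reconstruction at the lower edge of each quantization bin.
    return [i * step for i in indices]
```

The quantization error introduced here is exactly what the later passage on encoder/decoder synchronization refers to: both sides must work from the same dequantized low-frequency frames.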
  • the encoding method in accordance with the invention also comprises a step of encoding the motion vector fields based, for example, on lossless differential pulse code modulation DPCM or on adaptive arithmetic coding.
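Lossless DPCM of motion-vector components, one of the two options mentioned above, reduces each value to its difference from its predecessor, which an entropy coder can then compress. A minimal sketch; the zero initial predictor is an assumption of this sketch.

```python
def dpcm_encode(values):
    # Emit the difference between each value and the previous one.
    prev = 0
    residuals = []
    for v in values:
        residuals.append(v - prev)
        prev = v
    return residuals

def dpcm_decode(residuals):
    # Accumulate the residuals to recover the original values exactly.
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out
```

Because neighboring motion vectors tend to be similar, the residuals cluster around zero, which is what makes the subsequent entropy coding effective.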
  • This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames LH of the temporal sub-band T2.
  • the filtering steps are followed by a step of quantizing and entropy coding the wavelet coefficients of said filtered frames.
  • This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames H, a step of quantizing and entropy coding the wavelet coefficients of said filtered frames, said encoded data being added to the bit-stream.
  • the available computational resources are determined, for example, based on the number of CPU cycles required to execute a temporal decomposition level in real time, on the available amount of memory, or on the cache size. For example, if the encoding of three temporal decomposition levels in real time requires a 400 MHz CPU, and if the encoder only has a 300 MHz CPU, then only two temporal decomposition levels will be implemented in order to encode a video sequence in real time.
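The 400 MHz / 300 MHz example above amounts to choosing the largest number of temporal decomposition levels whose cumulative real-time cost fits the available CPU budget. A toy sketch; the cost figures and the cumulative-cost model are invented for illustration.

```python
def affordable_levels(cumulative_costs_mhz, cpu_mhz):
    """cumulative_costs_mhz[i] is the CPU speed (in MHz) needed to run
    i + 1 temporal decomposition levels in real time; return how many
    levels fit the available CPU."""
    n = 0
    for cumulative in cumulative_costs_mhz:
        if cumulative <= cpu_mhz:
            n += 1
        else:
            break
    return n
```

This is the computational-scalability knob of the scheme: dropping a level simply means stopping the pyramid earlier, since higher-frequency sub-bands are generated on demand.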
  • Figure 4 is a block diagram illustrating the following steps of a second embodiment of the encoding method in accordance with the invention. Said embodiment corresponds to the case where an additional level of temporal decomposition, i.e. the third decomposition level in our example, would not contribute to the visual quality, or where there are not enough computational resources. According to this embodiment, the temporal sub-band T3 of the third decomposition level is not computed. Instead, the second equation of the reverse lifting scheme is applied to obtain the high-frequency filtered frames LH of the temporal sub-band T2.
  • This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the low-frequency filtered frames LL and of the high-frequency filtered frames LH of the temporal sub-band T2.
  • the filtering steps are followed by a step of quantizing and entropy coding the wavelet coefficients of said filtered frames.
  • the motion vector fields are also encoded, and the bit-stream is finally formed on the basis of the encoded coefficients and vectors. If computational resources are still available, the second equation of the reverse lifting scheme is applied to obtain the high-frequency filtered frames H of the sub-band T1 of the first decomposition level.
  • This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames H, a step of quantizing and entropy coding the wavelet coefficients of said filtered frames, said encoded data being added to the bit-stream.
  • these different configurations are the following: only the third temporal sub-band T3 (i.e. the frames LLL and LLH) is generated, spatially filtered and encoded, in which case only the video sequence with the lowest temporal resolution can be reconstructed; the third temporal sub-band T3 and the high-frequency frames of the second temporal sub-band T2 (i.e. the frames LLL-LLH-LH-LH) are generated, spatially filtered and encoded; the third temporal sub-band T3 and the high-frequency frames of the second T2 and first T1 temporal sub-bands (i.e. the frames LLL-LLH-LH-LH-H-H-H-H) are generated, spatially filtered and encoded; only the second temporal sub-band T2 (i.e. 2 frames LL and 2 frames LH) is generated, spatially filtered and encoded; or the second temporal sub-band T2 and the high-frequency frames of the first temporal sub-band T1 (i.e. the frames LL-LL-LH-LH-H-H-H-H) are generated, spatially filtered and encoded.
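For a GOF of 8 frames, the configurations listed above can be written out as the sets of sub-band frames that are generated, spatially filtered and encoded. The dictionary keys are my own shorthand labels, not terms from the text.

```python
# Frame labels follow the text: H frames belong to T1, LL/LH to T2,
# LLL/LLH to T3 (GOF of 8 frames, three temporal decomposition levels).
CONFIGURATIONS = {
    "T3 only":                        ["LLL", "LLH"],
    "T3 + high-frequency of T2":      ["LLL", "LLH", "LH", "LH"],
    "T3 + high-frequency of T2 and T1":
        ["LLL", "LLH", "LH", "LH", "H", "H", "H", "H"],
    "T2 only":                        ["LL", "LL", "LH", "LH"],
    "T2 + high-frequency of T1":      ["LL", "LL", "LH", "LH", "H", "H", "H", "H"],
}
```

The first three rows trade frame rate for computation at a fixed three-level decomposition; the last two stop the temporal pyramid after two levels.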
  • the encoding method in accordance with the invention can be implemented by means of items of hardware or software, or both. Said hardware or software items can be implemented in several manners, such as by means of wired electronic circuits or by means of an integrated circuit that is suitably programmed, respectively.
  • the integrated circuit can be contained in a video encoder.
  • the integrated circuit comprises a set of instructions.
  • said set of instructions contained, for example, in an encoder memory may cause the integrated circuit to carry out the different steps of the encoding method.
  • the set of instructions may be loaded into the programming memory by reading a data carrier such as, for example, a disk.
  • a service provider can also make the set of instructions available via a communication network such as, for example, the Internet.
  • the proposed encoding method is not restricted to the size of the group of frames GOF or to the size of the sub-groups of frames. Moreover, it allows an easy and natural adaptation of the GOF size, depending on the efficiency of motion estimation between each pair of consecutive frames. For example, if during motion estimation the number of unconnected pixels exceeds a predetermined threshold, the temporal filtering MCTF is not implemented.
  • the low-frequency frames of the temporal sub-band of the last decomposition level are spatially filtered and quantized before the generation of the high-frequency frames of the temporal sub-bands of the lower decomposition levels.
  • frame information of a lower decomposition level therefore takes into account the quantization errors of the previously encoded frames of a higher decomposition level.
  • the same quantized low-frequency temporal information is used at the encoder and the decoder, providing bit-rate synchronization between the encoder and the decoder.
  • the proposed invention also allows a low encoding delay, because the sub-band frames that have to be encoded and decoded first are generated at the beginning of the encoding process. This simplifies the bit-budget allocation.
  • the corresponding encoder and decoder can easily be synchronized in time. The order of generation of the high-frequency frames of a temporal sub-band replicates the order of video frame reconstruction at the decoding side.
  • the encoder encodes exactly the same number of high-frequency frames of the temporal sub-bands (i.e. provides the same frame rate) as the decoder is capable of decoding.
  • This feature is very useful for a low-delay one-to-one video communication (e.g. video phone).


Abstract

The present invention relates to a three-dimensional wavelet encoding method for encoding a sequence of frames, comprising the steps of: dividing the sequence of frames into groups (GOF) of 2^N consecutive input frames, where N is an integer; doing motion estimation on pairs of even and odd input frames of the group of frames, resulting in a set of motion vector fields; and motion-compensated temporal wavelet-based filtering using the first equation L[n] = Fo[n] + U(Fe[n]) of a reverse lifting scheme, where U is an update function and Fo[n] and Fe[n] are values of pixels of odd and even input frames of the group of frames, respectively, the pixels corresponding to Fo[n] and Fe[n] being taken along an appropriate motion vector of the set of motion vector fields, said filtering step resulting in low-frequency filtered frames (L) of a first-level temporal sub-band (T1).

Description

Three-dimensional video scalable video encoding method
FIELD OF THE INVENTION The present invention relates to a method of and a device for three-dimensional wavelet encoding a sequence of frames. This invention may be used, for example, in video compression systems adapted to generate progressively scalable (signal to noise ratio SNR, spatially or temporally) compressed video signals.
BACKGROUND OF THE INVENTION A conventional method for three-dimensional scalable video encoding of a sequence of frames is described, for example, in "Lifting schemes in scalable video coding", B. Pesquet-Popescu, V. Bottreau, SCI 2001, Orlando, USA. Said method comprises the following steps, illustrated in Figure 1. In a first step, a sequence of frames is divided into groups GOF of 2^N frames, said group having, in the example of Figure 1, 8 frames F1 to F8. Then, the encoding method comprises a step of motion estimation ME based on pairs of odd Fo and even Fe input frames within the group of frames, as illustrated by the dotted arrows. Said motion estimation step results in a set MV1 of motion vector fields of a first decomposition level, comprising 4 fields in the example of Figure 1. The motion estimation step is followed by a step of motion-compensated temporal wavelet-based filtering MCTF, for example Haar filtering, based on the set MV1 of motion vector fields and on a lifting scheme according to which the high-frequency wavelet coefficients H[n] and the low-frequency wavelet coefficients L[n] are computed sequentially as follows: H[n] = Fe[n] - P(Fo[n]), L[n] = Fo[n] + U(H[n]), where P is a prediction function, U is an update function, and Fo[n] and Fe[n] are values of pixels of odd and even input frames, the pixels corresponding to Fo[n] and Fe[n] being taken along a corresponding motion vector of the set MV1 of motion vector fields. The temporal filtering MCTF step delivers a temporal sub-band T1 of a first decomposition level comprising filtered frames, which are 4 low-frequency frames L and 4 high-frequency frames H in our example.
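The classical lifting order above (prediction first, then update) can be sketched in a few lines. This is an illustrative Haar example only: motion compensation is omitted, so P and U act on co-located pixels, frames are modeled as flat lists of pixel values, and the function names are mine, not the patent's.

```python
def haar_lift(f_odd, f_even):
    # Prediction step: H[n] = Fe[n] - P(Fo[n]), with P the identity for Haar.
    high = [fe - fo for fo, fe in zip(f_odd, f_even)]
    # Update step: L[n] = Fo[n] + U(H[n]), with U(x) = x / 2 for Haar.
    low = [fo + h / 2 for fo, h in zip(f_odd, high)]
    return low, high

def temporal_decompose(frames, levels):
    # Repeat the lifting on the low-frequency frames, as in the GOF pyramid:
    # each level halves the number of frames feeding the next level.
    sub_bands = []
    low = frames
    for _ in range(levels):
        pairs = [haar_lift(o, e) for o, e in zip(low[0::2], low[1::2])]
        lows, highs = zip(*pairs)
        sub_bands.append(list(highs))  # high-frequency frames of this level
        low = list(lows)
    sub_bands.append(low)  # low-frequency frame(s) of the last level
    return sub_bands
```

With these Haar choices, L is the pairwise average and H the pairwise difference of each frame pair, which is why the low-frequency frames of each level carry the content that the next decomposition level operates on.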
The motion estimation and filtering steps are repeated on the low-frequency frames L of the temporal sub-band T1, that is: motion estimation is done on pairs of odd and even low-frequency frames within the temporal sub-band T1, resulting in a set MV2 of motion vector fields of a second decomposition level, comprising 2 fields in our example; motion-compensated temporal wavelet-based filtering MCTF is performed based on the set MV2 of motion vector fields and on the lifting equations, resulting in a temporal sub-band T2 of a second decomposition level comprising filtered frames, which are 2 low-frequency frames LL and 2 high-frequency frames LH in the example of Figure 1. Motion estimation and motion-compensated temporal filtering are repeated once more on the pair of odd and even low-frequency frames of the temporal sub-band T2, resulting in a temporal sub-band T3 of a third and last decomposition level in the case of a group GOF of 8 frames. Said temporal sub-band T3 comprises 1 low-frequency frame LLL and 1 high-frequency frame LLH. A four-stage wavelet spatial filtering step is then applied to the frames LLL and LLH of the temporal sub-band T3 and to the high-frequency frames of the temporal sub-bands T2 and T1, i.e. the 2 filtered frames LH and the 4 filtered frames H, respectively. It results in spatio-temporal sub-bands comprising 4 spatially filtered frames sub-sampled in a horizontal and in a vertical direction. At a next step, a spatial encoding of the coefficients of the frames of the spatio-temporal sub-bands is performed, each spatio-temporal sub-band being encoded separately, beginning from the low-frequency frame of the spatio-temporal sub-band of the last decomposition level. The motion vector fields are also encoded. Finally, an output bit-stream is formed on the basis of the encoded coefficients of the spatio-temporal sub-bands and of the encoded motion vector fields, the bits of said motion vector fields being sent as overhead.
However, the encoding method according to the prior art has a number of disadvantages. First of all, the motion estimation and the motion-compensated temporal wavelet-based filtering steps are implemented on full-size frames. These steps are therefore computationally expensive and may cause a delay during encoding. Besides, motion vectors of the highest spatial resolution are encoded at each temporal level, which results in quite a high overhead. The encoding method also has low computational scalability.
SUMMARY OF THE INVENTION It is an object of the invention to propose a three-dimensional wavelet encoding method which has better computational scalability than that of the prior art. To this end, the encoding method in accordance with the invention is characterized in that it comprises the steps of: dividing the sequence of frames into groups of 2^N consecutive input frames, where N is an integer; doing motion estimation on pairs of even and odd input frames of the group of frames, resulting in a set of motion vector fields; and motion-compensated temporal wavelet-based filtering using the first equation L[n] = Fo[n] + U(Fe[n]) of a reverse lifting scheme, where U is an update function and Fo[n] and Fe[n] are values of pixels of odd and even input frames of the group of frames, respectively, the pixels corresponding to Fo[n] and Fe[n] being taken along an appropriate motion vector of the set of motion vector fields, said filtering step resulting in low-frequency filtered frames of a first-level temporal sub-band. As a consequence, only the low-frequency filtered frames of successive temporal decomposition levels are computed, without the need to compute the high-frequency filtered frames. This greatly simplifies the three-dimensional wavelet encoding method with motion compensation in accordance with the invention and permits different levels of scalability, as will be explained in more detail hereinafter. The present invention also relates to an encoding device implementing such an encoding method. It finally relates to a computer program product comprising program instructions for implementing said encoding method. These and other aspects of the invention will be apparent from and will be elucidated with reference to the embodiments described hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS The present invention will now be described in more detail, by way of example, with reference to the accompanying drawings, wherein: Figure 1 is a block diagram showing an encoding method in accordance with the prior art, Figure 2 is a block diagram illustrating the 3 first steps of an encoding method in accordance with the invention, Figure 3 is a block diagram illustrating the following steps of a first embodiment of the encoding method in accordance with the invention, and - Figure 4 is a block diagram illustrating the following steps of a second embodiment of the encoding method in accordance with the invention.
DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a three-dimensional 3D wavelet encoding method with motion compensation. Such an encoding method has been demonstrated to be an efficient technique for scalable video encoding applications. Said 3D encoding method uses a wavelet transform in both the spatial and temporal domains. It is known that wavelet temporal filtering is computationally expensive and causes a delay during encoding. The present disclosure proposes an improvement of the conventional 3D scalable wavelet video encoder. The process of temporal wavelet filtering is modified in such a way that the delay caused by the motion-compensated temporal wavelet-based filtering of video frames is reduced. The modification also allows the implementation of a computationally scalable encoder with parallel processing. Figure 2 is a block diagram showing an encoding method in accordance with the invention. In a first step, the sequence of frames is divided into groups of 2^N consecutive frames, a group of frames GOF having, for example, 8 frames. In a second step, a first sub-group of frames GOF1 within the GOF, namely the first 4 frames F1e, F1o, F2e and F2o in our example, is processed. Said first processing step comprises the sub-steps of: doing motion estimation ME on pairs of consecutive frames (Fe,Fo) within the group GOF1, resulting in a set of 2 motion vector fields MV11 of a first decomposition level; motion-compensated temporal wavelet-based filtering MCTF, for example Haar filtering, based on the motion vector fields MV11 of the first decomposition level. Said temporal filtering sub-step is based on the use of a reverse lifting scheme adapted to deliver sequentially the low-frequency wavelet coefficients L(n) and the high-frequency wavelet coefficients H(n).
The reverse lifting scheme comprises the 2 following equations: L[n] = Fo[n] + U(Fe[n]), H[n] = Fe[n] - P(L[n]), where Fo[n] and Fe[n] are values of pixels of odd and even input frames, respectively, the pixels corresponding to Fo[n] and Fe[n] being taken along a corresponding motion vector of the motion vector fields MV11. For example, the prediction and update functions of the reverse lifting scheme are based on the (4,4) Deslauriers-Dubuc wavelet transform such as: L[n] = Fo[n] + (-Fe[n-2] + 9Fe[n-1] + 9Fe[n] - Fe[n+1])/32, H[n] = Fe[n] - (-L[n-1] + 9L[n] + 9L[n+1] - L[n+2])/16. It will be apparent to a person skilled in the art that other prediction and update functions can be used without departing from the scope of the invention. In a first stage, only the first equation L[n] = Fo[n] + U(Fe[n]) of the reverse lifting scheme is applied, resulting in 2 low-frequency filtered frames L1e and L1o of a temporal sub-band T1 of a first decomposition level. Said first processing step further comprises the sub-steps of: doing motion estimation ME on pairs of consecutive low-frequency filtered frames within the temporal sub-band T1, resulting in a motion vector field MV21 of a second decomposition level; motion-compensated temporal wavelet-based filtering MCTF based on the motion vector field MV21 and on the first equation of the reverse lifting scheme: LL[n] = Lo[n] + U(Le[n]), where Lo[n] and Le[n] are values of pixels of odd and even frames of the temporal sub-band T1, respectively, the pixels corresponding to Lo[n] and Le[n] being taken along a corresponding motion vector of the motion vector field MV21. This results in a low-frequency filtered frame LL of a temporal sub-band T2 of a second decomposition level. In a third step, a second sub-group of frames GOF2, namely the last 4 frames F3e, F3o, F4e and F4o in our example, within the GOF is processed as in the first processing step.
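The reverse lifting order described above, where L is computed before H, can be illustrated directly from the two (4,4) Deslauriers-Dubuc equations quoted in the text. This sketch again omits motion compensation (pixels are taken co-located rather than along motion vectors) and clamps indices at the signal boundaries; both are simplifying assumptions of the sketch, not the patent's boundary policy.

```python
def clamp(i, n):
    # Clamp an index into [0, n-1]; a simple boundary-handling assumption.
    return max(0, min(n - 1, i))

def reverse_lift_dd44(f_odd, f_even):
    """Reverse lifting with the quoted (4,4) Deslauriers-Dubuc filters:
    the update (L) is applied first, then the prediction (H) uses the
    already-available L values."""
    n = len(f_odd)
    # First equation: L[n] = Fo[n] + (-Fe[n-2] + 9 Fe[n-1] + 9 Fe[n] - Fe[n+1]) / 32
    L = [f_odd[i] + (-f_even[clamp(i - 2, n)] + 9 * f_even[clamp(i - 1, n)]
                     + 9 * f_even[i] - f_even[clamp(i + 1, n)]) / 32
         for i in range(n)]
    # Second equation: H[n] = Fe[n] - (-L[n-1] + 9 L[n] + 9 L[n+1] - L[n+2]) / 16
    H = [f_even[i] - (-L[clamp(i - 1, n)] + 9 * L[i]
                      + 9 * L[clamp(i + 1, n)] - L[clamp(i + 2, n)]) / 16
         for i in range(n)]
    return L, H
```

The key structural point is visible in the code: L depends only on the input frames, so an encoder can produce (and encode) all low-frequency frames first and defer every H computation, which is exactly what enables the scalable configurations described later.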
Said second processing step results in: - a set of 2 motion vector fields MV12 of a first decomposition level, - 2 low-frequency frames L2e and L2o in the temporal sub-band T1 based on the motion vector fields MV12, - a motion vector field MV22 of a second decomposition level, and - 1 low-frequency frame LL in the temporal sub-band T2 based on the motion vector field MV22. The second and third steps (i.e. the first and second processing steps) can be executed sequentially or in parallel.
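The low-band cascade over a group of 8 frames can be sketched as follows; this is an illustrative sketch that treats each "frame" as a single number and omits motion estimation. The two sub-group calls are independent of each other, which is what makes the parallel processing mentioned above possible.

```python
def low_band(frames, U=lambda x: x):
    """Apply only the first lifting equation, L = odd + U(even), to
    consecutive (even, odd) pairs, halving the number of frames."""
    return [odd + U(even) for even, odd in zip(frames[0::2], frames[1::2])]

def process_gof(gof):
    """Two-level low-band cascade over a GOF of 8 'frames' (scalars here).

    The two sub-group calls on gof[:4] and gof[4:] are independent and
    could run in parallel.
    """
    t1 = low_band(gof[:4]) + low_band(gof[4:])  # sub-band T1: 4 frames
    t2 = low_band(t1)                           # sub-band T2: 2 frames
    return t1, t2
```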
Figure 3 is a block diagram illustrating a first embodiment of the encoding method in accordance with the invention. According to this embodiment, the encoding method comprises a third processing step including the sub-steps of: - doing motion estimation ME on the pair of consecutive low-frequency filtered frames LLe and LLo within the temporal sub-band T2, which have been computed during the first and second processing steps. This results in a motion vector field MV3 of a third decomposition level; - motion-compensated temporal wavelet-based filtering MCTF based on the motion vector field MV3 and on the first equation of the reverse lifting scheme: LLL[n] = LLo[n] + U(LLe[n]), where LLo[n] and LLe[n] are values of pixels of odd and even frames of the temporal sub-band T2, respectively, the pixels corresponding to LLo[n] and LLe[n] being taken along a corresponding motion vector of the motion vector field MV3. This results in a low-frequency filtered frame LLL of a temporal sub-band T3 of a third and last decomposition level in the case of a group GOF of 8 frames. The second equation of the reverse lifting scheme is then applied: LLH[n] = LLe[n] - P(LLL[n]), resulting in a high-frequency filtered frame LLH of the temporal sub-band T3. This additional level of temporal decomposition, i.e. the third level in our example, will not necessarily contribute to the visual quality if the low-frequency filtered frames LL from the current (i.e. second) decomposition level have a low temporal correlation. The criterion used to determine whether an additional level is needed is based on the efficiency of the motion estimation step at the current decomposition level. For example, said criterion is based on a comparison of the number of unconnected pixels defined during motion estimation with a predetermined threshold. 
In case of high-intensity motion within a video sequence, inhomogeneous motion vector fields occur, such that certain pixels or entire areas may not be referenced by any motion vector. These positions are related to newly uncovered areas, and are referred to as unconnected. The notion of "unconnected pixels" also applies to the areas where motion trajectories converge or merge, which for example happens when areas are being covered. In other words, unconnected pixels are pixels for which the motion estimation step does not find unique motion vectors. If a video sequence comprises high-intensity motion, then the low-frequency filtered frames LL of the temporal sub-band T2 derived from the first GOF1 and second GOF2 groups of four frames (these low-frequency filtered frames consist in fact of temporally averaged information) are different. Thus, motion estimation will not be efficient, and the temporal filtering step will not lead to packing of temporal information into the resulting low-frequency filtered frame (a lot of information will be left in the high-frequency filtered frame). In contrast, if the motion in the video sequence is slow and regular (i.e. more pixels are uniquely connected by motion vectors and the motion estimation step is thus more efficient), then the low-frequency filtered frames of the temporal sub-band T2 are very similar. Thus, the additional level of temporal decomposition may be efficiently implemented. The encoding method then comprises a one-level four-stage wavelet spatial filtering step of the low-frequency filtered frame LLL and of the high-frequency filtered frame LLH of the temporal sub-band T3. Said filtering step is based on a wavelet transform such as, for example, the one described in "Image coding using wavelet transform", by M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, IEEE Trans. Image Processing, vol. 1, pp. 205-220, Apr. 1992. 
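As a minimal illustration of this criterion (not part of the claimed method), the fraction of unconnected pixels can be compared with a threshold. Representing the motion vector field as a mapping from even-frame pixels to odd-frame pixels, and the 25% threshold, are assumptions made for this sketch.

```python
def more_levels_worthwhile(motion_vectors, num_pixels, max_ratio=0.25):
    """Decide whether an extra temporal decomposition level is worthwhile.

    motion_vectors maps each pixel index of the even frame to the pixel
    index of the odd frame its vector points at; odd-frame pixels that no
    vector points at are counted as 'unconnected'. max_ratio is a
    hypothetical tuning parameter.
    """
    connected = set(motion_vectors.values())
    unconnected = num_pixels - len(connected)
    return unconnected / num_pixels < max_ratio
```

With a one-to-one field every pixel is connected and another level is worthwhile; when many vectors converge on the same target (areas being covered), most pixels are unconnected and the decomposition stops.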
Said spatial filtering step is adapted to generate 4 spatial sub-bands of a first decomposition level, corresponding to a spatially filtered low-low frame, a spatially filtered low-high frame, a spatially filtered high-low frame, and a spatially filtered high-high frame. Each spatially filtered frame is sub-sampled by a factor of 2 in both the horizontal and the vertical direction. Said spatial filtering is applied several times in a pyramidal manner, down to the coarsest spatial decomposition level, i.e. the smallest spatial resolution needed. Once the spatial filtering step has been performed, the encoding method in accordance with the invention comprises a step of quantizing and entropy coding the wavelet coefficients of the filtered frames of the temporal sub-band T3. This coding step is based on, for example, embedded zero-tree block coding EZBC according to a principle known to a person skilled in the art. The encoding method in accordance with the invention also comprises a step of encoding the motion vector fields based on, for example, lossless differential pulse code modulation DPCM or adaptive arithmetic coding. It finally comprises a step of forming the final bit-stream on the basis of the encoded coefficients of the spatio-temporal sub-bands and of the encoded motion vector fields, the bits of said motion vector fields being sent as overhead. If not all temporal levels have been encoded yet, and if there are enough computational resources, the second equation of the reverse lifting scheme is applied, that is, in our example: LH[n] = Le[n] - P(LL[n]), to obtain the high-frequency filtered frames LH of the temporal sub-band T2 of the second decomposition level. This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames LH of the temporal sub-band T2. 
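The one-level spatial decomposition into four sub-sampled sub-bands can be sketched as follows. This is an illustrative Haar-style average/difference split on a small 2-D array, not the Antonini et al. filter bank cited above, and the LL/LH/HL/HH naming (row band first, column band second) is an assumed convention.

```python
def spatial_analysis(frame):
    """One level of 2-D Haar-style analysis: each row, then each column
    pair, is split into an average (low) and a difference (high) band,
    each sub-sampled by 2. Returns LL, LH, HL, HH, each half-size in
    both directions."""
    def split(v):  # 1-D low/high split of an even-length sequence
        lo = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v), 2)]
        hi = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v), 2)]
        return lo, hi

    rows = [split(r) for r in frame]          # horizontal filtering
    L = [lo for lo, _ in rows]
    H = [hi for _, hi in rows]

    def column_split(m):                      # vertical filtering
        lo_m, hi_m = [], []
        for top, bottom in zip(m[0::2], m[1::2]):
            lo_m.append([(a + b) / 2 for a, b in zip(top, bottom)])
            hi_m.append([(a - b) / 2 for a, b in zip(top, bottom)])
        return lo_m, hi_m

    LL, LH = column_split(L)
    HL, HH = column_split(H)
    return LL, LH, HL, HH
```

Applying the same function to the LL band again yields the pyramidal decomposition described above.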
The filtering steps are followed by a step of quantizing and entropy coding the wavelet coefficients of said filtered frames. The bit-stream is finally completed by the encoded data. If computational resources are still available, the second equation of the reverse lifting scheme is applied to obtain the high-frequency filtered frames H of the sub-band T1 of the first decomposition level as follows: H[n] = Fe[n] - P(L[n]). This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames H, a step of quantizing and entropy coding the wavelet coefficients of said filtered frames, said encoded data being added to the bit-stream. The available computational resources are determined, for example, based on the number of CPU cycles required to execute a temporal decomposition level in real time, on the available amount of memory, or on the cache size. For example, if the encoding of three temporal decomposition levels in real time requires a 400 MHz CPU, and if the encoder only has a 300 MHz CPU, then only two temporal decomposition levels will be implemented in order to encode a video sequence in real time. Conversely, if only a 200 MHz CPU is required for the encoding of two temporal decomposition levels whereas the encoder has a 300 MHz CPU, a third temporal decomposition level will be encoded. Figure 4 is a block diagram illustrating the following steps of a second embodiment of the encoding method in accordance with the invention. Said embodiment corresponds to the case where an additional level of temporal decomposition, i.e. the third decomposition level in our example, would not contribute to the visual quality, or where there are not enough computational resources. According to this embodiment, the temporal sub-band T3 of the third decomposition level is not computed. 
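The resource-based selection of the number of temporal decomposition levels can be sketched as below; the per-level CPU-speed requirements are hypothetical figures in the spirit of the example above, and treating CPU speed as the only budget is a simplification.

```python
def select_temporal_levels(cpu_mhz, required_mhz):
    """Pick the deepest temporal decomposition that still runs in real
    time on the available CPU.

    required_mhz[k] is the (hypothetical) CPU speed needed to encode
    k+1 temporal decomposition levels in real time, assumed increasing.
    """
    levels = 0
    for needed in required_mhz:
        if cpu_mhz >= needed:
            levels += 1
        else:
            break
    return levels
```

With the figures from the text, a 300 MHz encoder facing requirements of 150/200/400 MHz for one/two/three levels encodes two levels; if three levels only required 250 MHz, it would encode all three.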
Instead, the second equation of the reverse lifting scheme is applied to obtain the high-frequency filtered frames LH of the temporal sub-band T2. This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the low-frequency filtered frames LL and of the high-frequency filtered frames LH of the temporal sub-band T2. The filtering steps are followed by a step of quantizing and entropy coding the wavelet coefficients of said filtered frames. The motion vector fields are also encoded, and the bit-stream is finally formed on the basis of the encoded coefficients and vectors. If computational resources are still available, the second equation of the reverse lifting scheme is applied to obtain the high-frequency filtered frames H of the sub-band T1 of the first decomposition level. This temporal filtering step is followed by a four-stage wavelet spatial filtering, as described before, of the high-frequency filtered frames H, a step of quantizing and entropy coding the wavelet coefficients of said filtered frames, said encoded data being added to the bit-stream.
Thus, different encoding configurations are possible depending on the computational resources and/or the required visual quality. Based on the example of a group of frames GOF comprising 8 frames, these different configurations are the following: - only the third temporal sub-band T3 (i.e. the frames LLL and LLH) is generated, spatially filtered and encoded. In this case, only the video sequence with the lowest temporal resolution can be reconstructed; - only the third temporal sub-band T3 and the high-frequency frames of the second temporal sub-band T2 (i.e. the frames LLL-LLH-LH-LH) are generated, spatially filtered and encoded; - the third temporal sub-band T3 and the high-frequency frames of the second T2 and first T1 temporal sub-bands (i.e. the frames LLL-LLH-LH-LH-H-H-H-H) are generated, spatially filtered and encoded, allowing 3 levels of temporal scalability; - only the second temporal sub-band T2 (i.e. 2 frames LL and 2 frames LH) is generated, spatially filtered and encoded; - the second temporal sub-band T2 and the high-frequency frames of the first temporal sub-band T1 (i.e. the frames LL-LL-LH-LH-H-H-H-H) are generated, spatially filtered and encoded, allowing 2 levels of temporal scalability. The encoding method in accordance with the invention can be implemented by means of items of hardware or software, or both. Said hardware or software items can be implemented in several manners, such as by means of wired electronic circuits or by means of an integrated circuit that is suitably programmed, respectively. The integrated circuit can be contained in a video encoder. The integrated circuit comprises a set of instructions. Thus, said set of instructions contained, for example, in an encoder memory may cause the integrated circuit to carry out the different steps of the encoding method. The set of instructions may be loaded into the programming memory by reading a data carrier such as, for example, a disk. 
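The five configurations above can be summarized as follows for a GOF of 8 input frames; the configuration labels are hypothetical shorthand introduced for this sketch, and the values list the sub-band frames placed in the bit-stream.

```python
# Hypothetical labels for the five encoding configurations, GOF of 8 frames.
CONFIGS = {
    "T3 only":                ["LLL", "LLH"],
    "T3 + high of T2":        ["LLL", "LLH", "LH", "LH"],
    "T3 + high of T2 and T1": ["LLL", "LLH", "LH", "LH", "H", "H", "H", "H"],
    "T2 only":                ["LL", "LL", "LH", "LH"],
    "T2 + high of T1":        ["LL", "LL", "LH", "LH", "H", "H", "H", "H"],
}

def frames_encoded(config):
    """Number of sub-band frames encoded per GOF for a configuration."""
    return len(CONFIGS[config])
```

Only the full configurations (8 sub-band frames per 8 input frames) allow the decoder to reconstruct the original frame rate; the shorter ones trade temporal resolution for bit-rate and computation.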
A service provider can also make the set of instructions available via a communication network such as, for example, the Internet.
It will be apparent to a person skilled in the art that the proposed encoding method is not restricted to the size of the group of frames GOF or to the size of the sub-groups of frames. Moreover, it allows an easy and natural adaptation of the GOF size, depending on the efficiency of motion estimation between each pair of consecutive frames. For example, if during motion estimation the number of unconnected pixels exceeds a predetermined threshold, the temporal filtering MCTF is not implemented. According to the invention, the low-frequency frames of the temporal sub-band of the last decomposition level are spatially filtered and quantized before the generation of the high-frequency frames of the temporal sub-bands of lower decomposition levels. This means that frame information of a lower decomposition level takes into account the quantization errors of the previously encoded frames of a higher decomposition level. Thus, the same quantized low-frequency temporal information is used at the encoder and the decoder, providing bit-rate synchronization of the encoder and the decoder. The proposed invention also allows a low encoding delay, because the sub-band frames which have to be encoded and decoded first are generated at the beginning of the encoding process. This simplifies the bit-budget allocation. Finally, the corresponding encoder and decoder can be easily synchronized in time. The order of generation of the high-frequency frames of a temporal sub-band replicates the order of video frame reconstruction at the decoding side. This feature allows an optimal on-the-fly implementation of temporal scalability: the encoder encodes exactly the same number of high-frequency frames of the temporal sub-bands (i.e. provides the same frame rate) as the decoder is capable of decoding. This feature is very useful for low-delay one-to-one video communication (e.g. video telephony). Any reference sign in the following claims should not be construed as limiting the claim. 
It will be obvious that the use of the verb "to comprise" and its conjugations does not exclude the presence of any other steps or elements besides those defined in any claim. The word "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps.

Claims

1. A three-dimensional wavelet encoding method for encoding a sequence of frames, comprising the steps of: - dividing the sequence of frames into groups of 2N consecutive input frames (GOF), where N is an integer, - doing motion estimation on pairs of even and odd input frames of the group of frames, resulting in a set of motion vector fields, - motion-compensated temporal wavelet-based filtering using a first equation L[n] = Fo[n]+U(Fe[n]) of a reverse lifting scheme, where U is an update function, Fo[n] and Fe[n] are values of pixels of odd and even input frames of the group of frames, respectively, pixels corresponding to Fo[n] and Fe[n] being taken along an appropriate motion vector of the set of motion vector fields, said filtering step resulting in low-frequency filtered frames (L) of a first level temporal sub-band (T1).
2. A method as claimed in claim 1, wherein the motion estimation step and the motion-compensated temporal wavelet-based filtering step are iterated on pairs of even and odd low-frequency filtered frames (Lo-Le,LLo-LLe) until only one low-frequency filtered frame (LLL) of a last level temporal sub-band (T3) is generated, and wherein the motion-compensated temporal wavelet-based filtering step is adapted to compute a high-frequency filtered frame (LLH) of said last level temporal sub-band using a second equation of the reverse lifting scheme applied to the low-frequency filtered frame (LLL) of the last level temporal sub-band (T3) and to an even low-frequency filtered frame (LLe) of an immediately lower level temporal sub-band (T2), said method further comprising the steps of: - four-stage wavelet spatial filtering the low-frequency filtered frame and the high-frequency filtered frames of the last level temporal sub-band, resulting in wavelet coefficients, - quantizing and entropy coding the wavelet coefficients to form an output bit-stream.
3. A method as claimed in claim 2, wherein the motion-compensated temporal wavelet-based filtering step is adapted to compute high-frequency filtered frames (LH,H) of a temporal sub-band (T2,T1) having a level lower than the last level temporal sub-band (T3), wherein the four-stage wavelet spatial filtering step is adapted to compute wavelet coefficients on the basis of said high-frequency filtered frames of the lower level temporal sub-band (T2,T1), and wherein the quantizing and entropy coding step is adapted to quantize and encode the wavelet coefficients.
4. A method as claimed in claim 1, wherein the motion estimation step and the motion-compensated temporal wavelet-based filtering step are iterated on pairs of even and odd low-frequency filtered frames until at least two low-frequency filtered frames (LL) of a subsequent level temporal sub-band (T2) are generated, and wherein the motion-compensated temporal wavelet-based filtering is adapted to compute high-frequency filtered frames (LH) corresponding to said low-frequency filtered frames using a second equation of the reverse lifting scheme applied to low-frequency filtered frames (LL) of the subsequent level temporal sub-band (T2) and to even low-frequency filtered frames (Le) of an immediately lower level temporal sub-band (T1), said method further comprising the steps of: - four-stage wavelet spatial filtering the low-frequency and the high-frequency filtered frames of the subsequent level temporal sub-band, resulting in wavelet coefficients, - quantizing and entropy coding the wavelet coefficients to form an output bit-stream.
5. A method as claimed in claim 4, wherein the motion-compensated temporal wavelet-based filtering step is adapted to compute high-frequency filtered frames (H) of a temporal sub-band (T1) of a level lower than the subsequent level (T2), wherein the four-stage wavelet spatial filtering step is adapted to compute wavelet coefficients on the basis of said high-frequency filtered frames of the lower level temporal sub-band (T1), and wherein the quantizing and entropy coding step is adapted to quantize and encode the wavelet coefficients.
6. A method as claimed in claim 4, wherein the subsequent level temporal sub-band is the last level temporal sub-band if a criterion based on a comparison of a number of unconnected pixels defined during motion estimation with a predetermined threshold is fulfilled.
7. A method as claimed in claim 4, wherein the subsequent level temporal sub-band is the last level temporal sub-band if less than a predetermined amount of computational resources is available.
8. A device for three-dimensional wavelet encoding of a sequence of frames, comprising: - means for dividing the sequence of frames into groups of 2N consecutive input frames, where N is an integer, - means for doing motion estimation on pairs of even and odd input frames within the group of frames, resulting in a set of motion vector fields, - means for motion-compensated temporal wavelet-based filtering using a first equation L[n] = Fo[n]+U(Fe[n]) of a reverse lifting scheme, where U is an update function, Fo[n] and Fe[n] are values of pixels of odd and even input frames of the group of frames, respectively, the pixels corresponding to Fo[n] and Fe[n] being taken along an appropriate motion vector of the set of motion vector fields, so as to deliver low-frequency filtered frames (L) of a first level temporal sub-band (T1).
9. A computer program product comprising program instructions for implementing, when said program is executed by a processor, a method as claimed in claim 1.
PCT/IB2005/000104 2004-01-20 2005-01-11 Three-dimensional video scalable video encoding method WO2005081531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP04300031 2004-01-20
EP04300031.4 2004-01-20

Publications (1)

Publication Number Publication Date
WO2005081531A1 true WO2005081531A1 (en) 2005-09-01

Family

ID=34878338



Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002085026A1 (en) * 2001-04-10 2002-10-24 Koninklijke Philips Electronics N.V. Method of encoding a sequence of frames


Non-Patent Citations (5)

Title
ANTONINI M: "IMAGE CODING USING WAVELET TRANSFORM", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE INC. NEW YORK, US, vol. 1, no. 2, 1 April 1992 (1992-04-01), pages 205 - 220, XP000367547, ISSN: 1057-7149 *
DAUBECHIES I ET AL: "FACTORING WAVELET TRANSFORMS INTO LIFTING STEPS", JOURNAL OF FOURIER ANALYSIS AND APPLICATIONS, CRC PRESS, BOCA RATON, FL, US, vol. 4, no. 3, 1998, pages 247 - 269, XP001051011, ISSN: 1069-5869 *
OHM J-R: "Complexity and Delay Analysis of MCTF Interframe Wavelet Structures", ISO/IEC JTC1/SC29/WG11 MPEG02/M8520, July 2002 (2002-07-01), pages 1 - 16, XP002282535 *
PESQUET-POPESCU B ET AL: "LIFTING SCHEMES IN SCALABLE VIDEO CODING", WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, vol. CONF. XII, 2001, pages 250 - 254, XP008045534 *
PESQUET-POPESCU B ET AL: "THREE-DIMENSIONAL LIFTING SCHEMES FOR MOTION COMPENSATED VIDEO COMPRESSION", INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. CONF. 3, 2001, pages 1793 - 1796, XP002172582 *

