WO2006043750A1 - Method and apparatus for video coding - Google Patents

Method and apparatus for video coding

Info

Publication number
WO2006043750A1
Authority
WO
WIPO (PCT)
Prior art keywords
dct
module
coefficient
wavelet
mode
Prior art date
Application number
PCT/KR2005/002910
Other languages
English (en)
Inventor
Woo-Jin Han
Bae-Keun Lee
Original Assignee
Samsung Electronics Co., Ltd.
Priority date
Filing date
Publication date
Priority claimed from KR1020040092821A (KR100664932B1)
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2006043750A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: using adaptive coding
    • H04N 19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/12: Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N 19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136: Incoming video signal characteristics or properties
    • H04N 19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the unit being an image region, e.g. an object
    • H04N 19/176: the region being a block, e.g. a macroblock
    • H04N 19/30: using hierarchical techniques, e.g. scalability
    • H04N 19/34: Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H04N 19/36: Scalability techniques involving formatting the layers as a function of picture distortion after decoding, e.g. signal-to-noise [SNR] scalability
    • H04N 19/60: using transform coding
    • H04N 19/63: using sub-band based transform, e.g. wavelets

Definitions

  • Apparatuses and methods consistent with the present invention relate to video/image compression, and more particularly, to video coding that can improve compression efficiency or image quality by selecting a spatial transform method suitable for the characteristics of an incoming video/image.
  • compression can largely be classified into lossy/lossless compression, according to whether source data is lost, intraframe/interframe compression, according to whether individual frames are compressed independently, and symmetric/asymmetric compression, according to whether time required for compression is the same as time required for recovery.
  • data compression is defined as real-time compression when the compression/recovery time delay does not exceed 50 ms, and as scalable compression when frames have different resolutions.
  • lossless compression is usually used for text or medical data.
  • lossy compression is usually used for multimedia data.
  • Data redundancy is typically defined as: spatial redundancy, where the same color or object is repeated in an image; temporal redundancy, where there is little change between adjacent frames in a moving image or the same sound is repeated in audio; or mental/visual redundancy, which takes into account people's inability to perceive high frequencies.
  • Widely used spatial transform methods include the discrete cosine transform (DCT) and the wavelet transform.
  • the DCT is widely used for image processing methods such as the JPEG, MPEG, and H.264 standards. These standards use DCT block division, which involves dividing an image into DCT blocks each having a predetermined pixel size, e.g., 4×4, 8×8, or 16×16, and performing the DCT on each block independently, followed by quantization and encoding.
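  • As a rough illustration of DCT block division (a minimal Python sketch, not taken from the patent; the function names and the orthonormal DCT-II construction are assumptions):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)   # DC row uses the 1/sqrt(n) normalization
    return c

def blockwise_dct(image, b=8):
    """Divide the image into b x b blocks and DCT-transform each independently."""
    h, w = image.shape
    assert h % b == 0 and w % b == 0, "dimensions must be multiples of the block size"
    c = dct_matrix(b)
    out = np.empty((h, w), dtype=np.float64)
    for y in range(0, h, b):
        for x in range(0, w, b):
            out[y:y + b, x:x + b] = c @ image[y:y + b, x:x + b] @ c.T  # 2-D DCT
    return out
```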
  • methods that considerably reduce the block effects of a decoded image are available, but the degree of complexity of such algorithms becomes very high.
  • Wavelet coding is a widely used image coding technique, but its algorithm is rather complex compared to the DCT algorithm.
  • for an image having low spatial correlation, however, the wavelet transform is not as effective as the DCT.
  • the wavelet transform produces a scalable image with respect to resolution, and takes into account information on pixels adjacent to a pertinent pixel, in addition to the pertinent pixel itself, during the wavelet transform. Therefore, the wavelet transform is more effective than the DCT for an image having high spatial correlation, that is, a smooth image.
  • Both the DCT and the wavelet transform are lossless, and original data can be perfectly reconstructed through an inverse transform operation. However, actual data compression is performed by discarding less important information in cooperation with a quantizing operation. Disclosure of Invention
  • the wavelet transform is advantageous in that it can exploit the spatial correlation between pixels, because information on adjacent pixels can be taken into consideration during the transform.
  • the wavelet transform is suitable for a smooth image having high spatial correlation while the DCT is suitable for an image having low spatial correlation and many block artifacts.
  • the present invention provides a method and apparatus for performing DCT after performing wavelet transform as the spatial transform during video compression.
  • the present invention also provides a method and apparatus for performing video compression by selectively performing both DCT and wavelet transform or performing only DCT. Furthermore, the present invention presents criteria for selecting a spatial transform method suitable for the characteristics of an incoming video/image. The present invention also provides a method and apparatus for supporting Signal-to-Noise Ratio (SNR) scalability by applying Fine Granular Scalability (FGS) to the result obtained after performing wavelet transform and DCT.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient, a DCT module performing DCT on the wavelet coefficient for each DCT block to create a DCT coefficient, and a quantization module applying quantization to the DCT coefficient.
  • a horizontal length and a vertical length of the lowest subband image in the wavelet transform are integer multiples of the size of the DCT block.
  • an image encoder including a wavelet transform module performing wavelet transform on an input image to create a wavelet coefficient, a DCT module performing DCT on the wavelet coefficient for each DCT block to create a DCT coefficient, and a quantization module applying quantization to the DCT coefficient.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient, a DCT module performing DCT on the wavelet coefficient for each DCT block to create a DCT coefficient, a quantization module applying quantization to the DCT coefficient according to a predetermined criterion and creating a quantization coefficient for a base layer, and a Fine Granular Scalability (FGS) module decomposing a difference between the quantization coefficient for the base layer and the DCT coefficient into a plurality of bit planes.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a mode selection module selecting one of a first mode in which only DCT is performed during spatial transform and a second mode in which wavelet transform is followed by DCT for spatial transform according to the spatial correlation of the residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient when the second mode is selected, a DCT module performing DCT on the wavelet coefficient when the second mode is selected and on the residual frame for each DCT block when the first mode is selected to thereby create a DCT coefficient, and a quantization module applying quantization to the DCT coefficient.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a mode selection module selecting one of a first mode in which only DCT is performed during spatial transform and a second mode in which wavelet transform is followed by DCT for spatial transform according to the spatial correlation of the residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient when the second mode is selected, a DCT module performing DCT on the wavelet coefficient when the second mode is selected and on the residual frame for each DCT block when the first mode is selected to thereby create a DCT coefficient, a quantization module applying quantization to the DCT coefficient according to a predetermined criterion and creating a quantization coefficient for a base layer, and an FGS module decomposing a difference between the quantization coefficient for the base layer and the DCT coefficient into a plurality of bit planes.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient, a DCT module performing DCT on the residual frame for each DCT block to generate a first DCT coefficient while performing DCT on the wavelet coefficient for each DCT block to generate a second DCT coefficient, a quantization module applying quantization to the first and second DCT coefficients to generate first and second quantization coefficients, respectively, and a mode selection module reconstructing first and second residual frames from the first and second quantization coefficients, comparing the quality of the first residual frame with that of the second residual frame, and selecting a mode that offers a better quality residual frame.
  • a video encoder including a temporal transform module removing temporal redundancy in an input frame to generate a residual frame, a wavelet transform module performing wavelet transform on the residual frame to generate a wavelet coefficient, a DCT module performing DCT on the residual frame for each DCT block to generate a first DCT coefficient while performing DCT on the wavelet coefficient for each DCT block to generate a second DCT coefficient, a quantization module applying quantization to the first and second DCT coefficients to generate first and second quantization coefficients for a base layer, respectively, according to a predetermined criterion, a mode selection module reconstructing first and second residual frames from the first and second quantization coefficients, comparing the quality of the first residual frame with that of the second residual frame, and selecting a mode that offers a better quality residual frame, and an FGS module decomposing a difference between either the first or the second quantization coefficient corresponding to the selected mode and either the first or the second DCT coefficient corresponding to the selected mode into a plurality of bit planes.
  • an image decoder including an inverse quantization module inversely quantizing texture information contained in an input bitstream, an inverse DCT module performing inverse DCT on the inversely quantized value for each DCT block, and an inverse wavelet transform module performing inverse wavelet transform on the inversely DCT transformed value.
  • a video decoder including an inverse quantization module inversely quantizing texture information contained in an input bitstream, an inverse DCT module performing inverse DCT on the inversely quantized value for each DCT block, an inverse wavelet transform module performing inverse wavelet transform on the inversely DCT transformed value, and an inverse temporal transform module reconstructing a video sequence using the inversely wavelet transformed value and motion information in the bitstream.
  • a video decoder including an inverse quantization module inversely quantizing texture information contained in an input bitstream, an inverse DCT module performing inverse DCT on the inversely quantized value for each DCT block and sending the inversely DCT transformed value to an inverse temporal transform module when mode information contained in the bitstream represents a first mode and to an inverse wavelet transform module when the mode information represents a second mode, an inverse wavelet transform module performing inverse wavelet transform on the inversely DCT transformed value, and an inverse temporal transform module reconstructing a video sequence using the inversely DCT transformed value and the motion information in the bitstream when the mode information represents the first mode while reconstructing a video sequence using the inversely wavelet transformed value and the motion information when the mode information represents the second mode.
  • FIG. 1 shows the configuration of a video encoder according to a first exemplary embodiment of the present invention
  • FIG. 2 illustrates a process of decomposing an input image or frame into subbands at two levels by wavelet transform
  • FIG. 3 is a detailed diagram illustrating the decomposing process shown in FIG. 2;
  • FIG. 4 is a diagram for explaining a process of performing DCT on a wavelet-transformed frame
  • FIG. 5 shows the configuration of an image encoder for encoding an incoming still image
  • FIG. 6 shows the configuration of a video encoder supporting FGS after performing wavelet transform and DCT according to a second exemplary embodiment of the present invention
  • FIG. 7 shows the detailed configuration of the FGS module shown in FIG. 6;
  • FIG. 8 shows an example of difference coefficients of a DCT block
  • FIG. 9 is a block diagram of a video encoder according to a third exemplary embodiment of the present invention.
  • FIG. 10 is a block diagram of a video encoder according to a fourth exemplary embodiment of the present invention.
  • FIG. 11 shows an example of the mode selection module shown in FIG. 10;
  • FIG. 12 is a block diagram of a video encoder according to a fifth exemplary embodiment of the present invention.
  • FIG. 13 is a block diagram of a video decoder according to the present invention.
  • FIG. 14 is a block diagram of a system for performing an encoding or decoding process according to the present invention.
  • FIG. 1 shows the configuration of a video encoder 100 according to a first exemplary embodiment of the present invention.
  • the video encoder 100 includes a temporal transform module 110, a wavelet transform module 120, a DCT module 130, a quantization module 140, and a bitstream generation module 150.
  • the wavelet transform is performed to remove spatial redundancies, followed by the DCT to remove additional spatial redundancies.
  • the temporal transform module 110 performs motion estimation to determine motion vectors, generates a motion-compensated frame using the motion vectors and a reference frame, and subtracts the motion-compensated frame from a current frame to create a residual frame.
  • Various algorithms such as fixed-size block matching and hierarchical variable size block matching (HVSBM) are available for motion estimation.
  • Motion Compensated Temporal Filtering (MCTF), which supports temporal scalability, may also be used as the temporal transform.
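  • For concreteness, the fixed-size block matching mentioned above can be sketched as an exhaustive SAD search (illustrative only; the patent does not prescribe a particular search, and the block size, search radius, and function names are assumptions):

```python
import numpy as np

def full_search_mv(cur, ref, by, bx, b=16, r=8):
    """Exhaustive search: best (dy, dx) for the b x b block at (by, bx) by SAD."""
    h, w = ref.shape
    block = cur[by:by + b, bx:bx + b].astype(np.int64)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + b > h or x + b > w:
                continue  # candidate block falls outside the reference frame
            sad = int(np.abs(block - ref[y:y + b, x:x + b].astype(np.int64)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# The residual block is then the current block minus the motion-compensated block.
```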
  • the wavelet transform module 120 performs wavelet transform to decompose the residual frame generated by the temporal transform module 110 into low-pass and high-pass subbands and to determine wavelet coefficients for pixels in the respective sub-bands.
  • FIG. 2 illustrates a process of decomposing an input image or frame into subbands at two levels by wavelet transform.
  • 'LL' represents a low-pass subband that is low frequency in both horizontal and vertical directions while 'LH', 'HL' and 'HH' represent high-pass subbands in horizontal, vertical, and both horizontal and vertical directions, respectively.
  • the low-pass subband LL can be further decomposed iteratively.
  • the numbers within the parentheses denote a level of wavelet transform.
  • FIG. 3 is a detailed diagram illustrating the decomposing process shown in FIG. 2.
  • the wavelet transform module 120 includes at least a low-pass filter 121, a high-pass filter 122, and a downsampler 123.
  • Three types of wavelet filters, i.e., a Haar filter, a 5/3 filter, and a 9/7 filter, are typically used for the wavelet transform.
  • the Haar filter performs low-pass filtering and high-pass filtering using only one adjacent pixel.
  • the 5/3 filter performs low-pass filtering using five adjacent pixels and high-pass filtering using three adjacent pixels.
  • the 9/7 filter performs low-pass filtering based on nine adjacent pixels and high-pass filtering based on seven adjacent pixels.
  • Video compression characteristics and video quality may vary depending on the type of a wavelet filter used.
  • An input image 10 is transformed into a low-pass image L 11 having half the horizontal (or vertical) width of the input image 10 after it passes through the low-pass filter 121 and the downsampler 123.
  • the input image 10 is transformed into a high-pass image H 12 that is half the horizontal (or vertical) width of the input image 10 after it passes through the high-pass filter 122 and the downsampler 123.
  • the low-pass image L 11 and the high-pass image H 12 are transformed into four subband images LL(1) 13, LH(1) 14, HL(1) 15, and HH(1) 16 after they pass through the low-pass filter 121, the high-pass filter 122, and the downsampler 123.
  • the low-pass image LL(1) 13 is decomposed in the same way into the four subband images LL(2), LH(2), HL(2), and HH(2) shown in FIG. 2.
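  • A single level of this filter-and-downsample process can be sketched with the Haar filter (a minimal sketch; subband naming conventions vary, and the 5/3 and 9/7 filters mentioned above are not shown):

```python
import numpy as np

def haar_decompose(img):
    """One level of 2-D Haar analysis; returns the LL, LH, HL, HH subbands.
    The image height and width are assumed to be even."""
    a = img.astype(np.float64)
    # low/high-pass filter and downsample along the horizontal direction
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
    # then filter and downsample along the vertical direction
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)  # low in both directions
    hl = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)  # high-pass vertically
    lh = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)  # high-pass horizontally
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)  # high-pass in both
    return ll, lh, hl, hh  # apply again to ll for the next decomposition level
```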
  • a horizontal length and a vertical length of a low-pass image at the lowest level subband must be integer multiples of the DCT block size 'B'. If the image width and height are not integer multiples of B, compression efficiency or video quality may be significantly degraded, since regions of different subbands can be included within the same DCT block.
  • 'size' means the number of pixels.
  • the horizontal length is equal to the vertical length.
  • when the horizontal length and vertical length of an input image are M and N, i.e., the input frame has M × N pixels, and the number of subband decomposition levels is k, the size of the lowest level subband is M/2^k × N/2^k.
  • M/2^k and N/2^k must be integer multiples of B, as expressed by Equation (1): M mod (B · 2^k) = 0 and N mod (B · 2^k) = 0 ... (1)
  • in other words, the horizontal length M and the vertical length N are integer multiples of the DCT block size B multiplied by 2^k.
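  • A quick way to check this Equation (1) constraint (a minimal sketch; the function name and the CIF example are assumptions):

```python
def dimensions_valid(m, n, k, b):
    """Equation (1): the lowest subband (M/2^k x N/2^k) must be a multiple of B
    in both directions, i.e. M and N must be multiples of B * 2^k."""
    factor = b * (1 << k)
    return m % factor == 0 and n % factor == 0

# e.g. a 352x288 (CIF) frame with k = 3 decomposition levels and 4x4 DCT blocks:
print(dimensions_valid(352, 288, k=3, b=4))  # True: 352 and 288 are multiples of 32
```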
  • FIG. 4 is a diagram for explaining a process of performing the DCT on a wavelet-transformed frame 20.
  • a DCT block does not overlap a subband boundary.
  • a predecoder or transcoder may extract four DCT blocks from the upper left quadrant of a frame 30 partitioned into DCT blocks.
  • a decoder receives the extracted data and performs an inverse DCT and an inverse wavelet transform to reconstruct a video at a reduced resolution.
  • the DCT module 130 partitions a wavelet-transformed frame (i.e., wavelet coefficients) into DCT blocks having a predetermined size, and performs the DCT on each DCT block to create a DCT coefficient.
  • the size of a DCT block may be any divisor of 8. Since it is assumed in the present exemplary embodiment that the DCT block size is 4, the DCT module 130 partitions the wavelet-transformed frame 20 into DCT blocks of 4×4 pixels and performs the DCT on each of the DCT blocks.
  • the quantization module 140 performs quantization of the DCT coefficients created by the DCT module 130. Quantization is the process of converting real-valued DCT coefficients into discrete values by dividing the range of coefficients into a limited number of intervals and mapping the real-valued coefficients into quantization indices.
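  • A uniform scalar quantizer is one simple realization of this interval mapping (a sketch under that assumption; the patent does not fix a particular quantizer or step size):

```python
import numpy as np

def quantize(coeffs, step):
    """Map real-valued DCT coefficients to integer quantization indices."""
    return np.round(np.asarray(coeffs, dtype=np.float64) / step).astype(np.int64)

def dequantize(indices, step):
    """Reconstruct approximate coefficient values from the indices."""
    return np.asarray(indices, dtype=np.float64) * step
```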
  • the bitstream generation module 150 losslessly encodes or entropy encodes the coefficients quantized by the quantization module 140 and the motion information provided by the temporal transform module 110 into an output bitstream.
  • Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
  • FIG. 5 shows the configuration of an image encoder 200 that can encode a still image.
  • the image encoder 200 includes elements that perform the same functions as their counterparts in the video encoder 100 of FIG. 1, except for the temporal transform module 110. Instead of a residual frame obtained by removing temporal redundancy, an original still image is input to the wavelet transform module 120.
  • FIG. 6 shows the configuration of a video encoder 300 for providing Fine Granular Scalability (FGS) to support Signal-to-Noise Ratio (SNR) scalability, according to a second exemplary embodiment of the present invention.
  • FGS is a technique to encode a video sequence into a base layer and an enhancement layer, and it is useful in performing video streaming services in an environment in which the transmission bandwidth cannot be known in advance.
  • a streaming server Upon receiving a request for transmission of video data at a particular bit-rate, a streaming server sends the base layer and a truncated version of the enhancement layer. The amount of truncation is chosen to match the available transmission bit-rate, thereby maximizing the quality of a decoded sequence at the given bit-rate.
  • the video encoder 300 shown in FIG. 6 further includes an FGS module 160 between a quantization module 140 and a bitstream generation module 150.
  • the quantization module 140, the FGS module 160, and the bitstream generation module 150 will be described in the following.
  • DCT coefficients created after passing through a wavelet transform module 120 and a DCT module 130 are fed into the quantization module 140 and the FGS module 160.
  • the quantization module 140 quantizes the input DCT coefficients according to predetermined criteria and creates quantization coefficients for a base layer. The criteria may be determined based on the minimum bit-rate available in a bitstream transmission environment.
  • the quantization coefficients for the base layer are fed into the FGS module 160 and the bitstream generation module 150.
  • the FGS module 160 calculates the difference between each of the quantization coefficients of the base layer (received from the quantization module 140) and the corresponding DCT coefficient received from the DCT module 130, and decomposes the difference into a plurality of bit planes.
  • a combination of the bit planes can be represented as an 'enhancement layer', which is then provided to the bitstream generation module 150.
  • FIG. 7 shows a detailed configuration of the FGS module 160 of FIG. 6.
  • the FGS module 160 includes an inverse quantization module 161, a differentiator 162, and a bit plane decomposition module 163.
  • the inverse quantization module 161 dequantizes the input quantization coefficients of the base layer.
  • the differentiator 162 calculates a difference coefficient, that is, the difference between each of the input DCT coefficients and the corresponding dequantized coefficient.
  • the bit plane decomposition module 163 decomposes this difference coefficient into a plurality of bit planes, and creates an enhancement layer.
  • An example arrangement of difference coefficients is shown in FIG. 8, in which an 8×8 DCT block is shown and omitted difference coefficients are all represented by 0.
  • the difference coefficients may be arranged in a zig-zag scan order: +13, -11, 0, 0, +17, 0, 0, 0, -3, 0, 0, ..., and they may be decomposed into five bit planes as shown in Table 1.
  • the enhancement layer represented by bit planes is arranged sequentially in a descending order (highest-order bit plane 4 to lowest-order bit plane 0) and is provided to the bitstream generation module 150.
  • a transcoder or predecoder truncates the enhancement layer from the lowest-order bit plane upward. If all bit planes except bit planes 4 and 3 are truncated, a decoder will receive the values +8, -8, 0, 0, +16, 0, 0, 0, 0, ....
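  • The bit-plane decomposition and truncation just described can be reproduced as follows (a minimal sketch assuming a sign/magnitude representation; it matches the worked example of FIG. 8):

```python
def to_bitplanes(coeffs, n_planes=5):
    """Decompose signed difference coefficients into sign flags and bit planes."""
    signs = [0 if c >= 0 else 1 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> p) & 1 for m in mags] for p in range(n_planes - 1, -1, -1)]
    return signs, planes   # planes[0] is the most significant plane (plane 4)

def truncate(signs, planes, keep):
    """Keep only the `keep` most significant planes, as a predecoder would."""
    n_planes = len(planes)
    mags = [0] * len(signs)
    for i, plane in enumerate(planes[:keep]):
        weight = 1 << (n_planes - 1 - i)       # weight of this bit plane
        for j, bit in enumerate(plane):
            mags[j] += bit * weight
    return [m if s == 0 else -m for m, s in zip(mags, signs)]

coeffs = [13, -11, 0, 0, 17, 0, 0, 0, -3]      # zig-zag order, from FIG. 8
signs, planes = to_bitplanes(coeffs)
print(truncate(signs, planes, keep=2))         # [8, -8, 0, 0, 16, 0, 0, 0, 0]
```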
  • the exemplary embodiment shown in FIG. 6 may also be applied to an image encoder. Unlike the video encoder 300, the image encoder does not include the temporal transform module 110, which generates motion information. Thus, an input still image is fed directly into the wavelet transform module 120.
  • the bitstream generation module 150 losslessly encodes or entropy encodes the quantization coefficients of the base layer which are provided by the quantization module 140, the bit planes of the enhancement layer which are provided by the FGS module 160, and the motion information provided by the temporal transform module 110 into an output bitstream.
  • FIG. 9 is a block diagram of a video encoder 400 according to a third exemplary embodiment of the present invention.
  • the video encoder 400 analyzes the characteristics of a residual frame subjected to temporal transform, selects the more advantageous of two modes, and performs encoding according to the selected mode. In the first mode, the video encoder 400 performs only the DCT (for spatial transform) and skips the wavelet transform. In the second mode, the video encoder 400 performs the DCT after performing the wavelet transform.
  • the video encoder 400 further includes a mode selection module 170 between the temporal transform module 110 and the wavelet transform module 120, wherein the mode selection module 170 determines whether the residual frame will pass through the wavelet transform module 120.
  • the mode selection module 170 selects either the first or second mode according to the spatial correlation of the residual frame.
  • the DCT is suitable to transform an image having low spatial correlation and many block artifacts while the wavelet transform is suitable to transform a smooth image having high spatial correlation.
  • criteria are needed for selecting a mode, that is, for determining whether a residual frame fed into the mode selection module 170 is an image having high spatial correlation.
  • in an image having high spatial correlation, pixels are heavily concentrated at specific levels of brightness.
  • an image having low spatial correlation consists of pixels with various levels of brightness that are evenly distributed and have characteristics similar to random noise. It can be assumed that the histogram of an image consisting of random noise (the y-axis being pixel count and the x-axis being brightness) has a Gaussian distribution, while that of an image having high spatial correlation does not conform to a Gaussian distribution because its pixels are concentrated at specific levels of brightness.
  • a mode can be selected based on whether the difference between the distribution of the histogram of the input residual frame and the corresponding Gaussian distribution exceeds a predetermined threshold. If the difference exceeds the threshold, the second mode is selected because the input residual frame is determined to be highly spatially correlated. If the difference does not exceed the threshold, the residual frame has low spatial correlation, and the first mode is selected.
  • a sum of differences between frequencies of each variable may be used as the difference between the current distribution and the corresponding Gaussian distribution.
  • the mean m and standard deviation σ of the current distribution are calculated, and a Gaussian distribution with mean m and standard deviation σ is produced.
  • as shown in Equation (2), the sum of differences between the frequency f_i of each variable in the current distribution and the frequency g_i of that variable in the Gaussian distribution is calculated, and is divided by the sum of the frequencies in the current distribution for normalization: Σ_i |f_i − g_i| / Σ_i f_i ... (2)
  • a mode can be selected by determining whether the resultant value exceeds a predetermined threshold c.
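  • One possible realization of the Equation (2) criterion over a pixel histogram (a sketch; the bin count and the way the Gaussian fit is evaluated per bin are assumptions beyond what the text specifies):

```python
import numpy as np

def prefers_wavelet(residual, threshold, bins=256):
    """Equation (2): normalized histogram distance from a fitted Gaussian."""
    pixels = residual.ravel().astype(np.float64)
    m, s = pixels.mean(), pixels.std()
    if s == 0.0:
        return True   # a constant frame is trivially smooth / highly correlated
    f, edges = np.histogram(pixels, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    width = edges[1] - edges[0]
    # expected bin counts g_i under a Gaussian with the same mean and deviation
    g = pixels.size * width * np.exp(-(centers - m) ** 2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))
    score = np.abs(f - g).sum() / f.sum()
    return score > threshold   # True -> second mode (wavelet transform + DCT)
```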
  • the above-mentioned criteria may be applied to a residual frame as well as an original video sequence before they are subjected to the temporal transform.
  • although the video encoder 400 of FIG. 9 includes an FGS module 160 that is used to support SNR scalability, the FGS module 160 may not be required.
  • the quantization module 140 quantizes DCT coefficients created by a DCT module 130 according to the first or second mode, and the bitstream generation module 150 entropy encodes these coefficients into a bitstream.
  • the exemplary embodiment shown in FIG. 9 may also be applied to an image encoder. Unlike the video encoder 400, the image encoder does not include the temporal transform module 110 that generates motion information. Thus, an input still image is fed directly into the mode selection module 170.
  • FIG. 10 is a block diagram of a video encoder 500 according to a fourth exemplary embodiment of the present invention. Unlike in the video encoder 400 of FIG. 9, the quantization module 140 is followed by a mode selection module 180. The mode determination criteria are also different from those described with reference to FIG. 9.
  • a first DCT coefficient obtained after the residual frame passes through the DCT module 130 according to the first mode, and a second DCT coefficient obtained after the residual frame passes through the wavelet transform module 120 and the DCT module 130 according to the second mode, are fed into the quantization module 140.
  • the quantization module 140 quantizes the input first and second DCT coefficients according to a predetermined criterion to create first and second quantization coefficients of a base layer.
  • the criterion may be determined based on the minimum bit-rate available in a bitstream transmission environment. The same criterion is applied to the first and second DCT coefficients.
  • the quantization coefficients for the base layer are input to the mode selection module 180.
  • the mode selection module 180 reconstructs the first and second residual frames from the first and second quantization coefficients, compares the quality of either the first or the second residual frame with the residual frame provided by the temporal transform module 110, and selects a mode that offers a better quality residual frame.
  • FIG. 11 shows an example of the mode selection module 180 shown in FIG. 10.
  • the mode selection module 180 includes an inverse quantization module 181, an inverse DCT module 182, an inverse wavelet transform module 183, and a quality comparison module 184.
  • the inverse quantization module 181 applies inverse quantization to the first and second quantization coefficients received from the quantization module 140.
  • the inverse quantization is the process of reconstructing values from corresponding quantization indices created during a quantization process that uses a quantization table.
  • the inverse DCT module 182 performs inverse DCT on the inversely quantized values produced by the inverse quantization module 181, and reconstructs a first residual frame and sends it to the quality comparison module 184 in the first mode while providing the inversely DCT transformed result to the inverse wavelet transform module 183.
  • the inverse wavelet transform module 183 performs inverse wavelet transform on the inversely DCT transformed result received from the inverse DCT module 182, and reconstructs a second residual frame for transmission to the quality comparison module 184.
  • the inverse wavelet transform reconstructs an image in the spatial domain by inverting the wavelet decomposition shown in FIG. 2.
  • the quality comparison module 184 compares the quality of the first and second residual frames against the original residual frame provided by the temporal transform module 110, and selects the mode that offers the better quality residual frame. To compare video quality, the sum of differences between the first residual frame and the original residual frame is compared with the sum of differences between the second residual frame and the original residual frame, and the mode that offers the smaller sum of differences is determined to offer the better video quality.
  • the quality comparison may also be made by comparing the Peak Signal-to-Noise Ratio (PSNR) of the first and second residual frames relative to the original residual frame.
  • like the former method using the sum of differences between residual frames, this method also relies on the differences between each reconstructed residual frame and the original residual frame for the video quality comparison.
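  • Both quality measures are straightforward to compute (a minimal sketch; the function names are illustrative):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two frames."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())

def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB (higher means closer to the original)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def select_mode(original, recon_first, recon_second):
    """Pick the mode whose reconstructed residual is closer to the original."""
    return 1 if sad(original, recon_first) <= sad(original, recon_second) else 2
```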
  • the video quality comparison may be made by comparing images reconstructed by performing inverse temporal transform on the residual frames. However, it may be more effective to perform the comparison on the residual frames because the temporal transform is performed in both the first and second modes.
  • the FGS module 160 computes the difference between a DCT coefficient created according to a mode selected by the mode selection module 180 and selected quantization coefficients, and decomposes the difference into a plurality of bit planes to create an enhancement layer.
  • in the first mode, the FGS module 160 calculates the difference between the first DCT coefficient and the first quantization coefficient.
  • in the second mode, the FGS module 160 calculates the difference between the second DCT coefficient and the second quantization coefficient.
  • the created enhancement layer is then sent to the bitstream generation module 150. Because the detailed configuration of the FGS module 160 is the same as that of its counterpart shown in FIG. 7, description thereof will not be given.
  • the bitstream generation module 150 receives a quantization coefficient (a first quantization coefficient for the first mode or a second coefficient for the second mode) from the quantization module 140 according to information about a mode selected by the mode selection module 180, and losslessly encodes or entropy encodes the received quantization coefficient, the bit planes provided by the FGS module 160, and the motion information provided by the temporal transform module 110 into an output bitstream.
  • although FIG. 10 shows that the FGS module 160 is used to support SNR scalability, the FGS module 160 may be omitted (see FIG. 12).
  • a quantization module 140 quantizes a DCT coefficient created by the DCT module 130 according to the first or second mode, and sends the result to a mode selection module 180.
  • the mode selection module 180 selects a mode according to the determination criteria described above and sends information about the selected mode to the bitstream generation module 150.
  • the bitstream generation module 150 entropy-encodes the quantized result in the selected mode.
  • the exemplary embodiment shown in FIG. 10 may also be applied to an image encoder. Unlike the video encoder 500, the image encoder does not include the temporal transform module 110 that generates motion information. Thus, an input still image is fed directly into the wavelet transform module 120, the DCT module 130, and the mode selection module 180.
  • FIG. 13 is a block diagram of a video decoder 600 according to the present invention.
  • the video decoder includes a bitstream parsing module 610, an inverse quantization module 620, an inverse DCT module 630, an inverse wavelet transform module 640, and an inverse temporal transform module 650.
  • the bitstream parsing module 610 performs the inverse of entropy encoding by parsing an input bitstream and separately extracting motion information (motion vector, reference frame number, and others), texture information, and mode information.
  • the inverse quantization module 620 performs inverse quantization on the texture information received from the bitstream parsing module 610.
  • the inverse quantization is the process of reconstructing values from corresponding quantization indices created during a quantization process using a quantization table.
  • the quantization table may be received from the encoder or it may be predetermined by the encoder and the decoder.
  • the inverse DCT module 630 performs inverse DCT on the inversely quantized value obtained by the inverse quantization module 620 for each DCT block, and sends the inversely DCT transformed value to the inverse temporal transform module 650 when the mode information represents the first mode, or to the inverse wavelet transform module 640 when the mode information represents the second mode.
  • the inverse wavelet transform module 640 performs an inverse wavelet transform on the inversely DCT transformed result received from the inverse DCT module 630.
  • the horizontal length and the vertical length of the lowest subband image in the inverse wavelet transform must be integer multiples of the size of the DCT block.
  • the inverse temporal transform module 650 reconstructs a video sequence from the inversely DCT transformed result or the inversely wavelet transformed result, according to the mode information.
  • motion compensation is performed using the motion information received from the bitstream parsing module 610 to create a motion-compensated frame, and the motion-compensated frame is added to the frame received from the inverse wavelet transform module 640.
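  • The final reconstruction step can be sketched as follows (illustrative; the motion-vector layout and blockwise compensation are assumptions, and motion vectors are assumed to keep each block inside the reference frame):

```python
import numpy as np

def motion_compensate(reference, motion_vectors, b=16):
    """Build the prediction frame by copying each block from the reference
    at the displacement given by its motion vector."""
    pred = np.zeros_like(reference)
    for (by, bx), (dy, dx) in motion_vectors.items():  # block origin -> (dy, dx)
        pred[by:by + b, bx:bx + b] = reference[by + dy:by + dy + b,
                                               bx + dx:bx + dx + b]
    return pred

def inverse_temporal_transform(residual, reference, motion_vectors, b=16):
    """Add the motion-compensated prediction back to the decoded residual."""
    return motion_compensate(reference, motion_vectors, b) + residual
```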
  • although FIG. 13 shows that the inverse DCT module 630 receives the mode information, when wavelet transform and DCT are performed sequentially regardless of mode, as shown in FIG. 1, the video sequence is reconstructed from the input bitstream by passing it sequentially through the modules 610 through 650.
  • an image decoder may be used when the input bitstream is an image bitstream.
  • unlike the video decoder 600, the image decoder does not include the inverse temporal transform module 650, which uses the motion information.
  • in the image decoder, the inverse wavelet transform module 640 outputs a reconstructed image.
  • FIG. 14 is a block diagram of a system for performing an encoding or decoding process according to the present invention.
  • the system may represent a television, a set-top box, a desktop or laptop computer, a personal digital assistant (PDA), a video/image storage device such as a video cassette recorder (VCR), a digital video recorder (DVR), or a TiVo device, as well as portions or combinations of these and other devices.
  • the system includes one or more video/image sources 810, one or more input/output devices 820, a display 830, a processor 840, and a memory 850.
  • the video/image source(s) 810 may represent, e.g., a television receiver, a VCR or another video/image storage device.
  • the source(s) 810 may alternatively represent one or more network connections for receiving video from a server or servers over, e.g., a global computer communications network such as the Internet, a wide area network, a metropolitan area network, a local area network, a terrestrial broadcast system, a cable network, a satellite network, a wireless network, or a telephone network, as well as portions or combinations of these and other types of networks.
  • the input/output devices 820, the processor 840 and the memory 850 may communicate over a communication medium 860.
  • the communication medium 860 may represent, e.g., a communication bus, a communication network, one or more internal connections of a circuit, a circuit card or other device, as well as portions and combinations of these and other communication media.
  • Input video data from the source(s) 810 is processed in accordance with one or more software programs stored in the memory 850 and executed by the processor 840 in order to generate output video/images supplied to the display device 830.
  • the software program stored in the memory 850 includes a scalable wavelet-based codec implementing the method of the present invention.
  • the codec may be stored in the memory 850, read from a memory medium such as a CD-ROM or floppy disk, or downloaded from a predetermined server through a variety of networks.
  • hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention.
  • compression efficiency or video/image quality can be improved by selectively performing a spatial transform method suitable for an incoming video/image.
  • the present invention also provides a video/image coding method that can support spatial scalability through wavelet transform while providing SNR scalability through Fine Granular Scalability (FGS).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a video coding method and apparatus for improving compression efficiency or video/image quality by selecting a spatial transform method suited to the characteristics of an incoming video/image during video/image compression. The video coding apparatus includes a temporal transform module for removing temporal redundancy in an input frame to generate a residual frame, a wavelet transform module for performing a wavelet transform on the residual frame to generate a wavelet coefficient, a discrete cosine transform (DCT) module for performing a DCT on the wavelet coefficient for each DCT block to create a DCT coefficient, and a quantization module for quantizing the DCT coefficient.
PCT/KR2005/002910 2004-10-21 2005-09-02 Method and apparatus for video coding WO2006043750A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US62033004P 2004-10-21 2004-10-21
US60/620,330 2004-10-21
KR10-2004-0092821 2004-11-13
KR1020040092821A KR100664932B1 (ko) 2004-11-13 Video coding method and apparatus

Publications (1)

Publication Number Publication Date
WO2006043750A1 (fr) Method and apparatus for video coding

Family

ID=36203151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/002910 WO2006043750A1 (fr) 2004-10-21 2005-09-02 Method and apparatus for video coding

Country Status (1)

Country Link
WO (1) WO2006043750A1 (fr)

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDREOPOULOS Y. ET AL: "Spatio-temporal-SNR scalable wavelet coding with motion compensated DCT base-layer architectures", PROCEEDINGS 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP-2003., vol. 3, 14 September 2003 (2003-09-14) - 17 September 2003 (2003-09-17), pages 795 - 798, XP010669923 *
KONDO H. ET AL: "Wavelet image compression using sub-block DCT", NETWORKS, 2001. PROCEEDINGS. NINTH IEEE INTERNATIONAL CONFERENCE, 10 October 2001 (2001-10-10) - 12 October 2001 (2001-10-12), pages 327 - 330, XP010565544 *
PIZURICA A. ET AL: "Combined wavelet domain and temporal video denoising", PROCEEDINGS. IEEE CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, 2003, 21 July 2003 (2003-07-21) - 22 July 2003 (2003-07-22), pages 334 - 341, XP010648403 *
PO-CHIN HU ET AL: "A wavelet to DCT progressive image transcoder", IMAGE PROCESSING, 2000. PROCEEDINGS. 2000 INTERNATIONAL CONFERENCE, vol. 2, 10 September 2000 (2000-09-10) - 13 September 2000 (2000-09-13), pages 968 - 971, XP002182169 *

Similar Documents

Publication Publication Date Title
US20060088222A1 (en) Video coding method and apparatus
US8031776B2 (en) Method and apparatus for predecoding and decoding bitstream including base layer
US6898324B2 (en) Color encoding and decoding method
US8929436B2 (en) Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method
US7042946B2 (en) Wavelet based coding using motion compensated filtering based on both single and multiple reference frames
US6931068B2 (en) Three-dimensional wavelet-based scalable video compression
US7023923B2 (en) Motion compensated temporal filtering based on multiple reference frames for wavelet based coding
US20050163224A1 (en) Device and method for playing back scalable video streams
US20050152611A1 (en) Video/image coding method and system enabling region-of-interest
US20030202599A1 (en) Scalable wavelet based coding using motion compensated temporal filtering based on multiple reference frames
US20050157794A1 (en) Scalable video encoding method and apparatus supporting closed-loop optimization
WO2005074277A1 (fr) Method and device for transmitting scalable video bitstreams
US20060013311A1 (en) Video decoding method using smoothing filter and video decoder therefor
EP1504608A2 Motion compensated temporal filtering based on multiple reference frames for wavelet based coding
CN1689045A L-frames having both filtered and unfiltered regions for motion compensated temporal filtering in wavelet based coding
WO2006043750A1 (fr) Method and apparatus for video coding
WO2006080665A1 (fr) Method and apparatus for video coding
Pang et al. Wavelet-based Region-of-Interest Video Coding

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05781162

Country of ref document: EP

Kind code of ref document: A1