EP1817911A1 - Method and apparatus for multi-layered video encoding and decoding - Google Patents

Method and apparatus for multi-layered video encoding and decoding

Info

Publication number
EP1817911A1
Authority
EP
European Patent Office
Prior art keywords
intra
block
prediction
image
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05820697A
Other languages
German (de)
English (en)
Other versions
EP1817911A4 (fr)
Inventor
Woo-Jin Han (108-703 Jugong 2-danji APT)
Sang-Chang Cha (103-1503 Raemian 1-cha APT)
Ho-Jin Ha
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020050006804A (KR100679031B1)
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of EP1817911A1
Publication of EP1817911A4
Legal status: Withdrawn

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30: using hierarchical techniques, e.g. scalability
    • H04N 19/33: using hierarchical techniques, e.g. scalability, in the spatial domain
    • H04N 19/10: using adaptive coding
    • H04N 19/102: adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103: selection of coding mode or of prediction mode
    • H04N 19/105: selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/11: selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N 19/134: adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146: data rate or code amount at the encoder output
    • H04N 19/147: data rate or code amount at the encoder output according to rate distortion criteria
    • H04N 19/169: adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the coding unit being an image region, e.g. an object
    • H04N 19/176: the coding unit being an image region, the region being a block, e.g. a macroblock

Definitions

  • Apparatuses and methods consistent with the present invention relate to a video compression method, and more particularly, to a prediction method for efficiently eliminating redundancy within a video frame, and a video compression method and an apparatus using the prediction method.
  • Multimedia data is usually large in volume and therefore requires storage media with a large capacity and a wide bandwidth for transmission. Accordingly, a compression coding method is requisite for transmitting multimedia data including text, video, and audio.
  • a basic principle of data compression is removing data redundancy.
  • Data can be compressed by removing spatial redundancy in which the same color or object is repeated in an image, temporal redundancy in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or mental visual redundancy which takes into account human eyesight and its limited perception of high frequency variation.
  • H.264 (Advanced Video Coding, AVC), one of the schemes designed to improve compression efficiency, uses directional intra-prediction to remove spatial similarity within a frame.
  • Directional intra-prediction predicts the values of a current sub-block by copying pixels in a predetermined direction from the pixels above and to the left of the sub-block, and encodes only the difference between the current sub-block and the prediction.
  • a predicted block for a current block is generated based on a previously coded block and a difference between the current block and the predicted block is finally encoded.
  • A predicted block is generated for each 4x4 block or each 16x16 macroblock.
  • For each 4x4 luma block there exist 9 prediction modes.
  • For each 16x16 block 4 prediction modes are available.
  • A video encoder compliant with H.264 selects, among the available prediction modes, the prediction mode of each block that minimizes the difference between the current block and the predicted block.
  • For 4x4 blocks, H.264 uses 9 prediction modes: 8 directional prediction modes (0, 1, and 3 through 8) plus a DC prediction mode (mode 2) that uses the average of the 8 neighboring pixels, as shown in FIG. 1.
  • FIG. 2 shows an example of labeling of prediction samples A through M for explaining the 9 prediction modes.
  • Previously decoded samples A through M are used to form a predicted block (the region containing a through p). If samples E, F, G, and H are not available, sample D is copied to their locations to virtually form them.
  • In modes 0 and 1, pixels of a predicted block are formed by extrapolation from the upper samples A, B, C, and D, and from the left samples I, J, K, and L, respectively.
  • In mode 2, all pixels of a predicted block are predicted by the mean of the upper and left samples A, B, C, D, I, J, K, and L.
  • In mode 3, pixels of a predicted block are formed by interpolation at a 45-degree angle from the upper-right to the lower-left corner.
  • In mode 4, pixels of a predicted block are formed by extrapolation at a 45-degree angle from the upper-left to the lower-right corner.
  • In the remaining modes, pixels of a predicted block are formed by extrapolation at angles of approximately 26.6 degrees, for example from the left edge toward the right edge, drifting slightly downwards.
  • Samples of a predicted block can be formed from a weighted average of the reference samples A through M.
  • For example, in mode 4, sample d may be predicted by the following Equation (1): d = round(B/4 + C/2 + D/4).
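  • As an illustration, the following sketch (hypothetical helper names, not part of the patent) forms a 4x4 predicted block in the vertical mode and evaluates a weighted-average sample of the kind used in Equation (1):

```python
import numpy as np

def predict_vertical_4x4(above):
    """Mode 0 (vertical): copy the four reconstructed samples above the
    block (A, B, C, D) down every row of the 4x4 predicted block."""
    return np.tile(np.asarray(above), (4, 1))

def weighted_sample_d(B, C, D):
    """Weighted average of reference samples, as in Equation (1):
    d = round(B/4 + C/2 + D/4)."""
    return int(round(B / 4 + C / 2 + D / 4))

pred = predict_vertical_4x4([100, 102, 104, 106])  # every row repeats A..D
d = weighted_sample_d(102, 104, 106)               # -> 104
```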
  • For 16x16 prediction, in modes 0 and 1, pixels of a predicted block are formed by extrapolation from the upper samples (H) and from the left samples (V), respectively.
  • In mode 2, pixels of a predicted block are computed as the mean of the upper and left samples H and V.
  • In mode 3, pixels of a predicted block are formed using a linear 'plane' function fitted to the upper and left samples H and V.
  • Mode 3 is more suitable for areas of smoothly-varying luminance.
  • a bitstream may consist of multiple layers, i.e., a base layer (quarter common intermediate format (QCIF)), enhanced layer 1 (common intermediate format (CIF)), and enhanced layer 2 (2CIF) with different resolutions or frame rates.
  • Multi-layered video coding enables the use of prediction using texture information from a lower layer at the same temporal positions as a current frame, hereinafter called 'a base layer (BL) prediction' mode, as well as the intra-prediction mode.
  • A BL prediction mode mostly exhibits moderate prediction performance, while an intra-prediction mode performs inconsistently, sometimes well and sometimes poorly.
  • The conventional H.264 standard therefore proposes selecting the better prediction mode between an intra-prediction mode and a BL prediction mode for each macroblock and encoding the macroblock using the selected prediction mode.
  • The present invention provides a method for selecting the better prediction mode, between an intra-prediction mode and a BL prediction mode, for a region smaller than a macroblock.
  • The present invention also provides a modified intra-prediction mode that combines the BL prediction mode with a conventional intra-prediction mode.
  • The present invention also provides a method for selecting the better mode, between a mode of calculating a temporal residual and a BL prediction mode, for each motion block, applying the same selection scheme to temporal prediction as well.
  • A method for encoding video based on a multi-layer structure includes: performing intra-prediction on a current intra-block using images of neighboring intra-blocks of the current intra-block to obtain a prediction residual; performing prediction on the current intra-block using an image of a lower layer region corresponding to the current intra-block to obtain a prediction residual; selecting the one of the two prediction residuals that offers higher coding efficiency; and encoding the selected prediction residual.
  • A method for decoding video based on a multi-layer structure includes: extracting a modified intra-prediction mode and texture data for each intra-block; generating a residual image for the intra-block from the texture data; generating a predicted block for a current intra-block using previously reconstructed neighboring intra-blocks or a previously reconstructed lower layer image according to the modified intra-prediction mode; and adding the predicted block to the residual image to reconstruct an image of the current intra-block.
  • A method for encoding video based on a multi-layer structure includes: performing temporal prediction on a current motion block using an image of a region of a reference frame corresponding to the current motion block to obtain a prediction residual; performing prediction on the current motion block using an image of a lower layer region corresponding to the current motion block to obtain a prediction residual; selecting the one of the two prediction residuals that offers higher coding efficiency; and encoding the selected prediction residual.
  • A method for decoding video based on a multi-layer structure includes: extracting a selected mode, motion data, and texture data for each motion block; generating a residual image for the motion block from the texture data; selecting an image of a region of a previously reconstructed reference frame corresponding to the motion block or a previously reconstructed lower layer image according to the selected mode; and adding the selected image to the residual image to reconstruct an image of the motion block.
  • A multi-layered video encoder includes: a unit configured to perform intra-prediction on a current intra-block using images of neighboring intra-blocks of the current intra-block to obtain a prediction residual; a unit configured to perform prediction on the current intra-block using an image of a lower layer region corresponding to the current intra-block to obtain a prediction residual; a unit configured to select the one of the two prediction residuals that offers higher coding efficiency; and a unit configured to encode the selected prediction residual.
  • A multi-layered video decoder includes: a unit configured to extract a modified intra-prediction mode and texture data for each intra-block; a unit configured to generate a residual image for the intra-block from the texture data; a unit configured to generate a predicted block for a current intra-block using previously reconstructed neighboring intra-blocks or a previously reconstructed lower layer image according to the modified intra-prediction mode; and a unit configured to add the predicted block to the residual image and reconstruct an image of the current intra-block.
  • FIG. 1 shows conventional H.264 intra-prediction modes;
  • FIG. 2 shows an example of labeling of prediction samples for explaining the intra-prediction modes shown in FIG. 1;
  • FIG. 3 is a detailed diagram of the intra-prediction modes shown in FIG. 1;
  • FIG. 4 shows an example of an input image;
  • FIG. 5 shows the result of selecting one of two modes for each macroblock according to the conventional art;
  • FIG. 6 shows the result of selecting one of two modes for each macroblock according to an exemplary embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a modified intra-prediction mode according to an exemplary embodiment of the present invention;
  • FIG. 8 is a block diagram of a video encoder according to an exemplary embodiment of the present invention;
  • FIG. 9 shows a region being used as a reference in a modified intra-prediction mode;
  • FIG. 10 shows an example of creating a macroblock by selecting an optimum prediction mode for each intra-block;
  • FIG. 11 is a block diagram of a video decoder according to an exemplary embodiment of the present invention;
  • FIG. 12 shows an example of hierarchical variable size block matching (HVSBM);
  • FIG. 13 shows a macroblock constructed by selecting a mode for each motion block;
  • FIG. 14 is a block diagram of a video encoder according to an exemplary embodiment of the present invention; and
  • FIG. 15 is a block diagram of a video decoder according to an exemplary embodiment of the present invention.
  • FIG. 6 shows the result of selecting the better prediction mode between an intra-prediction mode and a BL prediction mode for each intra-block (e.g., a 4x4 block) according to an exemplary embodiment of the present invention.
  • an exemplary embodiment of the present invention can accomplish mode selection for a smaller region than a macroblock.
  • the region for this selection may have a size suitable for performing an intra-prediction mode.
  • In H.264, a luminance component utilizes 4x4 and 16x16 block-size modes, while a chrominance component utilizes an 8x8 block-size mode. The present invention can apply to the 4x4 and 8x8 modes, but not to the 16x16 mode, because a 16x16 block has the same size as a macroblock. In the following description, the 4x4 mode is used for intra-prediction.
  • The BL prediction mode can be added as one of the submodes of a conventional intra-prediction mode.
  • An intra-prediction mode that combines the BL prediction mode with the conventional intra-prediction mode is hereinafter referred to as a 'modified intra-prediction mode' according to an exemplary embodiment of the present invention.
  • Table 1 shows the submodes of the modified intra-prediction mode: mode 0 (vertical), mode 1 (horizontal), mode 2 (BL prediction), mode 3 (diagonal down-left), mode 4 (diagonal down-right), mode 5 (vertical-right), mode 6 (horizontal-down), mode 7 (vertical-left), and mode 8 (horizontal-up).
  • The modified intra-prediction mode contains a BL prediction mode in place of the DC mode (mode 2 of the conventional intra-prediction mode), because an intra-block that would be represented by the non-directional DC mode can be predicted sufficiently well using the BL prediction mode. Furthermore, replacing the DC mode rather than adding a new mode prevents the overhead that an additional mode would incur.
  • the modified intra-prediction mode is schematically illustrated in FIG. 7.
  • the modified intra-prediction mode consists of 8 directional modes and one BL prediction mode.
  • Since the BL prediction mode can be considered to have a downward direction (toward the base layer), the modified intra-prediction mode can be viewed as comprising a total of 9 directional modes.
  • Alternatively, the BL prediction mode can be added to the conventional intra-prediction mode as mode '9', as shown in the following Table 2: modes 0 through 8 remain the conventional intra-prediction modes (including the DC mode as mode 2), and mode 9 is the BL prediction mode.
  • Exemplary embodiments of the present invention described hereinafter assume that the modified intra-prediction mode consists of submodes as shown in Table 1.
  • FIG. 8 is a block diagram of a video encoder 1000 according to a first exemplary embodiment of the present invention.
  • the video encoder 1000 mainly includes a base layer encoder 100 and an enhancement layer encoder 200.
  • the configuration of the enhancement layer encoder 200 will now be described.
  • A block partitioner 210 segments an input frame into multiple intra-blocks. While each intra-block may have any size smaller than a macroblock, exemplary embodiments of the present invention will be described assuming that each intra-block has a size of 4x4 pixels. The intra-blocks are then fed into a subtractor 205.
  • a predicted block generator 220 generates a predicted block associated with a current block for each submode of the modified intra-prediction mode using a reconstructed enhancement layer block received from an inverse spatial transformer 251 and a reconstructed base layer image provided by the base layer encoder 100.
  • When a predicted block is generated using a reconstructed enhancement layer block, a calculation process as shown in FIG. 3 is used. In this case, since the DC mode is replaced by the BL prediction mode, the DC mode is excluded from the submodes of the intra-prediction mode.
  • When a predicted block is generated from the base layer, the reconstructed base layer image may be used directly as the predicted block, or it may be upsampled to the resolution of the enhancement layer before being used as the predicted block.
  • the predicted block generator 220 generates a predicted block 32 of a current intra-block for each of the prediction modes 0, 1, and 3 through 8 using its previously reconstructed neighboring enhancement layer blocks 33, 34, 35, and 36, in particular, information about pixels of blocks adjacent to the current intra-block.
  • For prediction mode 2, a previously reconstructed base layer image 31 is used directly as a predicted block (when the base layer has the same resolution as the enhancement layer) or is upsampled to the resolution of the enhancement layer (when the resolutions differ) before being used as the predicted block.
  • A deblocking process may be performed before the reconstructed base layer image is used as a predicted block, in order to reduce block artifacts.
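  • As a rough sketch of that upsampling step (a simple bilinear filter is assumed here; the patent does not prescribe a particular filter), a reconstructed base layer block could be brought to the enhancement layer resolution as follows:

```python
import numpy as np

def upsample_2x(base):
    """Bilinearly upsample a reconstructed base layer block by a factor of
    two in each dimension so it can serve as a BL prediction when the base
    layer has half the enhancement layer resolution (assumed filter)."""
    base = np.asarray(base, dtype=np.float64)
    h, w = base.shape
    # Map output pixel centres back into base layer coordinates.
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = (1 - wx) * base[np.ix_(y0, x0)] + wx * base[np.ix_(y0, x1)]
    bot = (1 - wx) * base[np.ix_(y1, x0)] + wx * base[np.ix_(y1, x1)]
    return (1 - wy) * top + wy * bot

pred = upsample_2x(np.arange(16.0).reshape(4, 4))  # 4x4 block -> 8x8 prediction
```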
  • the subtractor 205 subtracts a predicted block produced by the predicted block generator 220 from a current intra-block received from the block partitioner 210, thereby removing redundancy in the current intra-block.
  • the difference between the predicted block and the current intra-block is lossily encoded as it passes through a spatial transformer 231 and a quantizer 232 and then losslessly encoded by an entropy coding unit 233.
  • The spatial transformer 231 performs spatial transform on the signal from which redundancy has been removed by the subtractor 205, to create transform coefficients.
  • A Discrete Cosine Transform (DCT) or a wavelet transform technique may be used for the spatial transform.
  • a DCT coefficient is created when DCT is used for the spatial transform while a wavelet coefficient is produced when wavelet transform is used.
  • the quantizer 232 performs quantization on the transform coefficients obtained by the spatial transformer 231 to create quantization coefficients.
  • Quantization is a methodology for expressing a transform coefficient, which is an arbitrary real number, with a finite number of bits.
  • Known quantization techniques include scalar quantization, vector quantization, and the like.
  • The simplest scalar quantization technique divides a transform coefficient by the value of a quantization table mapped to the coefficient and rounds the result to an integer value.
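  • A minimal sketch of scalar quantization and its inverse, with a single uniform step standing in for the quantization table:

```python
import numpy as np

def quantize(coeffs, step):
    """Divide each transform coefficient by its quantization step and
    round to the nearest integer (the lossy part of the encoder)."""
    return np.round(np.asarray(coeffs, dtype=np.float64) / step).astype(int)

def dequantize(levels, step):
    """Inverse quantization: multiply the levels back by the step."""
    return np.asarray(levels) * step

levels = quantize([12.7, -3.2, 0.4], step=4)  # -> [3, -1, 0]
recon = dequantize(levels, step=4)            # -> [12, -4, 0]
```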
  • Embedded quantization is mainly used when wavelet transform is used for spatial transform.
  • Embedded quantization exploits spatial redundancy and involves repeatedly reducing a threshold value by one half and encoding the transform coefficients that are larger than the current threshold.
  • Examples of embedded quantization techniques include Embedded Zerotrees Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), and Embedded ZeroBlock Coding (EZBC).
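  • The threshold-halving idea behind these schemes can be sketched as follows (a toy significance pass only; the zerotree/zeroblock coding that makes EZW, SPIHT, and EZBC efficient is omitted):

```python
import numpy as np

def significance_passes(coeffs, passes=4):
    """Toy embedded quantization: start from a power-of-two threshold and,
    on each pass, record which coefficients are significant (magnitude at
    least the threshold), then halve the threshold."""
    c = np.abs(np.asarray(coeffs, dtype=np.float64))
    if c.max() <= 0:
        return []
    t = 2 ** int(np.floor(np.log2(c.max())))
    out = []
    for _ in range(passes):
        out.append((t, np.flatnonzero(c >= t).tolist()))
        if t == 1:
            break
        t //= 2
    return out

# significance_passes([9, 3, 0.5, 14]) -> [(8, [0, 3]), (4, [0, 3]), (2, [0, 1, 3]), (1, [0, 1, 3])]
```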
  • the entropy coding unit 233 losslessly encodes the quantization coefficients generated by the quantizer 232 and a prediction mode selected by a mode selector 240 into an enhancement layer bitstream.
  • Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
  • The mode selector 240 compares the results obtained by the entropy coding unit 233 for each of the submodes of the modified intra-prediction mode and selects the prediction mode that offers the highest coding efficiency.
  • the coding efficiency is measured by the quality of an image at a given bit-rate.
  • a cost function based on rate-distortion (RD) optimization is mainly used for evaluating the image quality. Because a lower cost means higher coding efficiency, the mode selector 240 selects a prediction mode that offers a minimum cost among the submodes of the modified intra-prediction mode.
  • The cost C is given by Equation (2): C = E + λB, where E denotes the difference between the original signal and the signal reconstructed by decoding the encoded bits, B denotes the number of bits required to perform each prediction mode, and λ is a Lagrangian coefficient used to control the ratio of E to B.
  • While the number of bits B may be defined as the number of bits required for the texture data only, it is more accurate to define it as the number of bits required for both each prediction mode and its corresponding texture data, because the result of entropy encoding may not be the same as the mode number allocated to each prediction mode. In particular, since conventional H.264 encodes only the difference estimated from the prediction modes of neighboring intra-blocks rather than the prediction mode itself, the encoded result may vary according to the efficiency of that estimation.
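  • A compact sketch of this selection, using Equation (2), C = E + λB, with hypothetical distortion and bit figures (B counts mode plus texture bits, as discussed above):

```python
def rd_cost(distortion, bits, lmbda):
    """Equation (2): C = E + lambda * B."""
    return distortion + lmbda * bits

def select_mode(candidates, lmbda):
    """Pick the submode of the modified intra-prediction mode with the
    minimum RD cost; `candidates` maps mode id -> (distortion E, bits B)."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lmbda))

# Hypothetical figures for three submodes (0: vertical, 2: BL, 4: diagonal).
best = select_mode({0: (1500, 96), 2: (1100, 88), 4: (1300, 120)}, lmbda=5.0)  # -> 2
```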
  • the mode selector 240 selects a prediction mode for each intra-block. In other words, the mode selector determines an optimum prediction mode for each intra-block in a macroblock 10 as shown in FIG. 10.
  • shadowed blocks are encoded using a BL prediction mode while non-shadowed blocks are encoded using conventional directional intra-prediction modes.
  • The region for which the modified intra-prediction mode is used may be an integer multiple of the intra-block size, for example equal to the size of a macroblock. However, the modified intra-prediction mode can also be performed for a region obtained by arbitrarily partitioning a frame.
  • The entropy coding unit 233, which receives the prediction mode selected by the mode selector 240 through this comparison and selection, outputs a bitstream corresponding to the selected prediction mode.
  • the video encoder 1000 includes an inverse quantizer 252 and an inverse spatial transformer 251.
  • the inverse quantizer 252 performs inverse quantization on the coefficient quantized by the quantizer 232.
  • the inverse quantization is an inverse operation of the quantization which has been performed by the quantizer 232.
  • the inverse spatial transformer 251 performs inverse spatial transform on the inversely quantized result to reconstruct a current intra-block that is then sent to the predicted block generator 220.
  • a downsampler 110 downsamples an input frame to the resolution of the base layer.
  • the downsampler may be an MPEG downsampler, a wavelet downsampler, or others.
  • The base layer encoder 100 encodes the downsampled base layer frame into a base layer bitstream while also decoding the encoded result. Texture information of the region of the reconstructed base layer frame that corresponds to a current intra-block in the enhancement layer is transmitted to the predicted block generator 220.
  • If the base layer has a different resolution from the enhancement layer, an upsampling process should be performed on the texture information by an upsampler 120 before it is transmitted to the predicted block generator 220.
  • the upsampling process may be performed using the same or different technique than the downsampling process.
  • While the base layer encoder 100 may operate in the same manner as the enhancement layer encoder 200, it may also encode and/or decode a base layer frame using conventional intra-prediction, temporal prediction, and other prediction processes.
  • FIG. 11 is a block diagram of a video decoder 2000 according to a first exemplary embodiment of the present invention.
  • the video decoder 2000 mainly includes a base layer decoder 300 and an enhancement layer decoder 400.
  • the configuration of the enhancement layer decoder 400 will now be described.
  • An entropy decoding unit 411 performs lossless decoding that is an inverse operation of entropy encoding to extract a modified intra-prediction mode and texture data for each intra-block, which are then fed to a predicted block generator 420 and an inverse quantizer 412, respectively.
  • the inverse quantizer 412 performs inverse quantization on the texture data received from the entropy decoding unit 411.
  • the inverse quantization is an inverse operation of the quantization which has been performed by the quantizer (232 of FIG. 8) of the video encoder (1000 of FIG. 8).
  • inverse scalar quantization can be performed by multiplying the texture data by its mapped value of the quantization table (the same as that used in the video encoder 1000).
  • An inverse spatial transformer 413 performs inverse spatial transform to reconstruct residual blocks from coefficients obtained after the inverse quantization. For example, when wavelet transform is used for spatial transform at the video encoder 1000, the inverse spatial transformer 413 performs inverse wavelet transform. When DCT is used for spatial transform, the inverse spatial transformer 413 performs inverse DCT.
  • the predicted block generator 420 generates a predicted block according to the prediction mode provided by the entropy decoding unit 411 using previously reconstructed neighboring intra-blocks of a current intra-block output from an adder 215 and a base layer image corresponding to the current intra-block reconstructed by the base layer decoder 300. For example, for modes 0, 1, and 3 through 8, a predicted block is generated using neighboring intra-blocks. For mode 2, the predicted block is generated using a base layer image.
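  • In outline, with mode 0 (vertical) standing in for the directional submodes of FIG. 3:

```python
import numpy as np

def predicted_block(mode, above, base_layer_block):
    """Simplified decoder-side predicted block generator: mode 2 is the BL
    prediction submode and returns the reconstructed (and, if necessary,
    upsampled) base layer block; directional submodes predict from
    previously reconstructed neighbouring pixels."""
    if mode == 2:
        return np.asarray(base_layer_block)
    if mode == 0:
        return np.tile(np.asarray(above), (4, 1))
    raise NotImplementedError("remaining directional modes omitted in this sketch")
```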
  • The adder 215 adds the predicted block to a residual block reconstructed by the inverse spatial transformer 413, thereby reconstructing an image of the current intra-block.
  • The output of the adder 215 is fed to the predicted block generator 420 and to a block combiner 430, which combines the reconstructed intra-blocks to reconstruct a frame.
  • The base layer decoder 300 reconstructs a base layer frame from a base layer bitstream. Texture information of the region of the reconstructed base layer frame that corresponds to a current intra-block in the enhancement layer is provided to the predicted block generator 420.
  • If the base layer has a different resolution from the enhancement layer, an upsampling process must be performed on the texture information by an upsampler 310 before it is transmitted to the predicted block generator 420.
  • While the base layer decoder 300 may operate in the same manner as the enhancement layer decoder 400, it may also decode a base layer frame using conventional intra-prediction, temporal prediction, and other prediction processes.
  • a BL prediction mode may be included in a temporal prediction process, which will be described below.
  • the conventional H.264 uses hierarchical variable size block matching (HVSBM) to remove temporal redundancy in each macroblock.
  • A macroblock 10 is partitioned into subblocks in four modes: 16x16, 16x8, 8x16, and 8x8. Each 8x8 subblock can be further split into 8x4, 4x8, or 4x4 mode (if it is not split, the 8x8 mode is used). Thus, a total of seven subblock modes are available for constructing each macroblock 10.
  • a combination of subblocks constituting the macroblock 10 that offers a minimum cost is selected as an optimum combination.
  • As subblocks become smaller, the accuracy of block matching increases, but the amount of motion data (motion vectors, subblock modes, etc.) increases as well.
  • The optimum combination of subblocks is therefore selected to achieve the best trade-off between block matching accuracy and the amount of motion data. For example, a simple background image containing no complicated change may use a large subblock mode, while an image with complicated and detailed edges may use a small subblock mode.
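  • The selection can be sketched as a bottom-up pruning over candidate partitions, assuming an RD cost (as in Equation (2)) has already been computed for every candidate subblock:

```python
def choose_partition(whole_cost, child_costs):
    """Keep a subblock whole if its RD cost is no worse than the summed
    costs of its split children; otherwise split. Applied from the smallest
    blocks upward, this yields the optimum combination of subblocks."""
    split_cost = sum(child_costs)
    return ("whole", whole_cost) if whole_cost <= split_cost else ("split", split_cost)

# e.g. an 8x8 block against its four 4x4 children (hypothetical costs):
decision, cost = choose_partition(900.0, [200.0, 260.0, 210.0, 240.0])  # -> ("whole", 900.0)
```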
  • The feature of the second exemplary embodiment of the present invention lies in determining whether to apply a mode of calculating a temporal residual or the BL prediction mode for each subblock in a macroblock 10 composed of the optimum combination of subblocks.
  • In FIG. 13, reference numerals 11 and 12 respectively denote a subblock to be encoded using a temporal residual and a subblock to be encoded using the BL prediction mode.
  • An RD cost function, shown in Equation (3), is used to select an optimal mode for each subblock: Ci = Ei + λBi and Cb = Eb + λBb, where Ci and Cb respectively denote the costs required when the temporal residual is used and when the BL prediction mode is used. Ei and Bi respectively denote the difference between the original signal and the reconstructed signal when the temporal residual is used, and the number of bits required to encode the motion data generated by temporal prediction together with the texture information obtained from the temporal residual. Eb and Bb respectively denote the corresponding difference when the BL prediction mode is used, and the number of bits required to encode the information indicating the BL prediction mode together with the texture information obtained using the BL prediction mode.
  • While the H.264 standard uses HVSBM to perform temporal prediction (including motion estimation and motion compensation), other standards such as MPEG may use fixed-size block matching.
  • the second exemplary embodiment focuses on selecting a BL prediction mode or a mode of calculating a residual between a current block and a corresponding block in a reference frame for each block, regardless of whether a macroblock is partitioned into variable-size or fixed-size blocks.
  • a variable-size block or fixed-size block that is a basic unit of calculating a motion vector is hereinafter referred to as a 'motion block'.
  • FIG. 14 is a block diagram of a video encoder 3000 according to a second exemplary embodiment of the present invention.
  • the video encoder 3000 mainly includes a base layer encoder 100 and an enhancement layer encoder 500.
  • the configuration of the enhancement layer encoder 500 will now be described.
  • a motion estimator 290 performs motion estimation on a current frame using a reference frame to obtain motion vectors.
  • the motion estimation may be performed for each macroblock using HVSBM or fixed-size block matching algorithm (BMA).
  • pixels in a given motion block are compared with pixels of a search area in a reference frame and a displacement with a minimum error is determined as a motion vector.
  • the motion estimator 290 sends motion data such as motion vectors obtained as a result of motion estimation, a motion block type, and a reference frame number to an entropy coding unit 233.
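  • A minimal full-search version of this matching, with a fixed block size and the sum of absolute differences (SAD) as the error measure:

```python
import numpy as np

def full_search(cur_block, ref, top, left, radius=4):
    """Compare the current motion block against every candidate position in
    a (2*radius+1)^2 search window of the reference frame and return the
    displacement with the minimum SAD."""
    bh, bw = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    cur = cur_block.astype(int)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(ref[y:y + bh, x:x + bw].astype(int) - cur).sum()
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad
```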
  • the motion compensator 280 performs motion compensation on a reference frame using the motion vectors and generates a motion-compensated frame.
  • The motion-compensated frame is a virtual frame consisting of the blocks in the reference frame that correspond to the blocks in the current frame, and it is transmitted to a switching unit 295.
  • The switching unit 295 receives the motion-compensated frame from the motion compensator 280 and a base layer frame from the base layer encoder 100, and sends textures of the frames to a subtractor 205 on a motion block basis.
  • If the base layer has a different resolution than the enhancement layer, an upsampling process must be performed on the base layer frame generated by the base layer encoder 100 before it is transmitted to the switching unit 295.
  • the subtractor 205 subtracts the texture received from the switching unit 295 from a predetermined motion block (current motion block) in the input frame in order to remove redundancy within the current motion block. That is, the subtractor 205 calculates a difference between the current motion block and its corresponding motion block in a motion-compensated frame (hereinafter called a 'first prediction residual') and a difference between the current motion block and its corresponding region in a base layer frame (hereinafter called a 'second prediction residual').
  • the first and second prediction residuals are lossily encoded as they pass through a spatial transformer 231 and a quantizer 232 and then losslessly encoded by the entropy coding unit 233.
  • a mode selector 270 selects one of the first and second prediction residuals encoded by the entropy coding unit 233, which offers higher coding efficiency. For example, the method described with reference to the equation (3) may be used for this selection. Because the first and second prediction residuals are calculated for each motion block, the mode selector 270 iteratively performs the selection for all motion blocks.
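  • A per-motion-block sketch of this choice, with the SAD of each residual standing in for the distortion E and the bit counts assumed to be given:

```python
import numpy as np

def pick_residual(current, temporal_pred, base_pred, lmbda, bits_t, bits_b):
    """Compute the first prediction residual (against the motion-compensated
    block) and the second (against the base layer block), then keep the one
    with the lower RD-style cost; returns (index 0 or 1, residual)."""
    r_t = current.astype(int) - temporal_pred.astype(int)
    r_b = current.astype(int) - base_pred.astype(int)
    cost_t = np.abs(r_t).sum() + lmbda * bits_t
    cost_b = np.abs(r_b).sum() + lmbda * bits_b
    return (0, r_t) if cost_t <= cost_b else (1, r_b)
```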
  • the entropy coding unit 233 that receives the result (represented by an index 0 or 1) selected by the mode selector 270 through the comparison and selection outputs a bitstream corresponding to the selected result.
  • The video encoder 3000 also includes the inverse quantizer 252, the inverse spatial transformer 251, and an adder 215.
  • the adder 215 adds a residual frame reconstructed by an inverse spatial transformer 251 to the motion-compensated frame output by the motion compensator 280 to reconstruct a reference frame that is then sent to the motion estimator 290.
  • FIG. 15 is a block diagram of a video decoder 4000 according to the second exemplary embodiment of the present invention.
  • the video decoder 4000 mainly includes a base layer decoder 300 and an enhancement layer decoder 600.
  • An entropy decoding unit 411 performs lossless decoding that is an inverse operation of entropy encoding to extract a selected mode, motion data, and texture data for each motion block.
  • the selected mode means an index (0 or 1) indicating the result selected out of a temporal residual ('third prediction residual') and a residual between a current motion block and a corresponding region in a base layer frame ('fourth prediction residual'), which are calculated by the video encoder 3000 for each motion block.
  • the entropy decoding unit 411 provides the selected mode, the motion data, and the texture data to a switching unit 450, a motion compensator 440, and an inverse quantizer 412, respectively.
  • the inverse quantizer 412 performs inverse quantization on the texture data received from the entropy decoding unit 411.
  • The inverse quantization is an inverse operation of the quantization which has been performed by the quantizer (232 of FIG. 14) of the enhancement layer encoder (500 of FIG. 14).
  • An inverse spatial transformer 413 performs inverse spatial transform to reconstruct a residual image from coefficients obtained after the inverse quantization for each motion block.
  • the motion compensator 440 performs motion compensation on a previously reconstructed video frame using the motion data received from the entropy decoding unit 411 and generates a motion-compensated frame, of which an image corresponding to the current motion block (first image) is provided to the switching unit 450.
  • the base layer decoder 300 reconstructs a base layer frame from a base layer bitstream and sends an image of the base layer frame corresponding to the current motion block (second image) to the switching unit 450.
  • an upsampling process may be performed by an upsampler 310 before the second image is transmitted to the switching unit 450.
  • the switching unit 450 selects one of the first and second images according to the selected mode provided by the entropy decoding unit 411 and provides the selected image to an adder 215 as a predicted block.
  • the adder 215 adds the residual image reconstructed by the inverse spatial transformer 413 to the predicted block selected by the switching unit 450 to reconstruct an image for the current motion block.
  • the above process is iteratively performed to reconstruct an image for each motion block, thereby reconstructing one frame.
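  • The per-block decoding step therefore reduces to the following sketch:

```python
def reconstruct_motion_block(selected_mode, residual, temporal_image, base_layer_image):
    """Switching unit plus adder: index 0 selects the image from the
    motion-compensated frame, index 1 the (upsampled) base layer image;
    the decoded residual is then added to reconstruct the block."""
    predicted = temporal_image if selected_mode == 0 else base_layer_image
    return predicted + residual
```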
  • the present invention allows multi-layered video coding that is well suited for characteristics of an input video.
  • the present invention also improves the performance of a multi-layered video codec.
  • The various functional components described above mean, but are not limited to, software or hardware components, such as Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), which perform certain tasks.
  • The components may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a video compression method and, more particularly, to a prediction method for efficiently eliminating redundancy within a video frame, as well as a video compression method and apparatus using the prediction method. A method for encoding video based on a multi-layer structure includes: performing intra-prediction on a current intra-block using images of neighboring intra-blocks of the current intra-block to obtain a prediction residual; performing prediction on the current intra-block using an image of a lower layer region corresponding to the current intra-block to obtain a prediction residual; selecting the one of the two prediction residuals that offers the better coding efficiency; and encoding the selected prediction residual.
EP05820697.0A 2004-12-03 2005-11-18 Method and apparatus for multi-layered video encoding and decoding Withdrawn EP1817911A4 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63254504P 2004-12-03 2004-12-03
KR1020050006804A KR100679031B1 (ko) Multi-layer based video encoding method, decoding method, and apparatus using the same
PCT/KR2005/003916 WO2006059848A1 (fr) Method and apparatus for multi-layered video encoding and decoding

Publications (2)

Publication Number Publication Date
EP1817911A1 (fr) 2007-08-15
EP1817911A4 EP1817911A4 (fr) 2015-05-20

Family

ID=36565263

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05820697.0A Withdrawn EP1817911A4 (fr) 2004-12-03 2005-11-18 Procede et appareil de codage et de decodage video multicouche

Country Status (2)

Country Link
EP (1) EP1817911A4 (fr)
WO (1) WO2006059848A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101601300B (zh) * 2006-12-14 2012-07-18 Thomson Licensing Method and device for encoding and/or decoding bit-depth scalable video data using adaptive enhancement layer prediction
KR101365570B1 (ko) * 2007-01-18 2014-02-21 Samsung Electronics Co., Ltd. Method and apparatus for intra-prediction encoding and decoding
KR101365575B1 (ko) * 2007-02-05 2014-02-25 Samsung Electronics Co., Ltd. Method and apparatus for inter-prediction encoding and decoding
JP5375372B2 (ja) * 2009-07-01 2013-12-25 Yamaha Corporation Compression coding device and decoding device
KR102194749B1 (ko) 2012-10-01 2020-12-23 GE Video Compression, LLC Scalable video coding using inter-layer prediction of spatial intra prediction parameters
CN114745549B (zh) * 2022-04-02 2023-03-17 Beijing Radio and Television Station Region-of-interest-based video encoding method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1558040A1 * 2004-01-21 2005-07-27 Thomson Licensing S.A. Method and device for generating/evaluating prediction information in image signal encoding/decoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006059848A1 *

Also Published As

Publication number Publication date
EP1817911A4 (fr) 2015-05-20
WO2006059848A1 (fr) 2006-06-08

Similar Documents

Publication Publication Date Title
US20060120450A1 (en) Method and apparatus for multi-layered video encoding and decoding
KR100772883B1 (ko) Deblock filtering method considering intra-BL mode, and multi-layer video encoder/decoder using the same
US20060104354A1 (en) Multi-layered intra-prediction method and video coding method and apparatus using the same
KR100772873B1 (ko) Multi-layer based video encoding method, decoding method, video encoder, and video decoder using smoothing prediction
KR100703748B1 (ko) Method for efficiently predicting a multi-layer based video frame, and video coding method and apparatus using the same
KR100679035B1 (ko) Deblock filtering method considering intra-BL mode, and multi-layer video encoder/decoder using the same
EP2008469B1 (fr) Multi-layer video coding method and associated apparatus
JP5203503B2 (ja) Bit depth scalability
US20120250759A1 (en) Apparatus and Method for Generating a Coded Video Sequence and for Decoding a Coded Video Sequence by Using an Intermediate Layer Residual Value Prediction
EP1774793A1 (fr) Scalable video coding with grid motion estimation and compensation
WO2006004331A1 (fr) Video encoding and decoding methods, video encoder and decoder
EP1817911A1 (fr) Method and apparatus for multi-layered video encoding and decoding
EP1842379A1 (fr) Method for efficiently predicting a multi-layer video frame, and video coding method and apparatus using the same
KR101850152B1 (ko) Adaptive loop filter application method and scalable video encoding apparatus using the same
EP1817918A1 (fr) Method and apparatus for multi-layer video encoding/decoding using DCT upsampling
He et al. Improved fine granular scalable coding with interlayer prediction
Wu et al. Adaptive weighted prediction for scalable video coding based on HEVC
Zhang et al. Improved motion compensation in the enhancement layer for spatially scalable video coding
WO2006080663A1 (fr) Method and device for efficiently coding multi-layer motion vectors

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070522

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: SAMSUNG ELECTRONICS CO., LTD.

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 19/33 20140101ALI20141112BHEP

Ipc: H04N 19/11 20140101ALI20141112BHEP

Ipc: H04N 19/176 20140101ALI20141112BHEP

Ipc: H04N 19/105 20140101AFI20141112BHEP

Ipc: H04N 19/147 20140101ALI20141112BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 19/33 20140101ALI20150316BHEP

Ipc: H04N 19/105 20140101AFI20150316BHEP

Ipc: H04N 19/11 20140101ALI20150316BHEP

Ipc: H04N 19/147 20140101ALI20150316BHEP

Ipc: H04N 19/176 20140101ALI20150316BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 19/105 20140101AFI20150325BHEP

Ipc: H04N 19/33 20140101ALI20150325BHEP

Ipc: H04N 19/176 20140101ALI20150325BHEP

Ipc: H04N 19/11 20140101ALI20150325BHEP

Ipc: H04N 19/147 20140101ALI20150325BHEP

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 19/176 20140101ALI20150327BHEP

Ipc: H04N 19/33 20140101ALI20150327BHEP

Ipc: H04N 19/105 20140101AFI20150327BHEP

Ipc: H04N 19/147 20140101ALI20150327BHEP

Ipc: H04N 19/11 20140101ALI20150327BHEP

RA4 Supplementary search report drawn up and despatched (corrected)

Effective date: 20150417

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 19/105 20140101AFI20150413BHEP

Ipc: H04N 19/11 20140101ALI20150413BHEP

Ipc: H04N 19/176 20140101ALI20150413BHEP

Ipc: H04N 19/147 20140101ALI20150413BHEP

Ipc: H04N 19/33 20140101ALI20150413BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20150602