EP1741297A1 - Method and apparatus for implementing motion scalability - Google Patents

Method and apparatus for implementing motion scalability

Info

Publication number
EP1741297A1
Authority
EP
European Patent Office
Prior art keywords
motion vector
enhancement layer
layer
base layer
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05789577A
Other languages
German (de)
English (en)
Inventor
Woo-Jin Han
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020040032237A external-priority patent/KR100587561B1/ko
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of EP1741297A1 publication Critical patent/EP1741297A1/fr
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/19Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding using optimisation based on Lagrange multipliers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/567Motion estimation based on rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates to a video compression method, and more particularly, to an apparatus and a method for improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer, in a video coding method using a multilayer structure.
  • a basic principle of data compression is removing data redundancy.
  • Data can be compressed by removing spatial redundancy where the same color or object is repeated in an image, or by removing temporal redundancy where there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or by removing visual redundancy taking into account human eyesight and limited perception of high frequency.
  • transmission media are necessary to transmit multimedia data. Different types of transmission media for multimedia have different performance. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data at a rate of several megabits per second, while a mobile communication network has a transmission rate of 384 kilobits per second.
  • Scalability refers to the ability to partially decode a single compressed bitstream at a decoder or a pre-decoder part. The decoder or pre-decoder can reconstruct multimedia sequences having different quality levels, resolutions, or frame rates from only some of the bitstreams coded by a scalable coding method.
  • a bitstream typically consists of motion information (motion vector, block size, etc.) and texture information corresponding to a residual obtained after motion estimation.
  • texture information consists of multiple layers: i.e., a base layer, a first enhancement layer, and a second enhancement layer.
  • the respective layers have different resolution levels: i.e., Quarter Common Intermediate Format (QCIF), Common Intermediate Format (CIF), and 2CIF.
  • QCIF Quarter Common Intermediate Format
  • CIF Common Intermediate Format
  • SNR Signal-to-noise ratio
  • SNR and temporal scalabilities are implemented within each layer.
  • motion information is usually compressed losslessly as a whole.
  • the non-scalable motion information can significantly degrade the coding efficiency due to an excessive amount of motion information, especially for a bitstream compressed at low bitrates.
  • a method to support motion scalability is to divide motion information into layers according to relative significance and to transmit only part of the motion information for low bitrates with loss, giving more bits to textures. Motion scalability is an issue of great concern to MPEG-21 PART 13 scalable video coding.
  • the partition-based approach generates a multi-layered motion vector by obtaining motion vectors for various resolutions in a frame with the same pixel accuracy.
  • the accuracy-based approach generates a multi-layered motion vector by obtaining motion vectors for various pixel accuracies in a frame having one resolution.
  • the present invention proposes a method for implementing motion scalability by reconstructing a motion vector into multiple layers using the pixel accuracy-based approach. This method is focused on providing high coding performance for a base layer and an enhancement layer simultaneously.
  • the present invention provides a method for efficiently implementing motion scalability using a motion vector consisting of multiple layers.
  • the present invention also provides a method for improving coding efficiency when using only a base layer at a low bitrate by constructing a motion vector into layers according to the pixel accuracy in such a way as to minimize distortion.
  • the present invention also provides a method for improving coding performance by minimizing overhead when using all layers at a high bitrate.
  • an apparatus for reconstructing a motion vector obtained at the predetermined pixel accuracy including a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
  • the base layer determining module may determine the motion vector component of the base layer that is close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
  • the base layer determining module may separate the obtained motion vector into a sign and a magnitude, may use an unsigned value to represent the magnitude of the motion vector, and may attach the original sign to the value.
  • the base layer determining module may determine a value closest to the obtained motion vector as the motion vector component of the base layer according to the pixel accuracy of the base layer.
  • the motion vector component x_b of the base layer may be determined using x_b = sign(x) · floor(|x| + 0.5), where sign(x) denotes a sign function that returns 1 and -1 when x is a positive value and a negative value, respectively, |x| denotes the absolute value of x, and floor(|x| + 0.5) denotes the largest integer not exceeding |x| + 0.5.
  • the apparatus for reconstructing a motion vector obtained at a predetermined pixel accuracy may further include a first compression module removing redundancy in a motion vector component of a first enhancement layer among the enhancement layers using the fact that the motion vector component of the first enhancement layer has an opposite sign to the motion vector component of the base layer when the motion vector component of the first enhancement layer is not 0.
  • the apparatus for reconstructing a motion vector obtained at the predetermined pixel accuracy may further include a second compression module removing redundancy in a motion vector component of a second enhancement layer using the fact that the motion vector component of the second enhancement layer is always 0 when the motion vector component of the first enhancement layer is not 0.
  • a video encoder using a motion vector consisting of multiple layers including a motion vector reconstruction module including a motion vector search module obtaining a motion vector with the predetermined pixel accuracy, a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer, a temporal filtering module removing temporal redundancies by filtering frames in a direction of a temporal axis using the obtained motion vectors, a spatial transform module removing spatial redundancies from the frames from which the temporal redundancies have been removed and creating transform coefficients, and a quantization module performing quantization on the transform coefficients.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a layer reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a video decoder using a motion vector consisting of multiple layers including an entropy decoding module interpreting an input bitstream and extracting texture information and motion information from the bitstream, a motion vector reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers contained in the extracted motion information and providing the motion vector after adding the motion vector components of the respective layers together, an inverse quantization module applying inverse quantization to the texture information and outputting transform coefficients, an inverse spatial transform module inversely transforming the transform coefficients into transform coefficients in a spatial domain by performing the inverse of spatial transform, and an inverse temporal filtering module performing inverse temporal filtering on the transform coefficients in the spatial domain using the obtained motion vector and reconstructing frames in a video sequence.
  • the motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • the motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer contained in the motion information when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer contained in the motion information, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector obtained at the predetermined pixel accuracy including determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
  • the motion vector component of the base layer may be determined to be close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
  • the motion vector component of the base layer may be determined according to the pixel accuracy of the base layer by separating the obtained motion vector into a sign and a magnitude, using an unsigned value to represent the magnitude of the motion vector, and attaching the original sign to the value.
  • a value closest to the obtained motion vector may be determined as the motion vector component of the base layer according to the pixel accuracy of the base layer.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • FIG. 1 is a diagram for explaining a method of reconstructing a multi-layered motion vector according to the pixel accuracy
  • FIG. 2 illustrates a method for improving the compression efficiency of a motion vector according to a first embodiment of the present invention
  • FIG. 3 illustrates an example of obtaining a predicted value for a current block by correlation with neighboring blocks
  • FIG. 4 illustrates a third embodiment of the present invention
  • FIG. 5 is a graph illustrating the results of measuring peak signal-to-noise ratios (PSNRs) using motion vectors from the three layers according to the first through third embodiments
  • FIG. 6 is a graph illustrating the results of measuring a PSNR when compressing a Foreman CIF sequence at 100 Kbps according to the third embodiment
  • FIG. 7 is a graph comparing the experimental results of the third embodiment of FIG. 6 and the fourth embodiment of the present invention
  • FIG. 8 is a block diagram of a video coding system
  • FIG. 9 is a block diagram of a video encoder
  • FIG. 10 is a block diagram of an exemplary motion vector reconstruction module according to the first embodiment of the present invention.
  • FIG. 11 is an illustration for explaining a process of obtaining a motion vector of an enhancement layer;
  • FIG. 12 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention.
  • FIG. 13 is a block diagram of a video decoder
  • FIG. 14 is a block diagram of an exemplary motion vector reconstruction module according to the present invention.
  • FIG. 15 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention.
  • FIG. 16 is a schematic diagram illustrating a bitstream structure
  • FIG. 17 is a diagram illustrating the detailed structure of each group of pictures (GOP) field
  • FIG. 18 is a diagram illustrating the detailed structure of a motion vector (MV) field.
  • the present invention presents a method for constructing a base layer in such a way as to minimize distortion when only the base layer is used, and a method for quantizing an enhancement layer in such a way as to minimize overhead when all layers are used.
  • FIG. 1 shows an example in which one motion vector is divided into three motion vector components.
  • the motion vector A is reconstructed as the sum of a base layer motion vector component B, a first enhancement layer motion vector component E1, and a second enhancement layer motion vector component E2.
  • a motion vector obtained as a result of a motion vector search with the predetermined pixel accuracy as described above is defined as an 'actual motion vector'.
  • Pixel accuracy used for the highest enhancement layer can be typically selected as the predetermined pixel accuracy.
  • the motion vectors of the respective layers have different pixel accuracies that increase in an order from the lowest (close to a base layer) to the highest (away from the base layer).
  • the base layer has one pixel accuracy
  • the first enhancement layer has a half pixel accuracy
  • the second enhancement layer has a quarter pixel accuracy.
  • An encoder transmits the reconstructed motion vector to a predecoder that truncates a part of the motion vector in an order from the highest to the lowest layers while a decoder receives the remaining part of the motion vector.
  • an encoder may transmit motion vector components of all layers (the base layer, the first enhancement layer, and the second enhancement layer) while the predecoder may transmit only components of the base layer and the first enhancement layer to the decoder by truncating a component of the second enhancement layer when it determines according to available communication conditions that transmission of all the motion vector components is unsuitable.
  • the decoder uses the components of the base layer and the first enhancement layer to reconstruct a motion vector.
  • the base layer is essential motion vector information having the highest priority and it cannot be omitted during transmission.
  • a bitrate in the base layer must be equal to or less than the minimum bandwidth supported by a network.
  • the bitrate in transmission of all the layers (the base layer and the first and second enhancement layers) must be equal to or less than the maximum bandwidth.
  • the present invention proposes methods for constructing a base layer according to first through third embodiments and verifies the methods through experiments.
  • a motion vector is constructed into multiple layers: a motion vector component of the base layer represented with integer-pixel accuracy, and motion vector components of enhancement layers respectively represented with half- and quarter-pixel accuracy.
  • the base layer uses an integer to represent a motion vector component, and the enhancement layers use a symbol of 1, -1, or 0 instead of a real number in order to represent motion vector components in a simple way. While a motion vector is usually represented by a pair of x and y components, only one component will be described throughout this specification for clarity of explanation.
  • since the motion vector component of the first enhancement layer with half-pixel accuracy may have a value of -0.5, 0.5, or 0, it is represented by the symbol -1, 1, or 0, respectively.
  • since the motion vector component of the second enhancement layer with quarter-pixel accuracy may have a value of -0.25, 0.25, or 0, it is represented by the symbol -1, 1, or 0, respectively.
  • One of the most important goals in implementing motion scalability is to prevent significant degradation in coding performance when an enhancement layer is truncated.
  • truncation of the enhancement layer increases the motion vector error and thereby significantly degrades the quality of the video reconstructed by a decoder; this also reduces the video quality gain obtained by allocating the saved motion vector bits to texture information. Therefore, the first through third embodiments of the present invention are focused on preventing a significant drop in the peak signal-to-noise ratio (PSNR) when only a base layer is used, compared to when a base layer and enhancement layers are used.
  • PSNR peak signal-to-noise ratio
  • FIG. 2 shows an example of predicting a motion vector in first and second enhancement layers from a motion vector in a base layer.
  • a feature of a second embodiment of the present invention is that an integer motion vector component of a base layer is as close to zero as possible.
  • an actual motion vector is separated into sign and magnitude.
  • the magnitude of the motion vector is represented using an unsigned integer and the original sign is then attached to the unsigned integer. This method makes it probable that the motion vector component of the base layer is zero, which enables more efficient quantization since most quantization modules quantize zeros very efficiently. This method is expressed by Equation (1):
  • x_b = sign(x) · floor(|x|) ... (1), where floor(|x|) denotes the largest integer not exceeding |x| (i.e., |x| with the decimal part stripped).
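  • As a concrete illustration, the following is a minimal Python sketch of the Equation (1) decomposition of the base layer described above; the function name is illustrative and not taken from the patent.

```python
import math

def base_layer_eq1(x: float) -> int:
    """Second-embodiment base layer component (Equation 1): split the actual
    motion vector x into sign and magnitude, keep only the integer part of
    the magnitude, then re-attach the original sign."""
    sign = 1 if x >= 0 else -1
    return sign * math.floor(abs(x))

# Values whose magnitude is below 1 all map to a base layer component of 0,
# which most entropy/quantization modules compress very efficiently.
for x in (0.75, -0.75, 1.25, -0.25):
    print(x, base_layer_eq1(x))
```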
  • Table 1 shows examples of values for each layer that can be obtained with the values x and x_b in Equation (1).
  • the values x and x_b are multiplied by a factor of 4 and expressed as integer values
  • Δ(x − x_b) in the lowest row denotes the error between an actual value and an integer motion vector of the base layer.
  • E1 and E2 respectively denote motion vector components of the first and second enhancement layers, expressed as symbols.
  • the method of the second embodiment makes it more likely that the integer motion vector component x_b of the base layer is zero, thereby increasing the compression efficiency as compared to the first embodiment, in which x_b is obtained by simply truncating the decimal part
  • motion vector components of the first and second enhancement layers are expressed as the symbols -1, 0, or 1, which results in reduced efficiency.
  • the second embodiment suffers from a significant distortion caused by a difference - as much as 0.75 - between actual and quantized motion vectors even when only the base layer is used.
  • the difference between an actual motion vector and a quantized motion vector of a base layer is minimized. That is, the third embodiment concentrates on reducing that difference to less than 0.5, which is an improvement over the first and second embodiments where the maximum difference is 0.75. This is accomplished by modifying the second embodiment to some extent. That is, an integer nearest to an actual motion vector is selected as a motion vector component of the base layer by rounding off the actual motion vector, as defined by Equation (2):
  • Equation (2) is given by x_b = sign(x) · floor(|x| + 0.5); it is similar to Equation (1) except for the use of rounding off.
  • FIG. 4 shows an example in which a motion vector with a value of 0.75 is represented according to the third embodiment of the present invention.
  • the value 1 is selected as a motion vector component of a base layer since 1 is an integer nearest to the actual motion vector of 0.75.
  • a motion vector component of the first enhancement layer that minimizes the difference between the actual motion vector and the motion vector of the first enhancement layer may be -0.5 or 0 (a motion vector of the first enhancement layer is the sum of a motion vector of the base layer and a motion vector component of the first enhancement layer).
  • the minimum difference is 0.25.
  • the value closest to the motion vector component of the immediately lower layer is chosen as the motion vector component of the first enhancement layer.
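  • A sketch of the third-embodiment decomposition follows, under the reading of the tie-breaking rule given above (ties between two equally close enhancement values are resolved toward 0); the function names and the symbol convention are only illustrative.

```python
import math

def base_layer_eq2(x: float) -> int:
    """Third-embodiment base layer component (Equation 2): round the
    magnitude of x to the nearest integer, then re-attach the sign."""
    sign = 1 if x >= 0 else -1
    return sign * math.floor(abs(x) + 0.5)

def decompose(x: float):
    """Split a quarter-pel motion vector component x into (base, E1, E2),
    where E1 and E2 are the symbols -1, 0, 1 standing for steps of 0.5 and
    0.25 respectively."""
    base = base_layer_eq2(x)
    residual = x - base                       # lies within [-0.5, 0.5]
    # First enhancement layer: the half-pel value closest to the residual;
    # on a tie, take the candidate closest to the lower layer (i.e. 0).
    e1 = min((0.0, -0.5, 0.5), key=lambda v: (abs(residual - v), abs(v)))
    e2 = residual - e1                        # exactly one of -0.25, 0, 0.25

    def symbol(v: float) -> int:
        return 0 if v == 0 else (1 if v > 0 else -1)

    return base, symbol(e1), symbol(e2)

# FIG. 4 example: x = 0.75 gives base 1, E1 symbol 0, E2 symbol -1 (-0.25).
print(decompose(0.75))
```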
  • Table 3 shows the results of experiments where a Foreman CIF sequence is compressed at a frame rate of 30 Hz and a bitrate of 256 Kbps. The experiments were done to verify the performance of the first through third embodiments of the present invention. Table 3 lists the number of bits (hereinafter 'size' will refer to 'number of bits') needed for motion vectors of a base layer and first and second enhancement layers according to the first through third embodiments.
  • a base layer has the smallest size in the first embodiment, but the first and second enhancement layers have the largest size since a motion vector of a base layer is predicted, thus increasing the total size.
  • the second embodiment increases the size of a base layer as well as a total size compared to the first embodiment. The total size is the largest in the second embodiment.
  • in the third embodiment, the base layer has the largest size but the first enhancement layer has the smallest size since it is highly probable that a motion vector component of the first enhancement layer will have a value of zero.
  • the second enhancement layer has a size similar to its counterparts in the first and second embodiments.
  • FIG. 5 is a graph illustrating the results of measuring PSNRs (as a video quality indicator) using motion vectors from the three layers according to the first through third embodiments of the present invention as detailed in Table 3. Referring to FIG. 5, the third embodiment exhibits the highest performance while the first embodiment exhibits the poorest performance.
  • the first embodiment has similar performance to the second embodiment when only a base layer is used while it has weak performance compared to the other embodiments when all motion vector layers are used.
  • the third embodiment exhibits superior performance when only the base layer is used. Specifically, the PSNR value in the third embodiment is more than 1.0 dB higher than that of the second embodiment. This is achieved by minimizing the difference between an integer motion vector component of the base layer and an actual motion vector. That is, since it is more efficient for coding performance to minimize this difference than to slightly decrease an integer value, the third embodiment exhibits the best performance.
  • the third embodiment is superior to the first and second embodiments in terms of the size of the first enhancement layer, but it has little difference in terms of the size of the second enhancement layer.
  • the third embodiment is not advantageous over the others when all motion vector layers are used.
  • FIG. 6 is a graph illustrating an experimental result of compressing a Foreman CIF sequence at 100 Kbps according to the third embodiment.
  • the third embodiment exhibits superior performance when only the base layer is used, compared to when all the layers are used. Specifically, while the third embodiment shows excellent performance when the base layer or a combination of the base layer and the first enhancement layer is used, its performance degrades when all the layers are used since the size of the second enhancement layer is large.
  • the third embodiment is intended to allocate a large amount of information to the second enhancement layer. Since the second enhancement layer is used only when the bitrate is sufficient, its large size does not significantly affect performance. For a low bitrate, only the base layer and the first enhancement layer are used, and bits in the second enhancement layer can be truncated.
  • the present invention proposes a method for providing excellent coding performance when all motion vector layers are used by adding two compression rules.
  • the two compression rules are found in Table 2.
  • the first rule is that the motion vector component (x_b) of the base layer has an opposite sign to the motion vector component E1 of the first enhancement layer except, of course, when E1 is zero.
  • the motion vector component E1 of the first enhancement layer is represented by 0 or 1
  • a decoder reconstructs the original value of E1 by attaching a sign to E1, which is opposite to the sign of the motion vector component of the base layer.
  • since E1 has an opposite sign to the motion vector component of the base layer (except zero, which has no sign), E1 can be expressed as either 0 or 1.
  • An encoder converts -1 to 1 while a decoder can reconstruct the original value of E1 by attaching the opposite sign to 1.
  • the second compression rule is that the motion vector component E2 of the second enhancement layer is always 0 when E1 is 1 or -1. Thus, E2 is not encoded when a corresponding E1 is not 0.
  • the symbol 'X' in Table 4 denotes a portion not transmitted, and this constitutes a quarter of the total number of cases. Thus, the number of bits can be reduced by 25%.
  • compression efficiency can be further increased.
  • a method created by applying the first and second compression rules to the third embodiment is referred to as a 'fourth embodiment'.
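  • A minimal encoder-side sketch of the two compression rules of the fourth embodiment is given below; representing an untransmitted E2 as None is an assumption made for illustration, not the patent's bitstream syntax.

```python
def compress_enhancement(e1_symbol: int, e2_symbol: int):
    """Apply the two fourth-embodiment rules before entropy coding.

    Rule 1: a non-zero E1 always has the sign opposite to the base layer
    component, so only its magnitude (0 or 1) needs to be coded.
    Rule 2: whenever E1 is non-zero, E2 is known to be 0, so E2 is simply
    not transmitted (the 'X' entries of Table 4)."""
    coded_e1 = abs(e1_symbol)                        # -1 is sent as 1
    coded_e2 = None if coded_e1 != 0 else e2_symbol  # None = not transmitted
    return coded_e1, coded_e2

# Example: E1 = -1, E2 = 0 is coded as (1, None), i.e. E2 is omitted.
print(compress_enhancement(-1, 0))
```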
  • the compression rules in the fourth embodiment can also be applied to a base layer, a first enhancement layer, and a second enhancement layer for a motion vector consisting of four or more layers. Furthermore, either the first or second or both rules can be applied depending on the type of application.
  • Table 5 shows the number of bits needed for motion vectors of a base layer, a first enhancement layer, and a second enhancement layer according to the fourth embodiment of the present invention.
  • the fourth embodiment reduces the sizes of the first and second enhancement layers by 15.68% and 11.90% compared to the third embodiment, thereby significantly reducing the overall bitrate.
  • the number of bits in the second enhancement layer is reduced by less than 25% because the omitted values are zeros, which are compressed very efficiently by the entropy encoding module anyway.
  • FIG. 7 is a graph comparing the experimental results of the third embodiment (FIG. 6) and the fourth embodiment of the present invention. As shown in FIG. 7, the fourth embodiment exhibits similar performance to the third embodiment when only the base layer is used, but exhibits superior performance thereto when all the layers are used.
  • a motion vector consists of three layers
  • the present invention can also be applied to a motion vector consisting of more than three layers.
  • a motion vector search is performed on a base layer with 1 pixel accuracy, a first enhancement layer with 1/2 pixel accuracy, and a second enhancement layer with 1/4 pixel accuracy.
  • this is provided as an example only, and it will be readily apparent to those skilled in the art that the motion vector search may be performed with different pixel accuracies than those stated above.
  • the pixel accuracies increase with each layer, in a manner similar to the afore-mentioned embodiments.
  • an encoder encodes an input video using a multilayered motion vector while a predecoder or a decoder decodes all or part of the input video.
  • the overall process will now be described schematically with reference to FIG. 8.
  • FIG. 8 shows the overall configuration of a video coding system.
  • the video coding system includes an encoder 100, a predecoder 200, and a decoder 300.
  • the encoder 100 encodes an input video into a bitstream 20.
  • the predecoder 200 truncates part of the texture data in the bitstream 20 according to extraction conditions such as bitrate, resolution or frame rate determined considering the communication environment.
  • the decoder 300, therefore, implements scalability for the texture data.
  • the predecoder 200 also implements motion scalability by truncating part of the motion data in the bitstream 20 in an order from the highest to the lowest layers according to the communication environment or the number of texture bits. By implementing texture or motion scalability in this way, the predecoder can extract various bitstreams 25 from the original bitstream 20.
  • the decoder 300 generates an output video 30 from the extracted bitstream 25.
  • the predecoder 200 or the decoder 300 or both may extract the bitstream 25 according to the extraction conditions.
  • FIG. 9 is a block diagram of an encoder 100 of a video coding system.
  • the encoder 100 includes a partitioning module 110, a motion vector reconstruction module 120, a temporal filtering module 130, a spatial transform module 140, a quantization module 150, and an entropy encoding module 160.
  • the partitioning module 110 partitions an input video 10 into several groups of pictures (GOPs), each of which is independently encoded as a unit.
  • the motion vector reconstruction module 120 finds an actual motion vector for a frame of one GOP with the predetermined pixel accuracy, and sends the motion vector to the temporal filtering module 130.
  • the motion vector reconstruction module 120 uses this actual motion vector and a predetermined method (one of first through third embodiments) to determine a motion vector component of the base layer. Next, it determines a motion vector component of an enhancement layer with the enhancement layer pixel accuracy that is closer to the actual motion vector.
  • the motion vector reconstruction module 120 also sends an integer motion vector component of the base layer and a symbol value that is the motion vector component of the enhancement layer to the entropy encoding module 160.
  • the multilayered motion information is encoded by the entropy encoding module 160 using a predetermined encoding algorithm.
  • FIG. 10 is a block diagram of an exemplary motion vector reconstruction module 120.
  • the motion vector reconstruction module 120 includes a motion vector search module 121, a base layer determining module 122, and an enhancement layer determining module 123.
  • the motion vector reconstruction module 120 further includes an enhancement layer compression module 125 with either a first or second compression module 126 or 127 or both.
  • the motion vector search module 121 performs a motion vector search of each block in a current frame (at a predetermined pixel accuracy) in order to obtain an actual motion vector.
  • the block may be a fixed or variable size block. When a variable size block is used, information about the block size (or mode) needs to be transmitted together with the actual motion vector.
  • a current image frame is partitioned into blocks of a predetermined pixel size, and a block in a reference image frame is compared with the corresponding block in the current image frame according to the predetermined pixel accuracy in order to derive the difference between the two blocks.
  • a motion vector that gives the minimum sum of errors is designated as the motion vector for the current block.
  • a search range may be predefined using parameters. A smaller search range reduces search time and exhibits good performance when a motion vector exists within the search range. However, the prediction accuracy will be decreased for a fast-motion image where the motion vector does not exist within the range.
  • Motion estimation may be performed using variable size blocks instead of the above fixed-size block.
  • a motion vector search is performed on blocks of variable pixel sizes to determine a variable block size and a motion vector that minimize a predetermined cost function J.
  • The cost function is defined by Equation (3): J = D + λ·R, where D is the number of bits used for coding a frame difference, R is the number of bits used for coding an estimated motion vector, and λ is a Lagrangian coefficient.
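  • The following sketch shows how the Equation (3) cost could drive the choice between candidate block modes; the candidate tuples and the lambda value are hypothetical placeholders, and how D and R are actually measured is outside the scope of this sketch.

```python
def rd_cost(frame_difference_bits: int, motion_vector_bits: int, lam: float) -> float:
    """Equation (3): J = D + lambda * R, with D the bits spent on coding the
    frame difference and R the bits spent on coding the motion vector."""
    return frame_difference_bits + lam * motion_vector_bits

def pick_block_mode(candidates, lam: float):
    """Return the (mode, D, R) candidate with the minimum cost J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))

# Hypothetical comparison of one 16x16 block against four 8x8 sub-blocks.
candidates = [("16x16", 1200, 40), ("four 8x8", 1050, 160)]
print(pick_block_mode(candidates, lam=2.0))
```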
  • the base layer determining module 122 determines an integer motion vector component of a base layer according to the first through third embodiments. In the first embodiment, it determines the motion vector component of the base layer by spatial correlation with the motion vector components of neighboring blocks and rounding up or down the decimal part of the actual motion vector.
  • in the second embodiment, the base layer determining module 122 determines the motion vector component of the base layer by separating the actual motion vector into a sign and a magnitude.
  • the magnitude of the motion vector is represented by an unsigned integer to which the original sign is attached. The determination process is shown in Equation (1).
  • in the third embodiment, the base layer determining module 122 determines the motion vector component of the base layer by finding an integer value nearest to the actual motion vector. This nearest integer value is calculated by Equation (2).
  • the enhancement layer determining module 123 determines a motion vector component of an enhancement layer in such a way as to minimize an error between the actual motion vector and the motion vector component. When two or more vectors with the same error exist, the motion vector that minimizes the error of the motion vector in the immediately lower layer is chosen as the motion vector component of the enhancement layer.
  • a motion vector component of a base layer is determined according to the first through third embodiments and motion vector components of the first through third enhancement layers are determined using a separate method.
  • the value 1 is determined as the motion vector component of the base layer according to one of the first through third embodiments
  • a process for determining the motion vector components of the enhancement layers will now be described with reference to FIG. 11.
  • a 'cumulative value' of a layer is defined as the sum of motion vector components of the lower layers.
  • the motion vector reconstruction module 120 further includes the enhancement layer compression module 125 with either the first or second compression module 126 or 127 or both as shown in FIG. 12.
  • when the motion vector component of the first enhancement layer is a negative number, the first compression module 126 converts the negative number into a positive number having the same magnitude.
  • when the motion vector component of the first enhancement layer is not 0, the second compression module 127 does not encode the motion vector component of the second enhancement layer.
  • Referring to FIG. 9, to reduce temporal redundancies, the temporal filtering module 130 filters frames in the direction of the temporal axis using the motion vectors obtained by the motion vector reconstruction module 120; Motion Compensated Temporal Filtering (MCTF) or Unconstrained MCTF (UMCTF) may be used for this purpose.
  • MCTF Motion Compensated Temporal Filtering
  • UMCTF Unconstrained MCTF
  • the spatial transform module 140 removes spatial redundancies from these frames using the discrete cosine transform (DCT) or wavelet transform, and creates transform coefficients.
  • DCT discrete cosine transform
  • the quantization module 150 quantizes those transform coefficients. Quantization is the process of converting real transform coefficients into discrete values and mapping the quantized coefficients into quantization indices. In particular, when a wavelet transform is used for spatial transformation, embedded quantization can often be used. Embedded ZeroTrees Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), and Embedded ZeroBlock Coding (EZBC) are examples of an embedded quantization algorithm.
  • EZW Embedded ZeroTrees Wavelet
  • SPIHT Set Partitioning in Hierarchical Trees
  • EZBC Embedded ZeroBlock Coding
  • FIG. 13 is a block diagram of a decoder 300 in a video coding system according to an embodiment of the present invention.
  • the decoder 300 includes an entropy decoding module 310, an inverse quantization module 320, an inverse spatial transform module 330, an inverse temporal filtering module 340, and a motion vector reconstruction module 350.
  • the entropy decoding module 310 performs the inverse of an entropy encoding process to extract texture information (encoded frame data) and motion information from the bitstream 20.
  • FIG. 14 is a block diagram of an exemplary motion vector reconstruction module 350.
  • the motion vector reconstruction module 350 includes a layer reconstruction module 351 and a motion addition module 352.
  • the layer reconstruction module 351 interprets the extracted motion information and recognizes motion information for each layer.
  • the motion information contains block information and motion vector information for each layer.
  • the layer reconstruction module 351 then reconstructs a motion vector component of each layer from a corresponding layer value contained in the motion information.
  • the 'layer value' means a value received from the encoder; specifically, it is either an integer value representing a motion vector component of a base layer or a symbol value representing a motion vector component of an enhancement layer.
  • the layer reconstruction module 351 reconstructs the original motion vector component from the symbol value.
  • the motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the enhancement layer together, and sends the motion vector to the inverse temporal filtering module 340.
  • FIG. 15 is a block diagram of another exemplary motion vector reconstruction module 350 for implementing the method according to the fourth embodiment of the present invention.
  • the motion vector reconstruction module 350 includes a layer reconstruction module 351, a motion addition module 352, and an enhancement layer reconstruction module 353 with either first or second reconstruction modules 354 and 355 or both.
  • when the value of the first enhancement layer interpreted from the bitstream is not 0, the first reconstruction module 354 attaches a sign to this value that is opposite to the sign of the motion vector component of the base layer, and obtains a motion vector component corresponding to the resultant value (symbol).
  • when the value of the first enhancement layer is 0, the motion vector component is 0.
  • the second reconstruction module 355 sets the value of motion vector component of the second enhancement layer to 0 when the value of the first enhancement layer is not 0. When the value is 0, the second reconstruction module obtains a motion vector component corresponding to a value of the second enhancement layer. Then, the motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the first and second enhancement layers together.
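  • A decoder-side sketch of the behaviour of modules 354, 355, and 352 described above follows; the argument layout and the use of None for an omitted second enhancement layer value are assumptions for illustration.

```python
def reconstruct_motion_vector(base: int, coded_e1: int, coded_e2,
                              step_e1: float = 0.5, step_e2: float = 0.25) -> float:
    """Rebuild one motion vector component from the three layer values."""
    # First reconstruction module 354: restore the sign of E1, which is
    # opposite to the sign of the base layer component (a base layer
    # component of 0 implies E1 is 0 in this scheme).
    if coded_e1 == 0 or base == 0:
        e1_symbol = 0
    else:
        e1_symbol = -coded_e1 if base > 0 else coded_e1
    # Second reconstruction module 355: E2 is 0 whenever it was omitted.
    e2_symbol = 0 if coded_e2 is None else coded_e2
    # Motion addition module 352: scale the symbols back to half- and
    # quarter-pel values and add all layer components together.
    return base + e1_symbol * step_e1 + e2_symbol * step_e2

# Example: base = 1, coded E1 = 1, E2 omitted  ->  1 - 0.5 + 0 = 0.5
print(reconstruct_motion_vector(1, 1, None))
```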
  • the inverse spatial transform module 330 inverse-transforms the transform coefficients into transform coefficients in a spatial domain. For example, in the DCT transform, the transform coefficients are inverse-transformed from the frequency domain to the spatial domain. In the wavelet transform, the transform coefficients are inversely transformed from the wavelet domain to the spatial domain.
  • the inverse temporal filtering module 340 performs inverse temporal filtering on the transform coefficients in the spatial domain (i.e., a temporal residual image) using the reconstructed motion vectors received from the motion vector reconstruction module 350 in order to reconstruct frames making up a video sequence.
  • a module refers to, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), which performs certain tasks.
  • a module may advantageously be configured to reside on an addressable storage medium and to execute on one or more processors.
  • a module may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, or variables.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • the components and modules may be implemented in such a way that they execute on one or more computers in a communication system.
  • FIGS. 16 through 18 illustrate a structure of a bitstream 400. Specifically, FIG. 16 is a schematic diagram illustrating an overall structure of the bitstream 400.
  • the bitstream 400 is composed of a sequence header field 410 and a data field 420 containing a plurality of GOP fields 430 through 450.
  • the sequence header field 410 specifies image properties such as frame width (2 bytes) and height (2 bytes), a GOP size (1 byte), and a frame rate (1 byte).
  • the data field 420 contains all the image information and other information
  • FIG. 17 shows the detailed structure of each GOP field 430.
  • the GOP field 430 consists of a GOP header 460, a T field 470 that specifies in-
  • FIG. 18 shows the detailed structure of the MV field 480 consisting of MV(1) through MV(n-1) fields.
  • each of the MV(1) through MV(n-1) fields specifies variable size block information such as size and position of each variable size block and motion vector information (symbols representing motion vector components) for each layer.
  • the present invention reduces the size of an enhancement layer while minimizing an error in a base layer.
  • the present invention also enables adaptive allocation of the amount of bits between motion information and texture information using motion scalability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An apparatus and method are provided for improving the compression efficiency of a multi-layer motion vector in a video coding method by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer. The apparatus includes a base layer determining module that determines a motion vector component of a base layer according to the pixel accuracy of the base layer using the obtained motion vector, and an enhancement layer determining module that determines a motion vector component of an enhancement layer, close to the obtained motion vector, according to the pixel accuracy of the enhancement layer.
EP05789577A 2004-04-08 2005-04-01 Method and apparatus for implementing motion scalability Withdrawn EP1741297A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US56025004P 2004-04-08 2004-04-08
KR1020040032237A KR100587561B1 (ko) 2004-04-08 2004-05-07 모션 스케일러빌리티를 구현하는 방법 및 장치
PCT/KR2005/000968 WO2006004305A1 (fr) 2004-04-08 2005-04-01 Method and apparatus for implementing motion scalability

Publications (1)

Publication Number Publication Date
EP1741297A1 true EP1741297A1 (fr) 2007-01-10

Family

ID=35783075

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05789577A Withdrawn EP1741297A1 (fr) 2004-04-08 2005-04-01 Procede et appareil permettant de mettre en oeuvre l'extensibilite de mouvement

Country Status (2)

Country Link
EP (1) EP1741297A1 (fr)
WO (1) WO2006004305A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102006043707A1 (de) 2006-09-18 2008-03-27 Robert Bosch Gmbh Verfahren zur Datenkompression in einer Videosequenz
US9078024B2 (en) 2007-12-18 2015-07-07 Broadcom Corporation Video processing system with user customized graphics for use with layered video coding and methods for use therewith
US8130823B2 (en) 2007-12-19 2012-03-06 Broadcom Corporation Channel adaptive video transmission system for use with layered video coding and methods for use therewith
US9210480B2 (en) * 2007-12-20 2015-12-08 Broadcom Corporation Video processing system with layered video coding and methods for use therewith
US8416848B2 (en) 2007-12-21 2013-04-09 Broadcom Corporation Device adaptive video transmission system for use with layered video coding and methods for use therewith
US9143731B2 (en) 2008-01-02 2015-09-22 Broadcom Corporation Mobile video device for use with layered video coding and methods for use therewith
US8594191B2 (en) 2008-01-03 2013-11-26 Broadcom Corporation Video processing system and transcoder for use with layered video coding and methods for use therewith
US8144781B2 (en) 2008-01-04 2012-03-27 Broadcom Corporation Video processing system for scrambling layered video streams and methods for use therewith
CN101933328B (zh) 2008-01-22 2014-11-19 杜比实验室特许公司 利用动态查询表更新的自适应运动信息成本估计
KR101847072B1 (ko) * 2010-04-05 2018-04-09 삼성전자주식회사 영상 부호화 방법 및 장치, 비디오 복호화 방법 및 장치

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006004305A1 *

Also Published As

Publication number Publication date
WO2006004305A1 (fr) 2006-01-12

Similar Documents

Publication Publication Date Title
US20050226334A1 (en) Method and apparatus for implementing motion scalability
US8031776B2 (en) Method and apparatus for predecoding and decoding bitstream including base layer
KR100679022B1 (ko) 계층간 필터링을 이용한 비디오 코딩 및 디코딩방법과,비디오 인코더 및 디코더
JP5014989B2 (ja) 基礎階層を利用するフレーム圧縮方法、ビデオコーディング方法、フレーム復元方法、ビデオデコーディング方法、ビデオエンコーダ、ビデオデコーダ、および記録媒体
US8929436B2 (en) Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method
JP4891234B2 (ja) グリッド動き推定/補償を用いたスケーラブルビデオ符号化
US7889793B2 (en) Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
EP1589764A2 (fr) Méthode et appareil pour soutenir l' échelonnabilité de mouvement
US20060013309A1 (en) Video encoding and decoding methods and video encoder and decoder
WO2006004305A1 (fr) Procede et appareil permettant de mettre en oeuvre l'extensibilite de mouvement
KR100679018B1 (ko) 다계층 비디오 코딩 및 디코딩 방법, 비디오 인코더 및디코더
US20050163217A1 (en) Method and apparatus for coding and decoding video bitstream
KR20050089721A (ko) 비디오 스트리밍 서비스를 위한 비디오 코딩 방법과비디오 인코딩 시스템, 및 비디오 디코딩 방법과 비디오디코딩 시스템
WO2005086493A1 (fr) Codage video extensible prenant en charge une taille de groupe d'images variable et codeur video extensible
WO2006080662A1 (fr) Procede et dispositif permettant de compresser efficacement des vecteurs de mouvements dans un codeur video sur la base de plusieurs couches
JP2008515328A (ja) 階層間フィルタリングを利用したビデオコーディングおよびデコーディング方法と、ビデオエンコーダおよびデコーダ
WO2006006793A1 (fr) Procede de codage et decodage de video et codeur et decodeur de video
KR20050009639A (ko) 프레임간 웨이브렛 비디오 코딩방법
EP1813114A1 (fr) Procede et appareil de precodage de trains de bits hybride
JP2008512035A (ja) 多階層ビデオコーディングおよびデコーディング方法、ビデオエンコーダおよびデコーダ
WO2006080663A1 (fr) Procede et dispositif pour coder efficacement des vecteurs de mouvement multicouche

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20061005

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

DAX Request for extension of the european patent (deleted)
RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20091102