WO2006004305A1 - Method and apparatus for implementing motion scalability - Google Patents

Method and apparatus for implementing motion scalability

Info

Publication number
WO2006004305A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion vector
enhancement layer
layer
base layer
value
Prior art date
Application number
PCT/KR2005/000968
Other languages
French (fr)
Inventor
Woo-Jin Han
Original Assignee
Samsung Electronics Co., Ltd.
Priority date
Filing date
Publication date
Priority claimed from KR1020040032237A external-priority patent/KR100587561B1/en
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Priority to EP05789577A priority Critical patent/EP1741297A1/en
Publication of WO2006004305A1 publication Critical patent/WO2006004305A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/19Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding using optimisation based on Lagrange multipliers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/567Motion estimation based on rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates to a video compression method and, more particularly, to an apparatus and a method for improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer, in a video coding method using a multilayer structure.
  • a basic principle of data compression is removing data redundancy.
  • Data can be compressed by removing spatial redundancy where the same color or object is repeated in an image, or by removing temporal redundancy where there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or by removing visual redundancy taking into account human eyesight and limited perception of high frequency.
  • To transmit multimedia, transmission media are necessary. Different types of transmission media for multimedia have different performance. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data at a rate of several megabits per second, while a mobile communication network has a transmission rate of 384 kilobits per second.
  • Scalability refers to the ability to partially decode a single compressed bitstream at a decoder or a pre-decoder part. The decoder or pre-decoder can reconstruct multimedia sequences having different quality levels, resolutions, or frame rates from only some of the bitstreams coded by a scalable coding method.
  • a bitstream typically consists of motion information (motion vector, block size, etc.) and texture information corresponding to a residual obtained after motion estimation.
  • texture information consists of multiple layers: i.e., a base layer, a first enhancement layer, and a second enhancement layer.
  • the respective layers have different resolution levels: i.e., Quarter Common Intermediate Format (QCIF), Common Intermediate Format (CIF), and 2CIF.
  • QCIF Quarter Common Intermediate Format
  • CIF Common Intermediate Format
  • SNR Signal-to-noise ratio
  • temporal scalabilities are implemented within each layer.
  • motion information is usually compressed losslessly as a whole.
  • the non-scalable motion information can significantly degrade the coding efficiency due to an excessive amount of motion information, especially for a bitstream compressed at low bitrates.
  • a method to support motion scalability is to divide motion information into layers according to relative significance and to transmit only part of the motion information for low bitrates with loss, giving more bits to textures. Motion scalability is an issue of great concern to MPEG-21 PART 13 scalable video coding.
  • the partition-based approach generates a multi-layered motion vector by obtaining motion vectors for various resolutions in a frame with the same pixel accuracy.
  • the accuracy-based approach generates a multi-layered motion vector by obtaining motion vectors for various pixel accuracies in a frame having one resolution.
  • the present invention proposes a method for implementing motion scalability by reconstructing a motion vector into multiple layers using the pixel accuracy-based approach. This method is focused on providing high coding performance for a base layer and an enhancement layer simultaneously.
  • the present invention provides a method for efficiently implementing motion scalability using a motion vector consisting of multiple layers.
  • the present invention also provides a method for improving coding efficiency when using only a base layer at a low bitrate by constructing a motion vector into layers according to the pixel accuracy in such a way as to minimize distortion.
  • the present invention also provides a method for improving coding performance by minimizing overhead when using all layers at a high bitrate.
  • an apparatus for reconstructing a motion vector obtained at the predetermined pixel accuracy including a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
  • the base layer determining module may determine the motion vector component of the base layer that is close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
  • the base layer determining module may separate the obtained motion vector into a sign and a magnitude, may use an unsigned value to represent the magnitude of the motion vector, and may attach the original sign to the value.
  • the base layer determining module may determine a value closest to the obtained motion vector as the motion vector component of the base layer according to the pixel accuracy of the base layer.
  • the motion vector component x_b of the base layer may be determined using x_b = sign(x) · ⌊|x| + 0.5⌋, where sign(x) denotes a sign function that returns values of 1 and -1 when x is a positive value and a negative value, respectively, |x| denotes an absolute value function with respect to the variable x, and ⌊|x| + 0.5⌋ denotes a function giving the largest integer not exceeding |x| + 0.5.
  • the apparatus for reconstructing a motion vector obtained at a predetermined pixel accuracy may further include a first compression module removing redundancy in a motion vector component of a first enhancement layer among the enhancement layers using the fact that the motion vector component of the first enhancement layer has an opposite sign to the motion vector component of the base layer when the motion vector component of the first enhancement layer is not 0.
  • the apparatus for reconstructing a motion vector obtained at the predetermined pixel accuracy may further include a second compression module removing redundancy in a motion vector component of a second enhancement layer using the fact that the motion vector component of the second enhancement layer is always 0 when the motion vector component of the first enhancement layer is not 0.
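The two redundancies exploited by the first and second compression modules can be illustrated with a small check. This is a sketch under assumed quantization rules (a round-to-nearest base layer in the style of the third embodiment, with enhancement-layer ties broken toward 0), not the patent's implementation; all function names are illustrative:

```python
import math

def sign(x):
    return 1 if x >= 0 else -1

def base_component(x):
    # Nearest-integer base layer (Equation (2) style): round the magnitude,
    # then reattach the original sign.
    return sign(x) * math.floor(abs(x) + 0.5)

def enhancement(residual, step):
    # Quantize the remaining error to -step, 0, or +step, preferring 0 on ties.
    return sign(residual) * step if abs(residual) > step / 2 else 0.0

# Exhaustively check quarter-pel motion vectors in [-4, 4].
for k in range(-16, 17):
    x = k / 4.0
    b = base_component(x)
    e1 = enhancement(x - b, 0.5)
    e2 = enhancement(x - b - e1, 0.25)
    if e1 != 0:
        assert e1 * b < 0   # a nonzero E1 has the opposite sign to the base layer
        assert e2 == 0      # E2 is always 0 when E1 is nonzero
```

Under these rules, E1 is nonzero only when the base-layer residual is exactly half a pixel, which can happen only in the direction opposite to the rounding, so its sign need not be transmitted; and in that case the residual is fully absorbed, so E2 is necessarily 0.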
  • a video encoder using a motion vector consisting of multiple layers including a motion vector reconstruction module including a motion vector search module obtaining a motion vector with the predetermined pixel accuracy, a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer, a temporal filtering module removing temporal redundancies by filtering frames in a direction of a temporal axis using the obtained motion vectors, a spatial transform module removing spatial redundancies from the frames from which the temporal redundancies have been removed and creating transform coefficients, and a quantization module performing quantization on the transform coefficients.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a layer reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a video decoder using a motion vector consisting of multiple layers including an entropy decoding module interpreting an input bitstream and extracting texture information and motion information from the bitstream, a motion vector reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers contained in the extracted motion information and providing the motion vector after adding the motion vector components of the respective layers together, an inverse quantization module applying inverse quantization to the texture information and outputting transform coefficients, an inverse spatial transform module inversely transforming the transform coefficients into transform coefficients in a spatial domain by performing the inverse of spatial transform, and an inverse temporal filtering module performing inverse temporal filtering on the transform coefficients in the spatial domain using the obtained motion vector and reconstructing frames in a video sequence.
  • the motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • the motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer contained in the motion information when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer contained in the motion information, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector obtained at the predetermined pixel accuracy including determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
  • the motion vector component of the base layer may be determined to be close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
  • the motion vector component of the base layer may be determined according to the pixel accuracy of the base layer by separating the obtained motion vector into a sign and a magnitude, using an unsigned value to represent the magnitude of the motion vector, and attaching the original sign to the value.
  • a value closest to the obtained motion vector may be determined as the motion vector component of the base layer according to the pixel accuracy of the base layer.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
  • FIG. 1 is a diagram for explaining a method of reconstructing a multi-layered motion vector according to the pixel accuracy
  • FIG. 2 illustrates a method for improving the compression efficiency of a motion vector according to a first embodiment of the present invention
  • FIG. 3 illustrates an example of obtaining a predicted value for a current block by correlation with neighboring blocks
  • FIG. 4 illustrates a third embodiment of the present invention
  • FIG. 5 is a graph illustrating the results of measuring peak signal-to-noise ratios
  • FIG. 6 is a graph illustrating the results of measuring a PSNR when compressing a
  • FIG. 7 is a graph comparing the experimental results of the third embodiment of FIG. 6 and the fourth embodiment of the present invention
  • FIG. 8 is a block diagram of a video coding system
  • FIG. 9 is a block diagram of a video encoder
  • FIG. 10 is a block diagram of an exemplary motion vector reconstruction module according to the first embodiment of the present invention.
  • FIG. 11 is an illustration for explaining a process of obtaining a motion vector of an enhancement layer;
  • FIG. 12 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention.
  • FIG. 13 is a block diagram of a video decoder
  • FIG. 14 is a block diagram of an exemplary motion vector reconstruction module according to the present invention.
  • FIG. 15 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention.
  • FIG. 16 is a schematic diagram illustrating a bitstream structure
  • FIG. 17 is a diagram illustrating the detailed structure of each group of pictures
  • FIG. 18 is a diagram illustrating the detailed structure of a motion vector (MV) field.
  • the present invention presents a method for constructing a base layer in such a way as to minimize distortion when only the base layer is used, and a method for quantizing an enhancement layer in such a way as to minimize overhead when all layers are used.
  • FIG. 1 shows an example in which one motion vector is divided into three motion vector components.
  • the motion vector A is reconstructed as the sum of a base layer motion vector component B, a first enhancement layer motion vector component El, and a second enhancement layer motion vector component E2.
  • a motion vector obtained as a result of a motion vector search with the predetermined pixel accuracy as described above is defined as an 'actual motion vector'.
  • The pixel accuracy used for the highest enhancement layer can typically be selected as the predetermined pixel accuracy.
  • the motion vectors of the respective layers have different pixel accuracies that increase in an order from the lowest (close to a base layer) to the highest (away from the base layer).
  • the base layer has one pixel accuracy
  • the first enhancement layer has a half pixel accuracy
  • the second enhancement layer has a quarter pixel accuracy.
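This decomposition can be sketched as follows. It is an illustrative model only (round-to-nearest at each layer, with Python's tie-to-even rounding, which happens to break the half-pel tie toward 0 here), not the patent's exact rule; the function name and accuracy values are assumptions:

```python
def decompose(mv, accuracies=(1.0, 0.5, 0.25)):
    # Split one motion vector component into per-layer components whose
    # sum reconstructs the original value; accuracies lists the pixel
    # accuracy of the base layer and each enhancement layer in turn.
    components = []
    remainder = mv
    for step in accuracies:
        c = round(remainder / step) * step  # quantize to this layer's grid
        components.append(c)
        remainder -= c
    return components

# A = B + E1 + E2, as in FIG. 1
b, e1, e2 = decompose(0.75)   # 1.0, 0.0, -0.25
```

For any quarter-pel input, the three components sum back exactly to the original motion vector; dropping the finer layers yields progressively coarser approximations.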
  • An encoder transmits the reconstructed motion vector to a predecoder that truncates a part of the motion vector in an order from the highest to the lowest layers while a decoder receives the remaining part of the motion vector.
  • an encoder may transmit motion vector components of all layers (the base layer, the first enhancement layer, and the second enhancement layer) while the predecoder may transmit only components of the base layer and the first enhancement layer to the decoder by truncating a component of the second enhancement layer when it determines according to available communication conditions that transmission of all the motion vector components is unsuitable.
  • the decoder uses the components of the base layer and the first enhancement layer to reconstruct a motion vector.
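The predecoder/decoder interaction can be sketched as below. The layer values are the illustrative FIG. 1-style components, and the function names are hypothetical:

```python
def truncate_layers(components, keep):
    # Predecoder: keep only the lowest 'keep' layers (base layer first),
    # dropping enhancement layers from the highest downward.
    return components[:keep]

def reconstruct(components):
    # Decoder: the motion vector is the sum of the received components.
    return sum(components)

layers = [1.0, 0.0, -0.25]   # base, first and second enhancement layers
full = reconstruct(layers)                         # 0.75
coarse = reconstruct(truncate_layers(layers, 2))   # 1.0 (E2 truncated)
```

Truncating the second enhancement layer leaves the decoder with a coarser, but still usable, motion vector.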
  • the base layer is essential motion vector information having the highest priority and it cannot be omitted during transmission.
  • a bitrate in the base layer must be equal to or less than the minimum bandwidth supported by a network.
  • the bitrate in transmission of all the layers (the base layer and the first and second enhancement layers) must be equal to or less than the maximum bandwidth.
  • the present invention proposes methods for constructing a base layer according to first through third embodiments and verifies the methods through experiments.
  • a motion vector is constructed into multiple layers: a motion vector component of the base layer represented with integer-pixel accuracy, and motion vector components of enhancement layers respectively represented with half- and quarter-pixel accuracy.
  • the base layer uses an integer to represent a motion vector component, and the enhancement layers use a symbol of 1, -1, or 0 instead of a real number in order to represent motion vector components in a simple way. While a motion vector is usually represented by a pair of x and y components, only one component will be described throughout this specification for clarity of explanation.
  • Since the motion vector component of the first enhancement layer with half-pixel accuracy may have a value of -0.5, 0.5, or 0, it is represented by the symbol -1, 1, or 0.
  • Since the motion vector component of the second enhancement layer with quarter-pixel accuracy may have a value of -0.25, 0.25, or 0, it is represented by the symbol -1, 1, or 0.
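The symbol representation above amounts to a pair of tiny mappings; a minimal sketch (function names are illustrative, not from the patent):

```python
def to_symbol(component):
    # Map an enhancement-layer component (e.g. -0.5/0/0.5 or -0.25/0/0.25)
    # to the transmitted symbol -1, 0, or 1.
    if component > 0:
        return 1
    if component < 0:
        return -1
    return 0

def from_symbol(symbol, step):
    # Recover the component from its symbol; step is 0.5 for the first
    # enhancement layer and 0.25 for the second.
    return symbol * step
```

The round trip is lossless, e.g. `from_symbol(to_symbol(-0.5), 0.5)` gives back -0.5.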
  • One of the most important goals in implementing motion scalability is to prevent significant degradation in coding performance when an enhancement layer is truncated.
  • the truncation of the enhancement layer increases the motion vector error, thereby significantly degrading the quality of the video reconstructed by a decoder; it also reduces the effect of improving video quality by allocating more bits to texture information, which the reduction of motion vector bits would otherwise permit. Therefore, the first through third embodiments of the present invention are focused on preventing a significant drop in the peak signal-to-noise ratio (PSNR) when only a base layer is used, compared to when a base layer and enhancement layers are used.
  • PSNR peak signal-to-noise ratio
  • FIG. 2 shows an example of predicting a motion vector in first and second enhancement layers from a motion vector in a base layer.
  • FIG. 3 illustrates an example of obtaining a predicted value for a current block by its correlation with neighboring blocks.
  • a predicted value of a current block (a) is obtained by correlation with neighboring blocks (b), (c), and (d), whose motion vectors have been determined.
  • the predicted value may be the median or average value of the motion vectors of the neighboring blocks (b), (c), and (d).
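As a sketch of this prediction step, assuming the median variant (the function name is hypothetical):

```python
def predict_mv(neighbors):
    # Median of the motion vector components of the already-coded
    # neighboring blocks (b), (c), (d) in FIG. 3.
    ordered = sorted(neighbors)
    return ordered[len(ordered) // 2]

pred = predict_mv([1.25, -0.5, 0.75])   # -> 0.75
```

With three neighbors the median is simply the middle value after sorting, which is robust to one outlying neighbor in a way the average is not.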
  • In the first embodiment, the integer motion vector component of the current block (a) is determined to be the value closest to the predicted value obtained from the neighboring blocks.
  • A feature of the second embodiment of the present invention is that the integer motion vector component of the base layer is kept as close to zero as possible.
  • an actual motion vector is separated into sign and magnitude.
  • the magnitude of the motion vector is represented using an unsigned integer and the original sign is then attached to the unsigned integer. This method makes it more probable that the motion vector component of the base layer is zero, which enables more efficient quantization since most quantization modules quantize zeros very efficiently. This method is expressed by Equation (1):
  • x_b = sign(x) · ⌊|x|⌋ ... (1), where ⌊x⌋ denotes a function giving the largest integer not exceeding x (by stripping the decimal part).
  • Table 1 shows examples of values for each layer that can be obtained with the values x and x_b in Equation (1).
  • the values x and x_b are multiplied by a factor of 4 and expressed as integer values
  • Δ(x - x_b) in the lowest row denotes an error between an actual value and an integer motion vector of the base layer.
  • El and E2 respectively denote motion vector components of the first and second enhancement layers, expressed as symbols.
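Equation (1) can be sketched as follows. The sample values are illustrative, not the actual Table 1 entries, but they show that the base-layer error under this rule can reach 0.75:

```python
import math

def sign(x):
    return 1 if x >= 0 else -1

def base_eq1(x):
    # Equation (1): floor the magnitude, then reattach the original sign,
    # which biases the base-layer component toward zero.
    return sign(x) * math.floor(abs(x))

rows = [(x, base_eq1(x), x - base_eq1(x))
        for x in (-0.75, -0.25, 0.25, 0.75, 1.25)]
# The error x - x_b can be as large as 0.75 in magnitude (e.g. for x = 0.75).
```

Because the magnitude is floored, every x in (-1, 1) maps to a base-layer component of 0, which is what makes the zero symbol so frequent.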
  • the method of the second embodiment provides a higher possibility that the integer motion vector component x_b of the base layer is zero, thereby increasing the compression efficiency as compared to the first embodiment, in which x_b is obtained by simply truncating the decimal part.
  • motion vector components of the first and second enhancement layers are expressed as the symbols -1, 0, or 1, which results in reduced efficiency.
  • the second embodiment suffers from significant distortion caused by a difference of as much as 0.75 between the actual and quantized motion vectors when only the base layer is used.
  • the difference between an actual motion vector and a quantized motion vector of a base layer is minimized. That is, the third embodiment concentrates on reducing that difference to less than 0.5, an improvement over the first and second embodiments, where the maximum difference is 0.75. This is accomplished by modifying the second embodiment to some extent: an integer nearest to an actual motion vector is selected as the motion vector component of the base layer by rounding off the actual motion vector, as defined by Equation (2): x_b = sign(x) × ⌊|x| + 0.5⌋.
  • Equation (2) is similar to Equation (1) except for the use of rounding off.
  • FIG. 4 shows an example in which a motion vector with a value of 0.75 is represented according to the third embodiment of the present invention.
  • the value 1 is selected as a motion vector component of a base layer since 1 is an integer nearest to the actual motion vector of 0.75.
  • a motion vector component of the first enhancement layer that minimizes the difference between the actual motion vector and the motion vector of the first enhancement layer may be -0.5 or 0 (a motion vector of the first enhancement layer is the sum of the motion vector of the base layer and the motion vector component of the first enhancement layer).
  • the minimum difference is 0.25.
  • the value closest to the motion vector component of the immediately lower layer is chosen as the motion vector component of the first enhancement layer.
  • the value 0 is finally selected as the motion vector component of the first enhancement layer. [86] By doing so, the difference between the actual motion vector and the motion vector component of the base layer can be reduced to 0.25.
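The rounding rule of Equation (2) and the tie-breaking selection of enhancement-layer components described above can be sketched as follows (our own illustration, assuming 1, 1/2, and 1/4 pixel accuracies for the three layers):

```python
import math

def base_layer_eq2(x):
    """Third embodiment, Equation (2): x_b = sign(x) * floor(|x| + 0.5)."""
    sign = 1 if x >= 0 else -1
    return sign * math.floor(abs(x) + 0.5)

def enhancement_component(x, cumulative, step):
    """Pick the component in {-step, 0, +step} that minimizes the error
    |x - (cumulative + c)|; on a tie, pick the component closest to zero,
    i.e. the value closest to the immediately lower layer's cumulative sum."""
    candidates = (-step, 0.0, step)
    return min(candidates, key=lambda c: (abs(x - (cumulative + c)), abs(c)))

# Worked example from FIG. 4: actual motion vector 0.75.
x = 0.75
x_b = base_layer_eq2(x)                         # 1 (nearest integer)
e1 = enhancement_component(x, x_b, 0.5)         # -0.5 and 0 tie at 0.25 -> 0
e2 = enhancement_component(x, x_b + e1, 0.25)   # -0.25 makes the error 0
```

With all three layers, the reconstruction `x_b + e1 + e2` recovers 0.75 exactly; with the base layer alone the error is 0.25, matching the text.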
  • the third embodiment of the present invention provides improved coding performance when only a base layer is used by limiting the difference to below 0.5. However, this method has the drawback of increasing the size of the base layer over the first or second embodiments. Table 2 shows examples of values that can be created by Equation (2).
  • Table 3 shows the results of experiments in which a Foreman CIF sequence is compressed at a frame rate of 30 Hz and a bitrate of 256 Kbps. The experiments were done to verify the performance of the first through third embodiments of the present invention. Table 3 lists the number of bits (hereinafter 'size' will refer to 'number of bits') needed for motion vectors of a base layer and first and second enhancement layers according to the first through third embodiments.
  • a base layer has the smallest size in the first embodiment, but the first and second enhancement layers have the largest size since a motion vector of a base layer is predicted, thus increasing the total size.
  • the second embodiment increases the size of a base layer as well as a total size compared to the first embodiment. The total size is the largest in the second embodiment.
  • the base layer has the largest size but the first enhancement layer has the smallest size since it is highly probable that a motion vector component of the first enhancement layer will have a value of zero.
  • the second enhancement layer has a size similar to its counterparts in the first and second embodiments.
  • FIG. 5 is a graph illustrating the results of measuring PSNRs (as a video quality indicator) using motion vectors from the three layers according to the first through third embodiments of the present invention as detailed in Table 3. Referring to FIG. 5, the third embodiment exhibits the highest performance while the first embodiment exhibits the poorest performance.
  • the first embodiment has similar performance to the second embodiment when only a base layer is used while it has weak performance compared to the other embodiments when all motion vector layers are used.
  • the third embodiment exhibits superior performance when only the base layer is used. Specifically, the PSNR value in the third embodiment is more than 1.0 dB higher than that of the second embodiment. This is achieved by minimizing the difference between an integer motion vector component of the base layer and an actual motion vector. That is, since it is more efficient for coding performance to minimize this difference than to slightly decrease an integer value, the third embodiment exhibits the best performance.
  • the third embodiment is superior to the first and second embodiments in terms of the size of the first enhancement layer, but it has little difference in terms of the size of the second enhancement layer.
  • the third embodiment is not advantageous over the others when all motion vector layers are used.
  • FIG. 6 is a graph illustrating an experimental result of compressing a Foreman CIF sequence at 100 Kbps according to the third embodiment.
  • the third embodiment exhibits superior performance when only the base layer is used, compared to when all the layers are used. Specifically, while the third embodiment shows excellent performance when the base layer or a combination of the base layer and the first enhancement layer is used, its performance degrades when all the layers are used since the size of the second enhancement layer is large.
  • the third embodiment is intended to allocate a large amount of information to the second enhancement layer. Since the second enhancement layer is used only when the bitrate is sufficient, its large size does not significantly affect performance. At a low bitrate, only the base layer and the first enhancement layer are used, and the bits in the second enhancement layer can be truncated.
  • the present invention proposes a method for providing excellent coding performance when all motion vector layers are used by adding two compression rules.
  • the two compression rules are found in Table 2.
  • the first rule is that the motion vector component x_b of the base layer has an opposite sign to the motion vector component E1 of the first enhancement layer except, of course, when E1 is zero.
  • the motion vector component E1 of the first enhancement layer is represented by 0 or 1
  • a decoder reconstructs the original value of E1 by attaching a sign to E1 that is opposite to the sign of the motion vector component of the base layer.
  • since E1 has an opposite sign to the motion vector component of the base layer (except zero, which has no sign), E1 can be expressed as either 0 or 1.
  • An encoder converts -1 to 1, while a decoder can reconstruct the original value of E1 by attaching the opposite sign to 1.
  • the second compression rule is that the motion vector component E2 of the second enhancement layer is always 0 when E1 is 1 or -1. Thus, E2 is not encoded when the corresponding E1 is not 0.
  • the symbol 'X' in Table 4 denotes a portion not transmitted, and this constitutes a quarter of the total number of cases. Thus, the number of bits can be reduced by 25%.
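The two compression rules above can be sketched as an encode/decode pair (a hypothetical illustration, not the patent's bitstream syntax; `None` stands for the untransmitted E2, and x_b is taken to be nonzero whenever E1 is, as the rounding of Equation (2) guarantees for quarter-pel inputs):

```python
def encode_e1_e2(x_b, e1, e2):
    """Apply the fourth embodiment's two rules before entropy coding.

    Rule 1: when E1 is nonzero its sign is opposite to x_b, so only the
            magnitude (0 or 1) needs to be transmitted.
    Rule 2: when E1 is nonzero, E2 is always 0, so E2 is simply omitted.
    """
    e1_code = abs(e1)                       # -1 or 1 -> 1, 0 -> 0
    e2_code = e2 if e1_code == 0 else None  # None = not transmitted
    return e1_code, e2_code

def decode_e1_e2(x_b, e1_code, e2_code):
    """Invert the rules: restore E1's sign from x_b, and set E2 to 0
    whenever it was omitted."""
    if e1_code == 0:
        return 0, e2_code
    sign = -1 if x_b > 0 else 1             # opposite sign to the base layer
    return sign * e1_code, 0
```

For instance, the layers (x_b, E1, E2) = (1, -1, 0) are coded as (1, omitted) and recovered exactly by the decoder.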
  • compression efficiency can be further increased.
  • a method created by applying the first and second compression rules to the third embodiment is referred to as a 'fourth embodiment'.
  • the compression rules in the fourth embodiment can also be applied to a base layer, a first enhancement layer, and a second enhancement layer for a motion vector consisting of four or more layers. Furthermore, either the first or second or both rules can be applied depending on the type of application.
  • Table 5 shows the number of bits needed for motion vectors of a base layer, a first enhancement layer, and a second enhancement layer according to the fourth embodiment of the present invention.
  • the fourth embodiment reduces the sizes of the first and second enhancement layers by 15.68% and 11.90% compared to the third embodiment, thereby significantly reducing the overall bitrate.
  • the number of bits in the second enhancement layer is reduced by less than 25% because the omitted values are zeros, which are already efficiently compressed by an entropy encoding module.
  • FIG. 7 is a graph comparing the experimental results of the third embodiment (FIG. 6) and the fourth embodiment of the present invention. As shown in FIG. 7, the fourth embodiment exhibits similar performance to the third embodiment when only the base layer is used, but exhibits superior performance thereto when all the layers are used.
  • a motion vector consists of three layers
  • the present invention can apply to a motion vector consisting of more than three layers.
  • a motion vector search is performed on a base layer with 1 pixel accuracy, a first enhancement layer with 1/2 pixel accuracy, and a second enhancement layer with 1/4 pixel accuracy.
  • this is provided as an example only, and it will be readily apparent to those skilled in the art that the motion vector search may be performed with different pixel accuracies than those stated above.
  • the pixel accuracies increase with each layer, in a manner similar to the afore-mentioned embodiments.
  • an encoder encodes an input video using a multilayered motion vector while a predecoder or a decoder decodes all or part of the input video.
  • the overall process will now be described schematically with reference to FIG. 8.
  • FIG. 8 shows the overall configuration of a video coding system.
  • the video coding system includes an encoder 100, a predecoder 200, and a decoder 300.
  • the encoder 100 encodes an input video into a bitstream 20.
  • the predecoder 200 truncates part of the texture data in the bitstream 20 according to extraction conditions such as bitrate, resolution or frame rate determined considering the communication environment.
  • the predecoder 200, therefore, implements scalability for the texture data.
  • the predecoder 200 also implements motion scalability by truncating part of the motion data in the bitstream 20 in an order from the highest to the lowest layers according to the communication environment or the number of texture bits. By implementing texture or motion scalability in this way, the predecoder can extract various bitstreams 25 from the original bitstream 20.
  • the decoder 300 generates an output video 30 from the extracted bitstream 25.
  • the predecoder 200 or the decoder 300 or both may extract the bitstream 25 according to the extraction conditions.
  • FIG. 9 is a block diagram of an encoder 100 of a video coding system.
  • the encoder 100 includes a partitioning module 110, a motion vector reconstruction module 120, a temporal filtering module 130, a spatial transform module 140, a quantization module 150, and an entropy encoding module 160.
  • the partitioning module 110 partitions an input video 10 into several groups of pictures (GOPs), each of which is independently encoded as a unit.
  • the motion vector reconstruction module 120 finds an actual motion vector for a frame of one GOP with the predetermined pixel accuracy, and sends the motion vector to the temporal filtering module 130.
  • the motion vector reconstruction module 120 uses this actual motion vector and a predetermined method (one of first through third embodiments) to determine a motion vector component of the base layer. Next, it determines a motion vector component of an enhancement layer with the enhancement layer pixel accuracy that is closer to the actual motion vector.
  • the motion vector reconstruction module 120 also sends an integer motion vector component of the base layer and a symbol value that is the motion vector component of the enhancement layer to the entropy encoding module 160.
  • the multilayered motion information is encoded by the entropy encoding module 160 using a predetermined encoding algorithm.
  • FIG. 10 is a block diagram of an exemplary motion vector reconstruction module 120.
  • the motion vector reconstruction module 120 includes a motion vector search module 121, a base layer determining module 122, and an enhancement layer determining module 123.
  • the motion vector reconstruction module 120 further includes an enhancement layer compression module 125 with either a first or second compression module 126 or 127 or both.
  • the motion vector search module 121 performs a motion vector search of each block in a current frame (at a predetermined pixel accuracy) in order to obtain an actual motion vector.
  • the block may be a fixed-size or variable-size block. When a variable size block is used, information about the block size (or mode) needs to be transmitted together with the actual motion vector.
  • a current image frame is partitioned into blocks of a predetermined pixel size, and a block in a reference image frame is compared with the corresponding block in the current image frame according to the predetermined pixel accuracy in order to derive the difference between the two blocks.
  • a motion vector that gives the minimum sum of errors is designated as the motion vector for the current block.
  • a search range may be predefined using parameters. A smaller search range reduces search time and exhibits good performance when a motion vector exists within the search range. However, the prediction accuracy will be decreased for a fast-motion image where the motion vector does not exist within the range.
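The fixed-block search described above can be sketched as an exhaustive SAD (sum of absolute differences) minimization over the search range (illustrative only; practical encoders use faster search strategies):

```python
import numpy as np

def full_search(cur_block, ref_frame, top, left, search_range=4):
    """Return the integer motion vector (dy, dx) that minimizes the SAD
    between cur_block and the reference block displaced from (top, left),
    within +/- search_range pixels."""
    h, w = cur_block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w].astype(int)
            sad = np.abs(cand - cur_block.astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```

A smaller `search_range` directly reduces the number of candidates tested, which is the time/accuracy trade-off discussed above.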
  • Motion estimation may be performed using variable size blocks instead of the above fixed-size block.
  • a motion vector search is performed on blocks of variable pixel sizes to determine a variable block size and a motion vector that minimize a predetermined cost function J.
  • the cost function is defined by Equation (3): J = D + λ × R, where D is the number of bits used for coding a frame difference, R is the number of bits used for coding an estimated motion vector, and λ is a Lagrangian coefficient.
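A toy illustration of choosing a block mode by minimizing Equation (3) (the candidate bit counts and λ value below are invented for the example):

```python
def rd_cost(distortion_bits, motion_bits, lam):
    """Equation (3): J = D + lambda * R."""
    return distortion_bits + lam * motion_bits

# Hypothetical candidates: one 16x16 block vs. a split into four 8x8 blocks.
# The split lowers D (better prediction) but raises R (four motion vectors).
candidates = [("16x16", 1200, 40), ("8x8", 1000, 160)]
lam = 2.0
best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
# J(16x16) = 1200 + 2.0*40 = 1280; J(8x8) = 1000 + 2.0*160 = 1320
```

With these numbers the single 16x16 block wins; a smaller λ would favor the split, since motion-bit overhead is penalized less.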
  • the base layer determining module 122 determines an integer motion vector component of a base layer according to the first through third embodiments. In the first embodiment, it determines the motion vector component of the base layer by spatial correlation with the motion vector components of neighboring blocks and rounding up or down the decimal part of the actual motion vector.
  • the base layer determining module 122 determines the motion vector component of the base layer by separating the actual motion vector into a sign and a magnitude.
  • the magnitude of the motion vector is represented by an unsigned integer to which the original sign is attached. The determination process is shown in Equation (1).
  • the base layer determining module 122 determines the motion vector component of the base layer by finding an integer value nearest to the actual motion vector. This nearest integer value is calculated by Equation (2).
  • the enhancement layer determining module 123 determines a motion vector component of an enhancement layer in such a way as to minimize an error between the actual motion vector and the motion vector component. When two or more vectors with the same error exist, the motion vector that minimizes the error of the motion vector in the immediately lower layer is chosen as the motion vector component of the enhancement layer.
  • a motion vector component of a base layer is determined according to the first through third embodiments and motion vector components of the first through third enhancement layers are determined using a separate method.
  • the value 1 is determined as the motion vector component of the base layer according to one of the first through third embodiments
  • a process for determining the motion vector components of the enhancement layers will now be described with reference to FIG. 11.
  • a 'cumulative value' of a layer is defined as the sum of motion vector components of the lower layers.
  • the motion vector reconstruction module 120 further includes the enhancement layer compression module 125 with either the first or second compression module 126 or 127 or both as shown in FIG. 12.
  • the first compression module 126 converts the negative number into a positive number having the same magnitude.
  • the second compression module 127 does not encode the motion vector component of the second enhancement layer.
  • [138] Referring to FIG. 9, to reduce temporal redundancies, the temporal filtering module 130 filters frames in the direction of the temporal axis using the obtained motion vectors. Motion Compensated Temporal Filtering (MCTF) or Unconstrained MCTF (UMCTF) may be used for this purpose.
  • the spatial transform module 140 removes spatial redundancies from these frames using the discrete cosine transform (DCT) or wavelet transform, and creates transform coefficients.
  • the quantization module 150 quantizes those transform coefficients. Quantization is the process of converting real transform coefficients into discrete values and mapping the quantized coefficients into quantization indices. In particular, when a wavelet transform is used for spatial transformation, embedded quantization can often be used. Embedded ZeroTrees Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), and Embedded ZeroBlock Coding (EZBC) are examples of an embedded quantization algorithm.
  • the entropy encoding module 160 losslessly encodes the transform coefficients quantized by the quantization module 150 and the motion information generated by the motion vector reconstruction module 120 into a bitstream 20.
  • various techniques such as arithmetic encoding and variable-length encoding may be used.
  • FIG. 13 is a block diagram of a decoder 300 in a video coding system according to an embodiment of the present invention.
  • the decoder 300 includes an entropy decoding module 310, an inverse quantization module 320, an inverse spatial transform module 330, an inverse temporal filtering module 340, and a motion vector reconstruction module 350.
  • the entropy decoding module 310 performs the inverse of an entropy encoding process to extract texture information (encoded frame data) and motion information from the bitstream 20.
  • FIG. 14 is a block diagram of an exemplary motion vector reconstruction module 350.
  • the motion vector reconstruction module 350 includes a layer reconstruction module 351 and a motion addition module 352.
  • the layer reconstruction module 351 interprets the extracted motion information and recognizes motion information for each layer.
  • the motion information contains block information and motion vector information for each layer.
  • the layer reconstruction module 351 then reconstructs a motion vector component of each layer from a corresponding layer value contained in the motion information.
  • the 'layer value' means a value received from the encoder: specifically, an integer value representing a motion vector component of a base layer, or a symbol value representing a motion vector component of an enhancement layer.
  • the layer reconstruction module 351 reconstructs the original motion vector component from the symbol value.
  • the motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the enhancement layer together and sending the motion vector to the inverse temporal filtering module 340.
  • FIG. 15 is a block diagram of another exemplary motion vector reconstruction module 350 for implementing the method according to the fourth embodiment of the present invention.
  • the motion vector reconstruction module 350 includes a layer reconstruction module 351, a motion addition module 352, and an enhancement layer reconstruction module 353 with either first or second reconstruction modules 354 and 355 or both.
  • when a received value is not 0, the first reconstruction module 354 attaches a sign to this value that is opposite to the sign of a motion vector component of a base layer, and obtains a motion vector component corresponding to the resultant value (symbol).
  • when the received value is 0, the motion vector component is 0.
  • the second reconstruction module 355 sets the value of motion vector component of the second enhancement layer to 0 when the value of the first enhancement layer is not 0. When the value is 0, the second reconstruction module obtains a motion vector component corresponding to a value of the second enhancement layer. Then, the motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the first and second enhancement layers together.
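The motion addition step, including the case where the predecoder has truncated the higher layers, can be sketched as (our own illustration; step sizes of 1, 1/2, and 1/4 pixel are assumed):

```python
def reconstruct_mv(x_b, e1=None, e2=None):
    """Sum whichever layer components survived predecoder truncation.

    x_b is the integer base-layer component (1-pel accuracy); e1 and e2
    are enhancement-layer symbols in {-1, 0, 1}, scaled by the 1/2- and
    1/4-pel step sizes. A truncated (missing) layer is passed as None.
    """
    mv = float(x_b)
    if e1 is not None:
        mv += 0.5 * e1
        if e2 is not None:          # e2 is only meaningful if e1 arrived
            mv += 0.25 * e2
    return mv

reconstruct_mv(1)          # base layer only
reconstruct_mv(1, 0)       # base + first enhancement layer
reconstruct_mv(1, 0, -1)   # all three layers: 1 + 0 - 0.25 = 0.75
```

This is how the same encoded motion vector yields progressively finer reconstructions as more enhancement layers are retained.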
  • the inverse quantization module 320 performs inverse quantization on the extracted texture information and outputs transform coefficients. Inverse quantization is the process of obtaining quantized coefficients from quantization indices received from the encoder 100. A mapping table of indices and quantization coefficients is received from the encoder 100.
  • the inverse spatial transform module 330 inverse- transforms the transform coefficients into transform coefficients in a spatial domain. For example, in the DCT transform the transform coefficients are inverse-transformed from the frequency domain to the spatial domain. In the wavelet transform, the transform coefficients are inversely transformed from the wavelet domain to the spatial domain.
  • the inverse temporal filtering module 340 performs inverse temporal filtering on the transform coefficients in the spatial domain (i.e., a temporal residual image) using the reconstructed motion vectors received from the motion vector reconstruction module 350 in order to reconstruct frames making up a video sequence.
  • a module refers to, but is not limited to, a software or hardware component such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), which performs certain tasks.
  • a module may advantageously be configured to reside on an addressable storage medium and to execute on one or more processors.
  • a module may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, or variables.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • the components and modules may be implemented in such a way that they execute on one or more computers in a communication system.
  • FIGS. 16 through 18 illustrate a structure of a bitstream 400. Specifically, FIG. 16 is a schematic diagram illustrating an overall structure of the bitstream 400.
  • the bitstream 400 is composed of a sequence header field 410 and a data field 420 containing a plurality of GOP fields 430 through 450.
  • the sequence header field 410 specifies image properties such as frame width (2 bytes) and height (2 bytes), a GOP size (1 byte), and a frame rate (1 byte).
  • the data field 420 contains all the image information and other information
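A sketch of packing and parsing the sequence header fields listed above (the byte order and exact field layout are our assumptions for illustration; the patent only gives the field sizes):

```python
import struct

# Field sizes from the text: frame width (2 bytes), height (2 bytes),
# GOP size (1 byte), frame rate (1 byte). Big-endian order is assumed.
HEADER_FMT = ">HHBB"

def pack_sequence_header(width, height, gop_size, frame_rate):
    """Serialize the sequence header field 410 into 6 bytes."""
    return struct.pack(HEADER_FMT, width, height, gop_size, frame_rate)

def unpack_sequence_header(data):
    """Recover (width, height, gop_size, frame_rate) from the header bytes."""
    return struct.unpack(HEADER_FMT, data[:struct.calcsize(HEADER_FMT)])

# Example: a CIF sequence (352x288), GOP size 16, frame rate 30 Hz.
hdr = pack_sequence_header(352, 288, 16, 30)
```

The 2-byte width/height fields cap frame dimensions at 65535 pixels, and the 1-byte fields cap GOP size and frame rate at 255, which fits the formats discussed in this document.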
  • FIG. 17 shows the detailed structure of each GOP field 430.
  • the GOP field 430 consists of a GOP header 460, a T field 470 that specifies frame information, and an MV field 480 that specifies motion vector information.
  • FIG. 18 shows the detailed structure of the MV field 480, consisting of MV(1) through MV(n-1) fields.
  • each of the MV(1) through MV(n-1) fields specifies variable size block information such as size and position of each variable size block and motion vector information (symbols representing motion vector components) for each layer.
  • the present invention reduces the size of an enhancement layer while minimizing an error in a base layer.
  • the present invention also enables adaptive allocation of the amount of bits between motion information and texture information using motion scalability.

Abstract

An apparatus and method for improving the multi-layered motion vector compression efficiency of a video coding method by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer. The apparatus includes a base layer determining module that determines a motion vector component of a base layer having the base layer pixel accuracy using the obtained motion vector, and an enhancement layer determining module that determines a motion vector component of an enhancement layer having the enhancement layer pixel accuracy that is close to the obtained motion vector.

Description

METHOD AND APPARATUS FOR IMPLEMENTING MOTION
SCALABILITY
Technical Field
[1] The present invention relates to a video compression method, and more particularly, to an apparatus and a method for improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer, in a video coding method using a multilayer structure.
Background Art
[2] The development of information technology (IT) such as the Internet has increased text, voice and video communication. Conventional text communication cannot satisfy the various demands of users, and thus multimedia services that can provide various types of information such as text, pictures, and music have increased. Multimedia data requires a large capacity storage medium and a wide bandwidth for transmission since the size of multimedia data is usually large. Accordingly, a compression coding method for transmitting multimedia that includes text, video, and audio is necessary.
[3] A basic principle of data compression is removing data redundancy. Data can be compressed by removing spatial redundancy where the same color or object is repeated in an image, or by removing temporal redundancy where there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or by removing visual redundancy taking into account human eyesight and limited perception of high frequency.
[4] Currently, most video coding standards are based on a motion compensation estimation coding method. Temporal redundancy is usually removed by temporal filtering based on motion compensation, and spatial redundancy is usually removed by spatial transform.
[5] To transmit multimedia created after removing data redundancy, transmission media are necessary. Different types of transmission media for multimedia have different performance. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data at a rate of several megabits per second while a mobile communication network has a transmission rate of 384 kilobits per second.
[6] Accordingly, to support transmission media having various speeds or to transmit multimedia data at a rate suitable to a transmission environment, data coding methods having scalability, such as wavelet video coding and subband video coding, may be suitable to a multimedia environment. [7] Scalability refers to the ability to partially decode a single compressed bitstream at a decoder or a pre-decoder part. The decoder or pre-decoder can reconstruct multimedia sequences having different quality levels, resolutions, or frame rates from only some of the bitstreams coded by a scalable coding method.
[8] In a conventional video coding technique, a bitstream typically consists of motion information (motion vector, block size, etc.) and texture information corresponding to a residual obtained after motion estimation.
[9] In a conventional method for achieving texture scalability, wavelet transform and embedded quantization are used to implement spatial scalability and Motion Compensated Temporal Filtering is used to provide temporal scalability.
[10] Another method for implementing texture scalability is to temporally or spatially construct texture information into multiple layers. For example, the texture information consists of multiple layers: i.e., a base layer, a first enhancement layer, and a second enhancement layer. To support spatial scalability, the respective layers have different resolution levels: i.e., Quarter Common Intermediate Format (QCIF), Common Intermediate Format (CIF), and 2CIF. Signal-to-noise ratio (SNR) and temporal scalabilities are implemented within each layer.
[11] In existing video coding schemes, motion information is usually compressed losslessly as a whole. However, the non-scalable motion information can significantly degrade the coding efficiency due to an excessive amount of motion information, especially for a bitstream compressed at low bitrates. In order to solve this problem, research is being actively conducted to implement motion scalability. A method to support motion scalability is to divide motion information into layers according to relative significance and to transmit only part of the motion information for low bitrates with loss, giving more bits to textures. Motion scalability is an issue of great concern to MPEG-21 PART 13 scalable video coding.
Disclosure of Invention
Technical Problem
[12] Recently, various approaches have been proposed for implementing motion scalability by constructing a motion vector into multiple layers. The approaches are divided into two categories: a partition-based approach and an accuracy-based approach.
[13] The partition-based approach generates a multi-layered motion vector by obtaining motion vectors for various resolutions in a frame with the same pixel accuracy. The accuracy-based approach generates a multi-layered motion vector by obtaining motion vectors for various pixel accuracies in a frame having one resolution.
[14] The present invention proposes a method for implementing motion scalability by reconstructing a motion vector into multiple layers using the pixel accuracy-based approach. This method is focused on providing high coding performance for a base layer and an enhancement layer simultaneously.
Technical Solution
[15] The present invention provides a method for efficiently implementing motion scalability using a motion vector consisting of multiple layers.
[16] The present invention also provides a method for improving coding efficiency when using only a base layer at a low bitrate by constructing a motion vector into layers according to the pixel accuracy in such a way as to minimize distortion.
[17] The present invention also provides a method for improving coding performance by minimizing overhead when using all layers at a high bitrate.
[18] According to an aspect of the present invention, there is provided an apparatus for reconstructing a motion vector obtained at the predetermined pixel accuracy including a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
[19] The base layer determining module may determine the motion vector component of the base layer that is close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
[20] In order to determine the motion vector component of the base layer according to the pixel accuracy of the base layer, the base layer determining module may separate the obtained motion vector into a sign and a magnitude, may use an unsigned value to represent the magnitude of the motion vector, and may attach the original sign to the value.
[21] The base layer determining module may determine a value closest to the obtained motion vector as the motion vector component of the base layer according to the pixel accuracy of the base layer.
[22] The motion vector component x_b of the base layer may be determined using x_b = sign(x)·⌊|x| + 0.5⌋, where sign(x) denotes a sign function that returns values of 1 and -1 when x is a positive value and a negative value, respectively, |x| denotes the absolute value of the variable x, and ⌊|x| + 0.5⌋ denotes a function giving the largest integer not exceeding |x| + 0.5 by stripping the decimal part.
[23] The apparatus for reconstructing a motion vector obtained at a predetermined pixel accuracy may further include a first compression module removing redundancy in a motion vector component of a first enhancement layer among the enhancement layers using the fact that the motion vector component of the first enhancement layer has an opposite sign to the motion vector component of the base layer when the motion vector component of the first enhancement layer is not 0.
[24] The apparatus for reconstructing a motion vector obtained at a predetermined pixel accuracy may further include a second compression module removing redundancy in a motion vector component of a second enhancement layer using the fact that the motion vector component of the second enhancement layer is always 0 when the motion vector component of the first enhancement layer is not 0.
[25] According to another aspect of the present invention, there is provided a video encoder using a motion vector consisting of multiple layers, the encoder including a motion vector reconstruction module including a motion vector search module obtaining a motion vector with a predetermined pixel accuracy, a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, an enhancement layer determining module determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer, a temporal filtering module removing temporal redundancies by filtering frames in a direction of a temporal axis using the obtained motion vectors, a spatial transform module removing spatial redundancies from the frames from which the temporal redundancies have been removed and creating transform coefficients, and a quantization module performing quantization on the transform coefficients.
[26] According to still another aspect of the present invention, there is provided an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus including a layer reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
[27] According to yet another aspect of the present invention, there is provided an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
[28] According to a further aspect of the present invention, there is provided an apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus including a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
[29] According to another aspect of the present invention, there is provided a video decoder using a motion vector consisting of multiple layers, the decoder including an entropy decoding module interpreting an input bitstream and extracting texture information and motion information from the bitstream, a motion vector reconstruction module reconstructing motion vector components of the respective layers from corresponding values of the layers contained in the extracted motion information and providing the motion vector after adding the motion vector components of the respective layers together, an inverse quantization module applying inverse quantization to the texture information and outputting transform coefficients, an inverse spatial transform module inversely transforming the transform coefficients into transform coefficients in a spatial domain by performing the inverse of spatial transform, and an inverse temporal filtering module performing inverse temporal filtering on the transform coefficients in the spatial domain using the obtained motion vector and reconstructing frames in a video sequence.
[30] The motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
[31] In addition, the motion vector reconstruction module may include a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to the sign of a corresponding value of the base layer, a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer contained in the motion information when the value of the first enhancement layer is 0, a layer reconstruction module reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer contained in the motion information, and a motion addition module adding the reconstructed motion vector components of the layers together and providing the motion vector.
[32] According to still another aspect of the present invention, there is provided a method for reconstructing a motion vector obtained at a predetermined pixel accuracy, the method including determining a motion vector component of a base layer using the obtained motion vector according to the pixel accuracy of the base layer, and determining a motion vector component of an enhancement layer that is close to the obtained motion vector according to the pixel accuracy of the enhancement layer.
[33] In the determining of the motion vector component of the base layer, the motion vector component of the base layer may be determined to be close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
[34] In the determining of the motion vector component of the base layer, the motion vector component of the base layer may be determined according to the pixel accuracy of the base layer by separating the obtained motion vector into a sign and a magnitude, using an unsigned value to represent the magnitude of the motion vector, and attaching the original sign to the value.
[35] In the determining of the motion vector component of the base layer, a value closest to the obtained motion vector may be determined as the motion vector component of the base layer according to the pixel accuracy of the base layer.
[36] According to yet another aspect of the present invention, there is provided a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method including reconstructing motion vector components of the respective layers from corresponding values of the layers interpreted from an input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
[37] According to a further aspect of the present invention, there is provided a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first enhancement layer from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
[38] According to still another aspect of the present invention, there is provided a method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method including reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to the sign of a corresponding value of the base layer, setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0, reconstructing motion vector components of the base layer and at least one enhancement layer other than the first and second enhancement layers from values of the base layer and the at least one enhancement layer interpreted from the input bitstream, and adding the reconstructed motion vector components of the layers together and providing the motion vector.
Description of Drawings
[39] The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
[40] FIG. 1 is a diagram for explaining a method of reconstructing a multi-layered motion vector according to the pixel accuracy;
[41] FIG. 2 illustrates a method for improving the compression efficiency of a motion vector according to a first embodiment of the present invention;
[42] FIG. 3 illustrates an example of obtaining a predicted value for a current block by correlation with neighboring blocks;
[43] FIG. 4 illustrates a third embodiment of the present invention;
[44] FIG. 5 is a graph illustrating the results of measuring peak signal-to-noise ratios (PSNRs) as a video quality indicator using motion vectors according to the first through third embodiments of the present invention;
[45] FIG. 6 is a graph illustrating the results of measuring a PSNR when compressing a Foreman CIF sequence at 100 Kbps according to the third embodiment of the present invention;
[46] FIG. 7 is a graph comparing the experimental results of the third embodiment of FIG. 6 and the fourth embodiment of the present invention;
[47] FIG. 8 is a block diagram of a video coding system;
[48] FIG. 9 is a block diagram of a video encoder;
[49] FIG. 10 is a block diagram of an exemplary motion vector reconstruction module according to the first embodiment of the present invention;
[50] FIG. 11 is an illustration for explaining a process of obtaining a motion vector of an enhancement layer;
[51] FIG. 12 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention;
[52] FIG. 13 is a block diagram of a video decoder;
[53] FIG. 14 is a block diagram of an exemplary motion vector reconstruction module according to the present invention;
[54] FIG. 15 is a block diagram of another exemplary motion vector reconstruction module for implementing the method according to the fourth embodiment of the present invention;
[55] FIG. 16 is a schematic diagram illustrating a bitstream structure;
[56] FIG. 17 is a diagram illustrating the detailed structure of each group of pictures (GOP) field; and
[57] FIG. 18 is a diagram illustrating the detailed structure of a motion vector (MV) field.
Mode for Invention
[58] The present invention presents a method for constructing a base layer in such a way as to minimize distortion when only the base layer is used, and a method for quantizing an enhancement layer in such a way as to minimize overhead when all layers are used.
[59] The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of this invention are shown. Advantages and features of the present invention and methods of accomplishing the same may be understood more readily by reference to the following detailed description of exemplary embodiments and the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art, and the present invention will only be defined by the appended claims. Like reference numerals refer to like elements throughout the specification.
[60] FIG. 1 shows an example in which one motion vector is divided into three motion vector components. Referring to FIG. 1, after finding a motion vector A with a predetermined pixel accuracy, the motion vector A is reconstructed as the sum of a base layer motion vector component B, a first enhancement layer motion vector component E1, and a second enhancement layer motion vector component E2. A motion vector obtained as a result of a motion vector search with a predetermined pixel accuracy as described above is defined as an 'actual motion vector'.
[61] The pixel accuracy used for the highest enhancement layer is typically selected as the predetermined pixel accuracy. The motion vectors of the respective layers have different pixel accuracies that increase in order from the lowest layer (closest to the base layer) to the highest (farthest from the base layer). For example, the base layer has one-pixel accuracy, the first enhancement layer has half-pixel accuracy, and the second enhancement layer has quarter-pixel accuracy.
[62] An encoder transmits the reconstructed motion vector to a predecoder that truncates a part of the motion vector in an order from the highest to the lowest layers while a decoder receives the remaining part of the motion vector. By performing this process it is possible to implement scalability for a motion vector (motion scalability).
[63] For example, an encoder may transmit motion vector components of all layers (the base layer, the first enhancement layer, and the second enhancement layer) while the predecoder may transmit only components of the base layer and the first enhancement layer to the decoder by truncating a component of the second enhancement layer when it determines according to available communication conditions that transmission of all the motion vector components is unsuitable. The decoder uses the components of the base layer and the first enhancement layer to reconstruct a motion vector.
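The layered representation and truncation behavior described above can be sketched as follows. This is a hypothetical Python illustration, not the claimed apparatus; the function name is invented, and the layer accuracies (integer, half-pel, quarter-pel) follow the example in the preceding paragraphs:

```python
# Hypothetical sketch of motion scalability: the base layer carries an integer
# component, and each enhancement layer refines it at twice the accuracy of
# the layer below. A predecoder may truncate enhancement layers from the top.

def reconstruct_motion_vector(base, enhancements):
    """Sum the base-layer component with whatever enhancement-layer
    components survived predecoder truncation.

    base         -- integer-pel motion vector component of the base layer
    enhancements -- symbols (-1, 0, 1) for the enhancement layers, ordered
                    from half-pel outward; the list may be truncated
    """
    mv = float(base)
    step = 0.5  # the first enhancement layer refines at half-pel accuracy
    for symbol in enhancements:
        mv += symbol * step
        step /= 2  # next layer refines at quarter-pel, eighth-pel, ...
    return mv

# All layers kept: 1 + 1*0.5 + (-1)*0.25 = 1.25
assert reconstruct_motion_vector(1, [1, -1]) == 1.25
# Second enhancement layer truncated by the predecoder:
assert reconstruct_motion_vector(1, [1]) == 1.5
```

The decoder simply sums whichever components it receives, which is why dropping a layer degrades accuracy gracefully rather than invalidating the vector.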
[64] The base layer is essential motion vector information having the highest priority and it cannot be omitted during transmission. Thus, a bitrate in the base layer must be equal to or less than the minimum bandwidth supported by a network. The bitrate in transmission of all the layers (the base layer and the first and second enhancement layers) must be equal to or less than the maximum bandwidth.
[65] METHOD FOR CONSTRUCTING THE BASE LAYER
[66] The present invention proposes methods for constructing a base layer according to first through third embodiments and verifies the methods through experiments.
[67] In each embodiment, a motion vector is constructed into multiple layers: a motion vector component of the base layer represented with integer-pixel accuracy, and motion vector components of enhancement layers respectively represented with half- and quarter-pixel accuracy.
[68] The base layer uses an integer to represent a motion vector component, and the enhancement layers use a symbol of 1, -1, or 0 instead of a real number in order to represent motion vector components in a simple way. While a motion vector is usually represented by a pair of x and y components, only one component will be described throughout this specification for clarity of explanation.
[69] For example, while the motion vector component of the first enhancement layer with half-pixel accuracy may have a value of -0.5, 0.5, or 0, it is represented by the symbol -1, 1, or 0. Similarly, while the motion vector component of the second enhancement layer with quarter-pixel accuracy may have a value of -0.25, 0.25, or 0, it is represented by the symbol -1, 1, or 0.
[70] Since a motion vector of the base layer is represented by an integer part, there is a close spatial correlation between motion vectors in the base layer. Thus, after considering this spatial correlation and obtaining a predicted value of a current block from the integer motion vectors of neighboring blocks, only a residual between an actual motion vector of the current block and the predicted value is encoded and transmitted. Conversely, the enhancement layers are usually encoded without considering neighboring blocks because there is little spatial correlation between motion vectors.
[71] One of the most important goals in implementing motion scalability is to prevent significant degradation in coding performance when an enhancement layer is truncated. When the truncation of the enhancement layer increases a motion vector error, thereby significantly degrading the quality of the video reconstructed by a decoder, this will also reduce the effect of improving video quality by allocating more bits to texture information due to the reduction of motion vector bits. Therefore, the first through third embodiments of the present invention are focused on preventing a significant drop in the peak signal-to-noise ratio (PSNR) when only a base layer is used, compared to when a base layer and enhancement layers are used.
[72] In a first embodiment of the present invention, a method for improving the compression efficiency of a motion vector using the spatial correlation of the base layer is proposed. According to the first embodiment, the decimal part of an actual value is rounded up or down so that the resultant value is closer to a value predicted from the motion vector components of the neighboring blocks in the base layer. FIG. 2 shows an example of predicting a motion vector in first and second enhancement layers from a motion vector in a base layer. Referring to FIG. 2, when a value predicted from neighboring blocks in the base layer is -1 and an actual motion vector value is 0.75, the actual motion vector value is rounded down to 0, which is closer to the predicted value of -1, and then the motion vector values of 1 in the first and second enhancement layers are derived from the motion vector value of 0 in the base layer.
[73] FIG. 3 illustrates an example of obtaining a predicted value for a current block by its correlation with neighboring blocks. Referring to FIG. 3, when motion vectors in a base layer are determined in the diagonal direction, a predicted value of a current block (a) is obtained by correlation with neighboring blocks (b), (c), and (d), whose motion vectors have been determined. The predicted value may be the median or average value of the motion vectors of the neighboring blocks (b), (c), and (d). In the first embodiment, as shown in FIG. 3, an integer value of the current block (a) is found to be closer to a predicted value obtained from neighboring blocks.
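The prediction of FIG. 3 and the rounding of FIG. 2 can be sketched together as follows. This is a hypothetical illustration; the helper names are invented, and the median (one of the two options mentioned above) is used as the predicted value:

```python
import math

# Hypothetical sketch of the first embodiment: the predicted value for the
# current block (a) is the median of the already-decided integer motion-vector
# components of neighbors (b), (c), and (d); the actual (fractional) motion
# vector is then rounded up or down toward that prediction.

def predicted_value(neighbors):
    """Median of the neighboring blocks' base-layer components."""
    s = sorted(neighbors)
    return s[len(s) // 2]

def base_component_toward_prediction(actual, prediction):
    """Round the actual motion vector to the integer lying closer to the
    prediction, so that the coded residual (base - prediction) stays small."""
    lo, hi = math.floor(actual), math.ceil(actual)
    if lo == hi:
        return lo
    return lo if abs(lo - prediction) <= abs(hi - prediction) else hi

# FIG. 2 example: prediction is -1, actual motion vector is 0.75;
# 0.75 is rounded down to 0 because 0 is closer to -1 than 1 is.
assert base_component_toward_prediction(0.75, predicted_value([-1, -1, 0])) == 0
```

Because only the residual against the prediction is entropy coded, pulling the base-layer integer toward the prediction is what makes this variant produce the smallest base layer.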
[74] According to the first embodiment, since a motion vector component of the base layer is quantized using a residual between the actual value and the predicted value obtained from the neighboring blocks, it is possible to represent the motion vector component of the base layer by the integer value closest to the predicted value, thereby most efficiently quantizing the base layer. As such, this method is efficient in reducing the size of a base layer.
[75] A feature of a second embodiment of the present invention is that an integer motion vector component of a base layer is kept as close to zero as possible. In the second embodiment, to make the motion vector component of the base layer as close to zero as possible, an actual motion vector is separated into a sign and a magnitude. The magnitude of the motion vector is represented using an unsigned integer, and the original sign is then attached to the unsigned integer. This method makes it more probable that the motion vector component of the base layer is zero, which enables more efficient quantization, since most quantization modules quantize zeros very efficiently. This method is expressed by Equation (1):
[76] x_b = sign(x)·⌊|x|⌋ ... (1)
[77] where sign(x) denotes a sign function that returns values of 1 and -1 when x is a positive value and a negative value, respectively, |x| denotes the absolute value of the variable x, and ⌊|x|⌋ denotes a function giving the largest integer not exceeding |x| (by stripping the decimal part).
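Equation (1) can be sketched as follows. This is a hypothetical illustration; the sign convention for x = 0 is an assumption, since sign(x) is defined above only for positive and negative values:

```python
import math

# Hypothetical sketch of the second embodiment's Equation (1): split the
# actual motion vector into sign and magnitude, truncate the magnitude to an
# integer, and re-attach the sign. This biases the base-layer component
# toward zero, which most quantization modules code very efficiently.

def sign(x):
    # Assumed convention: treat 0 as positive (the product is 0 either way).
    return 1 if x >= 0 else -1

def base_component_eq1(x):
    """x_b = sign(x) * floor(|x|)"""
    return sign(x) * math.floor(abs(x))

assert base_component_eq1(0.75) == 0    # magnitude < 1 truncates to 0
assert base_component_eq1(-0.75) == 0   # a plain floor(-0.75) would give -1
assert base_component_eq1(-1.25) == -1
```

Note the contrast with simple truncation x_b = ⌊x⌋ used in the first embodiment: for negative fractional values such as -0.75, Equation (1) yields 0 instead of -1, producing more zeros in the base layer.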
[78] Table 1 shows examples of values for each layer that can be obtained from the values x and x_b in Equation (1). For convenience of explanation, the values x and x_b are multiplied by a factor of 4 and expressed as integer values, and Δ(x - x_b) in the lowest row denotes the error between the actual value and the integer motion vector of the base layer. E1 and E2 respectively denote the motion vector components of the first and second enhancement layers, expressed as symbols.
[79]
Table 1
(Table 1 appears as an image in the original publication and is not reproduced here.)
[80] As is evident from Table 1, the method of the second embodiment makes it more probable that the integer motion vector component x_b of the base layer is zero, thereby increasing the compression efficiency compared to the first embodiment, in which x_b is obtained by simply truncating the decimal part (x_b = ⌊x⌋). However, as in the first embodiment, the motion vector components of the first and second enhancement layers are expressed as the symbols -1, 0, or 1, which results in reduced efficiency. Furthermore, like the first embodiment, the second embodiment suffers from significant distortion caused by a difference of as much as 0.75 between the actual and quantized motion vectors even when only the base layer is used.
[81] In a third embodiment of the present invention, the difference between an actual motion vector and a quantized motion vector of a base layer is minimized. That is, the third embodiment concentrates on reducing that difference to less than 0.5, which is an improvement over the first and second embodiments where the maximum difference is 0.75. This is accomplished by modifying the second embodiment to some extent. That is, an integer nearest to an actual motion vector is selected as a motion vector component of the base layer by rounding off the actual motion vector, as defined by Equation (2):
[82] x_b = sign(x)·⌊|x| + 0.5⌋ ... (2)
[83] Equation (2) is similar to Equation (1) except for the use of rounding off. FIG. 4 shows an example in which a motion vector with a value of 0.75 is represented according to the third embodiment of the present invention. Referring to FIG. 4, unlike the first and second embodiments, the value 1 is selected as the motion vector component of the base layer since 1 is the integer nearest to the actual motion vector of 0.75. As shown in FIG. 4, a motion vector component of the first enhancement layer that minimizes the difference between the actual motion vector and the motion vector of the first enhancement layer may be -0.5 or 0 (the motion vector of the first enhancement layer is the sum of the motion vector of the base layer and the motion vector component of the first enhancement layer).
[84] In either case, the minimum difference is 0.25. When two or more values with the minimum error are present in the first enhancement layer, the value closest to the motion vector component of the immediately lower layer is chosen as the motion vector component of the first enhancement layer.
[85] Thus, the value 0 is finally selected as the motion vector component of the first en¬ hancement layer. [86] By doing so, the difference between the actual motion vector and the motion vector component of the base layer can be reduced to 0.25. The third embodiment of the present invention provides improved coding performance when only a base layer is used by limiting the difference to below 0.5. However, this method has the drawback of increasing the size of the base layer over the first or second embodiments. Table 2 shows examples of values that can be created by Equation (2).
[87]
Table 2
(Table 2 appears as an image in the original publication and is not reproduced here.)
[88] As is evident from Table 2, in the third embodiment there is a higher probability that the motion vector component E1 of the first enhancement layer will be zero, which results in higher compression efficiency. However, the motion vector component E2 of the second enhancement layer is more complicated, so more bits are allocated for its coding. In particular, Δ(x - x_b) in the lowest row indicates that the difference between the motion vector component of the base layer and the actual motion vector is less than 0.5.
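The decomposition of the third embodiment, including the tie-breaking rule described above, can be sketched as follows. This is a hypothetical illustration; using the symbol magnitude as the tie-breaker is an assumption consistent with the rule that, among equally good candidates, the value closest to the immediately lower layer is chosen:

```python
import math

# Hypothetical sketch of the third embodiment: the base-layer component x_b
# is the integer nearest the actual motion vector (Equation (2)); each
# enhancement layer then picks the symbol (-1, 0, 1) minimizing the remaining
# error, breaking ties in favor of the smaller-magnitude symbol, i.e. the
# reconstruction closest to the layer below.

def decompose_eq2(x):
    sgn = 1 if x >= 0 else -1
    xb = sgn * math.floor(abs(x) + 0.5)  # Equation (2): round half away from 0
    e1 = min((1, -1, 0),
             key=lambda s: (abs(x - (xb + 0.5 * s)), abs(s)))
    e2 = min((1, -1, 0),
             key=lambda s: (abs(x - (xb + 0.5 * e1 + 0.25 * s)), abs(s)))
    return xb, e1, e2

# FIG. 4 example: for an actual motion vector of 0.75, the base layer takes 1
# (nearest integer); E1 = -1 and E1 = 0 both leave an error of 0.25, the tie
# is broken toward 0, and E2 = -1 then removes the remaining error.
assert decompose_eq2(0.75) == (1, 0, -1)
assert decompose_eq2(0.5) == (1, -1, 0)
```

Observe in the sketch that whenever E1 is nonzero it carries the sign opposite to x_b, and E2 is then 0; these are exactly the regularities the two compression rules of the next section exploit.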
[89] Table 3 shows the results of experiments in which a Foreman CIF sequence was compressed at a frame rate of 30 Hz and a bitrate of 256 Kbps. The experiments were done to verify the performance of the first through third embodiments of the present invention. Table 3 lists the number of bits (hereinafter 'size' will refer to 'number of bits') needed for motion vectors of a base layer and first and second enhancement layers according to the first through third embodiments.
[90]
Table 3
(Table 3 appears as an image in the original publication and is not reproduced here.)
[91] As evident from Table 3, the base layer has the smallest size in the first embodiment, but the first and second enhancement layers have the largest size since a motion vector of the base layer is predicted, thus increasing the total size. While attempting to reduce the size of the motion vector component of the base layer by assigning more zeros to it, the second embodiment increases the size of the base layer as well as the total size compared to the first embodiment. The total size is the largest in the second embodiment.
[92] In the third embodiment, the base layer has the largest size but the first enhancement layer has the smallest size, since it is highly probable that a motion vector component of the first enhancement layer will have a value of zero. The second enhancement layer has a size similar to its counterparts in the first and second embodiments.
[93] When only the base layer is used for coding, it is advantageous to select a method where the base layer has the smallest size. When all layers are used for coding, a method that minimizes the total size may be selected. In the former case, the first embodiment is selected, and in the latter case the third embodiment is selected.
[94] FIG. 5 is a graph illustrating the results of measuring PSNRs (as a video quality indicator) using motion vectors from the three layers according to the first through third embodiments of the present invention as detailed in Table 3. Referring to FIG. 5, the third embodiment exhibits the highest performance while the first embodiment exhibits the poorest performance.
[95] In particular, the first embodiment has similar performance to the second embodiment when only a base layer is used while it has weak performance compared to the other embodiments when all motion vector layers are used.
[96] It should be especially noted that the third embodiment exhibits superior performance when only the base layer is used. Specifically, the PSNR value in the third embodiment is more than 1.0 dB higher than that of the second embodiment. This is achieved by minimizing the difference between an integer motion vector component of the base layer and an actual motion vector. That is, since it is more efficient for coding performance to minimize this difference than to slightly decrease an integer value, the third embodiment exhibits the best performance.
[97] METHOD OF EFFICIENTLY COMPRESSING THE ENHANCEMENT LAYER
[98] Referring to Table 3, the third embodiment is superior to the first and second embodiments in terms of the size of the first enhancement layer, but there is little difference in terms of the size of the second enhancement layer. Thus, for low-bitrate coding, where the size of the motion vector largely affects the performance, the third embodiment is not advantageous over the others when all motion vector layers are used.
[99] FIG. 6 is a graph illustrating an experimental result of compressing a Foreman CIF sequence at 100 Kbps according to the third embodiment.
[100] As evident from FIG. 6, since 100 Kbps is a low bitrate, the third embodiment exhibits superior performance when only the base layer is used, compared to when all the layers are used. Specifically, while the third embodiment shows excellent performance when the base layer or a combination of the base layer and the first enhancement layer is used, its performance degrades when all the layers are used since the size of the second enhancement layer is large.
[101] However, the third embodiment is intended to allocate a large amount of information to the second enhancement layer. Since the second enhancement layer is used only when the bitrate is sufficient, its large size does not significantly affect performance. For a low bitrate, only the base layer and the first enhancement layer are used, and bits in the second enhancement layer can be truncated.
[102] In order to prevent significant degradation due to the presence of the second enhancement layer in the third embodiment, the present invention proposes a method for providing excellent coding performance when all motion vector layers are used by adding two compression rules.
[103] The two compression rules are found in Table 2. Referring to Table 2, the first rule is that the motion vector component (Δx_b) of the base layer has an opposite sign to the motion vector component E1 of the first enhancement layer except, of course, when E1 is zero. In other words, the motion vector component E1 of the first enhancement layer is represented by 0 or 1, and when E1 is 1, a decoder reconstructs the original value of E1 by attaching a sign to E1 that is opposite to the sign of the motion vector component of the base layer.
[104] That is, since E1 has a sign opposite to that of the motion vector component of the base layer (except zero, which has no sign), E1 can be expressed as either 0 or 1. An encoder converts -1 to 1, while a decoder can reconstruct the original value of E1 by attaching the opposite sign to 1.
[105] By applying the first rule, entropy coding efficiency can be improved since the motion vector component E1 of the first enhancement layer can be expressed as either 0 or 1. An experimental result demonstrated that applying the first rule alone reduces the number of bits by more than 12%.
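As a rough sketch of the first rule (function names hypothetical, and component values used directly rather than coded 0/1 symbols for clarity), an encoder may drop the sign of E1 whenever the base-layer component is nonzero, and a decoder restores it as the opposite of the base-layer sign:

```python
def encode_e1(e1, base):
    # First rule: when the base-layer component is nonzero, the sign
    # of E1 is implied (it is opposite to the base layer's sign), so
    # only the magnitude needs to be transmitted.
    return abs(e1) if base != 0 else e1

def decode_e1(symbol, base):
    # Reattach the sign opposite to that of the base-layer component.
    # A zero symbol (or a zero base layer, which has no sign) passes
    # through unchanged.
    if symbol == 0 or base == 0:
        return symbol
    return -abs(symbol) if base > 0 else abs(symbol)
```

For a base-layer component of 1 and E1 = -0.5, the encoder transmits 0.5 and the decoder recovers -0.5, so the sign bit never needs to be coded.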
[106] Referring to Table 2, the second compression rule is that the motion vector component E2 of the second enhancement layer is always 0 when E1 is 1 or -1. Thus, E2 is not encoded when the corresponding E1 is not 0.
[107] In other words, an encoder does not encode E2 when E1 is not 0. A decoder uses 0 as E2 when E1 is not 0, and uses the received value as E2 when E1 is 0. [108] An experimental result demonstrated that applying the second rule reduces the number of bits by about 25%, and by about 12% after entropy encoding. This compensates for the drawback of the third embodiment caused by the large second enhancement layer. Table 4 shows the values of Table 2 after applying the first and second compression rules.
[109]
Table 4
(Table 4 is provided as an image in the original document.)
[110] The symbol 'X' in Table 4 denotes a portion not transmitted, and this constitutes a quarter of the total number of cases. Thus, the number of bits can be reduced by 25%. By converting -1 to 1 in the first enhancement layer, compression efficiency can be further increased. A method created by applying the first and second compression rules to the third embodiment is referred to as a 'fourth embodiment'. The compression rules in the fourth embodiment can also be applied to a base layer, a first enhancement layer, and a second enhancement layer of a motion vector consisting of four or more layers. Furthermore, either the first or the second rule, or both, can be applied depending on the type of application.
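A minimal sketch of the second rule (function names hypothetical): the encoder simply omits E2 from the bitstream whenever E1 is nonzero, and the decoder substitutes 0 in that case:

```python
def encode_e1_e2(e1, e2):
    # Second rule: E2 is always 0 when E1 is nonzero, so it is
    # simply not written to the bitstream in that case.
    out = [e1]
    if e1 == 0:
        out.append(e2)
    return out

def decode_e1_e2(values):
    stream = list(values)
    e1 = stream.pop(0)
    # When E1 is nonzero, E2 was not transmitted and is known to be 0.
    e2 = stream.pop(0) if e1 == 0 else 0
    return e1, e2
```

In a quarter of the cases of Table 4 no E2 value is emitted at all, which is where the roughly 25% reduction before entropy coding comes from.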
[111] Table 5 shows the number of bits needed for motion vectors of a base layer, a first enhancement layer, and a second enhancement layer according to the fourth embodiment of the present invention.
[112]
Table 5
(Table 5 is provided as an image in the original document.)
[113] As detailed in Table 5, the fourth embodiment reduces the sizes of the first and second enhancement layers by 15.68% and 11.90%, respectively, compared to the third embodiment, thereby significantly reducing the overall bitrate. The number of bits in the second enhancement layer is reduced by less than 25% because the omitted values are zero and are already efficiently compressed by an entropy encoding module.
[114] Nevertheless, the number of bits can be reduced by approximately 12%. FIG. 7 is a graph comparing the experimental results of the third embodiment (FIG. 6) and the fourth embodiment of the present invention. As shown in FIG. 7, the fourth embodiment exhibits similar performance to the third embodiment when only the base layer is used, but exhibits superior performance thereto when all the layers are used.
[115] While it is described above that a motion vector consists of three layers, it will be understood by those skilled in the art that the present invention can apply to a motion vector consisting of more than three layers. Furthermore, it is described above that a motion vector search is performed on a base layer with 1-pixel accuracy, a first enhancement layer with 1/2-pixel accuracy, and a second enhancement layer with 1/4-pixel accuracy. However, this is provided as an example only, and it will be readily apparent to those skilled in the art that the motion vector search may be performed with pixel accuracies different from those stated above, provided that the pixel accuracy increases with each layer, in a manner similar to the afore-mentioned embodiments.
[116] In order to implement motion scalability, an encoder encodes an input video using a multilayered motion vector while a predecoder or a decoder decodes all or part of the input video. The overall process will now be described schematically with reference to FIG. 8.
[117] FIG. 8 shows the overall configuration of a video coding system. Referring to FIG. 8, the video coding system includes an encoder 100, a predecoder 200, and a decoder 300. The encoder 100 encodes an input video into a bitstream 20. The predecoder 200 truncates part of the texture data in the bitstream 20 according to extraction conditions such as bitrate, resolution, or frame rate, determined in consideration of the communication environment, thereby implementing scalability for the texture data. The predecoder 200 also implements motion scalability by truncating part of the motion data in the bitstream 20 in order from the highest to the lowest layers according to the communication environment or the number of texture bits. By implementing texture or motion scalability in this way, the predecoder can extract various bitstreams 25 from the original bitstream 20.
[118] The decoder 300 generates an output video 30 from the extracted bitstream 25. Of course, either the predecoder 200 or the decoder 300 or both may extract the bitstream 25 according to the extraction conditions.
[119] FIG. 9 is a block diagram of an encoder 100 of a video coding system. The encoder
100 includes a partitioning module 110, a motion vector reconstruction module 120, a temporal filtering module 130, a spatial transform module 140, a quantization module 150, and an entropy encoding module 160.
[120] The partitioning module 110 partitions an input video 10 into several groups of pictures (GOPs), each of which is independently encoded as a unit.
[121] The motion vector reconstruction module 120 finds an actual motion vector for a frame of one GOP at the predetermined pixel accuracy and sends the motion vector to the temporal filtering module 130. The motion vector reconstruction module 120 uses this actual motion vector and a predetermined method (one of the first through third embodiments) to determine a motion vector component of the base layer. Next, it determines a motion vector component of an enhancement layer, at the enhancement layer's pixel accuracy, so that the result is closer to the actual motion vector. The motion vector reconstruction module 120 also sends the integer motion vector component of the base layer and a symbol value, which is the motion vector component of the enhancement layer, to the entropy encoding module 160. The multilayered motion information is encoded by the entropy encoding module 160 using a predetermined encoding algorithm.
[122] FIG. 10 is a block diagram of an exemplary motion vector reconstruction module
120 according to the present invention. Referring to FIG. 10, the motion vector reconstruction module 120 includes a motion vector search module 121, a base layer determining module 122, and an enhancement layer determining module 123.
[123] Referring to FIG. 12, in order to implement the afore-mentioned fourth embodiment of the present invention, the motion vector reconstruction module 120 further includes an enhancement layer compression module 125 with a first compression module 126, a second compression module 127, or both.
[124] The motion vector search module 121 performs a motion vector search of each block in a current frame (at a predetermined pixel accuracy) in order to obtain an actual motion vector. The block may be a fixed-size or a variable-size block. When a variable-size block is used, information about the block size (or mode) needs to be transmitted together with the actual motion vector.
[125] In general, to accomplish a motion vector search, a current image frame is partitioned into blocks of a predetermined pixel size, and a block in a reference image frame is compared with the corresponding block in the current image frame according to the predetermined pixel accuracy in order to derive the difference between the two blocks. A motion vector that gives the minimum sum of errors is designated as the motion vector for the current block. A search range may be predefined using parameters. A smaller search range reduces search time and exhibits good performance when a motion vector exists within the search range. However, the prediction accuracy will be decreased for a fast-motion image where the motion vector does not exist within the range.
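The search described above can be sketched in pure Python (helper names hypothetical; frames are nested lists of pixel values, and the sum of absolute differences is used as the error measure, which is one common choice rather than the patent's exact criterion):

```python
def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(cur, ref, top, left, rng=2):
    # Exhaustive integer-pel search within +/-rng of the block
    # position; the displacement with minimum SAD is the motion vector.
    h, w = len(cur), len(cur[0])
    best = (float("inf"), (0, 0))
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue
            cand = [row[x:x + w] for row in ref[y:y + h]]
            best = min(best, (sad(cur, cand), (dy, dx)))
    return best[1]
```

A larger `rng` widens the search window at the cost of search time, which mirrors the trade-off described above.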
[126] Motion estimation may be performed using variable size blocks instead of the above fixed-size block. In motion estimation using a variable size block, a motion vector search is performed on blocks of variable pixel sizes to determine a variable block size and a motion vector that minimize a predetermined cost function J.
[127] The cost function is defined by Equation (3):
[128]
J = D + λ·R ... (3)
[129] where D is the number of bits used for coding a frame difference, R is the number of bits used for coding an estimated motion vector, and λ is a Lagrangian coefficient.
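As an illustration of how Equation (3) is used to compare candidates (the (D, R) pairs and the value of λ below are invented numbers, not measurements), the encoder evaluates J for each candidate block size and motion vector and keeps the minimum:

```python
def rd_cost(d_bits, r_bits, lam):
    # Equation (3): J = D + lambda * R
    return d_bits + lam * r_bits

# Hypothetical (D, R) pairs for three candidate partitioning choices
# of one block; lambda trades motion bits against residual bits.
candidates = [(1200, 40), (1150, 90), (1300, 20)]
lam = 2.0
best = min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))
```

A larger λ penalizes motion bits more heavily, steering the search toward coarser block modes with cheaper motion vectors.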
[130] The base layer determining module 122 determines an integer motion vector component of a base layer according to the first through third embodiments. In the first embodiment, it determines the motion vector component of the base layer by spatial correlation with the motion vector components of neighboring blocks and rounding up or down the decimal part of the actual motion vector.
[131] In the second embodiment, the base layer determining module 122 determines the motion vector component of the base layer by separating the actual motion vector into a sign and a magnitude. The magnitude of the motion vector is represented by an unsigned integer to which the original sign is attached. The determination process is shown in Equation (1).
[132] In the third embodiment, the base layer determining module 122 determines the motion vector component of the base layer by finding an integer value nearest to the actual motion vector. This nearest integer value is calculated by Equation (2).
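Consistent with this description of Equation (2), the computation amounts to sign(x) multiplied by the floor of |x| + 0.5, i.e. rounding the actual motion vector to the nearest integer with halves rounded away from zero; a sketch (function name hypothetical):

```python
import math

def base_layer_component(x):
    # Equation (2): sign(x) * floor(|x| + 0.5) gives the integer
    # nearest to the actual motion vector component (halves round
    # away from zero).
    if x == 0:
        return 0
    sign = 1 if x > 0 else -1
    return sign * math.floor(abs(x) + 0.5)
```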
[133] The enhancement layer determining module 123 determines a motion vector component of an enhancement layer in such a way as to minimize the error between the actual motion vector and the cumulative motion vector. When two or more candidates with the same error exist, the one whose cumulative value is closer to that of the immediately lower layer is chosen as the motion vector component of the enhancement layer.
[134] For example, when a motion vector consists of four layers as shown in FIG. 11, a motion vector component of a base layer is determined according to the first through third embodiments, and motion vector components of the first through third enhancement layers are determined using a separate method. Assuming that the value 1 is determined as the motion vector component of the base layer according to one of the first through third embodiments, a process for determining the motion vector components of the enhancement layers will now be described with reference to FIG. 11. Here, a 'cumulative value' of a layer is defined as the sum of the motion vector components of that layer and all lower layers.
[135] Referring to FIG. 11, when the cumulative value of the first enhancement layer is set to 0.5 as it is the closest value to 0.625, -0.5 is determined to be the motion vector component of the first enhancement layer. Two cumulative values 0.5 and 0.75, having the same error relative to 0.625, exist in the second enhancement layer, but 0.5 is selected since it is closer to the cumulative value of the first enhancement layer. Thus, 0 is determined as a motion vector component of the second enhancement layer, and then 0.125 is determined as the motion vector component of the third enhancement layer.
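The selection process of FIG. 11 can be mimicked with a short sketch (helper name hypothetical; per-layer accuracies of 1/2, 1/4, and 1/8 pixel assumed): each enhancement component is chosen to minimize the error against the actual motion vector, with ties broken toward the cumulative value of the layer below (i.e., the smallest-magnitude component):

```python
def decompose(actual, base, steps=(0.5, 0.25, 0.125)):
    # Each enhancement component is one of -step, 0, or +step; pick
    # the one minimizing the error to the actual vector, breaking
    # ties toward the cumulative value of the layer below.
    components = [base]
    cum = base
    for step in steps:
        best = min((k * step for k in (-1, 0, 1)),
                   key=lambda c: (abs(actual - (cum + c)), abs(c)))
        components.append(best)
        cum += best
    return components
```

For the FIG. 11 example (actual vector 0.625, base layer 1), this yields components 1, -0.5, 0, and 0.125, whose cumulative values pass through 0.5 at both the first and second enhancement layers exactly as described above.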
[136] In order to implement the aforementioned method according to the fourth embodiment of the present invention, the motion vector reconstruction module 120 further includes the enhancement layer compression module 125 with the first compression module 126, the second compression module 127, or both, as shown in FIG. 12.
[137] When the motion vector component of the first enhancement layer is a negative number, the first compression module 126 converts the negative number into a positive number having the same magnitude. When the motion vector component of the first enhancement layer is not 0, the second compression module 127 does not encode the motion vector component of the second enhancement layer.
[138] Referring to FIG. 9, to reduce temporal redundancies, the temporal filtering module
130 uses motion vectors obtained by the motion vector reconstruction module 120 to decompose frames into low-pass and high-pass frames in the direction of a temporal axis. A temporal filtering algorithm such as Motion Compensated Temporal Filtering (MCTF) or Unconstrained MCTF (UMCTF) can be used.
[139] The spatial transform module 140 removes spatial redundancies from these frames using the discrete cosine transform (DCT) or wavelet transform, and creates transform coefficients.
[140] The quantization module 150 quantizes those transform coefficients. Quantization is the process of converting real transform coefficients into discrete values and mapping the quantized coefficients into quantization indices. In particular, when a wavelet transform is used for spatial transformation, embedded quantization can often be used. Embedded ZeroTrees Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), and Embedded ZeroBlock Coding (EZBC) are examples of an embedded quantization algorithm.
[141] The entropy encoding module 160 losslessly encodes the transform coefficients quantized by the quantization module 150 and the motion information generated by the motion vector reconstruction module 120 into a bitstream 20. For entropy encoding, various techniques such as arithmetic encoding and variable-length encoding may be used.
[142] FIG. 13 is a block diagram of a decoder 300 in a video coding system according to an embodiment of the present invention.
[143] The decoder 300 includes an entropy decoding module 310, an inverse quantization module 320, an inverse spatial transform module 330, an inverse temporal filtering module 340, and a motion vector reconstruction module 350.
[144] The entropy decoding module 310 performs the inverse of an entropy encoding process to extract texture information (encoded frame data) and motion information from the bitstream 20.
[145] FIG. 14 is a block diagram of an exemplary motion vector reconstruction module
350 according to the present invention. The motion vector reconstruction module 350 includes a layer reconstruction module 351 and a motion addition module 352.
[146] The layer reconstruction module 351 interprets the extracted motion information and recognizes the motion information for each layer. The motion information contains block information and motion vector information for each layer. The layer reconstruction module 351 then reconstructs a motion vector component of each layer from a corresponding layer value contained in the motion information. Here, a 'layer value' means a value received from the encoder: specifically, an integer value representing a motion vector component of a base layer, or a symbol value representing a motion vector component of an enhancement layer. When the layer value is a symbol value, the layer reconstruction module 351 reconstructs the original motion vector component from the symbol value.
[147] The motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the enhancement layer together and sending the motion vector to the inverse temporal filtering module 340.
[148] FIG. 15 is a block diagram of another exemplary motion vector reconstruction module 350 for implementing the method according to the fourth embodiment of the present invention.
[149] Referring to FIG. 15, the motion vector reconstruction module 350 includes a layer reconstruction module 351, a motion addition module 352, and an enhancement layer reconstruction module 353 with a first reconstruction module 354, a second reconstruction module 355, or both.
[150] In order to reconstruct a motion vector component of a first enhancement layer when a value of the extracted information of the first enhancement layer is not 0, the first reconstruction module 354 attaches to this value a sign opposite to the sign of a motion vector component of a base layer, and obtains a motion vector component corresponding to the resultant value (symbol). When the value of the extracted information of the first enhancement layer is 0, the motion vector component is 0.
[151] In order to reconstruct a motion vector component of a second enhancement layer, the second reconstruction module 355 sets the value of the motion vector component of the second enhancement layer to 0 when the value of the first enhancement layer is not 0. When the value is 0, the second reconstruction module obtains a motion vector component corresponding to a value of the second enhancement layer. Then, the motion addition module 352 reconstructs a motion vector by adding the motion vector components of the base layer and the first and second enhancement layers together.
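Combining the first and second reconstruction modules with the motion addition module, the decoder-side logic can be sketched as follows (values given as fractional motion vector components rather than coded symbols; function name hypothetical):

```python
def reconstruct_mv(base, e1_value, e2_value):
    # First reconstruction module: attach to E1 the sign opposite
    # to the base-layer component (E1 of 0, or a signless base of 0,
    # passes through unchanged).
    if e1_value != 0 and base != 0:
        e1 = -abs(e1_value) if base > 0 else abs(e1_value)
    else:
        e1 = e1_value
    # Second reconstruction module: E2 is 0 whenever E1 is nonzero.
    e2 = 0 if e1 != 0 else e2_value
    # Motion addition module: sum the layer components.
    return base + e1 + e2
```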
[152] The inverse quantization module 320 performs inverse quantization on the extracted texture information and outputs transform coefficients. Inverse quantization is the process of obtaining quantized coefficients from quantization indices received from the encoder 100. A mapping table of indices and quantization coefficients is received from the encoder 100.
[153] Performing the inverse of the spatial transform, the inverse spatial transform module 330 inversely transforms the transform coefficients into transform coefficients in a spatial domain. For example, in the case of the DCT, the transform coefficients are inverse-transformed from the frequency domain to the spatial domain; in the case of the wavelet transform, they are inversely transformed from the wavelet domain to the spatial domain.
[154] The inverse temporal filtering module 340 performs inverse temporal filtering on the transform coefficients in the spatial domain (i.e., a temporal residual image) using the reconstructed motion vectors received from the motion vector reconstruction module 350 in order to reconstruct frames making up a video sequence.
[155] The term 'module', as used herein, refers to, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), which performs certain tasks. A module may advantageously be configured to reside on an addressable storage medium and to execute on one or more processors. Thus, a module may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules. In addition, the components and modules may be implemented in such a way that they execute on one or more computers in a communication system.
[156] FIGS. 16 through 18 illustrate a structure of a bitstream 400. Specifically, FIG. 16 is a schematic diagram illustrating an overall structure of the bitstream 400.
[157] The bitstream 400 is composed of a sequence header field 410 and a data field 420 containing a plurality of GOP fields 430 through 450.
[158] The sequence header field 410 specifies image properties such as frame width (2 bytes) and height (2 bytes), a GOP size (1 byte), and a frame rate (1 byte).
[159] The data field 420 contains all the image information and other information
(motion vector, reference frame number, etc.) needed to reconstruct an image.
[160] FIG. 17 shows the detailed structure of each GOP field 430. Referring to FIG. 17, the GOP field 430 consists of a GOP header 460, a T(0) field 470 that specifies information about a first frame (encoded without reference to another frame) that has been subjected to temporal filtering, a motion vector (MV) field 480 specifying a set of motion vectors, and a 'the other T' field 490 specifying information on frames other than the first frame (encoded with reference to another frame). [161] Unlike the sequence header field 410 that specifies properties of the entire video sequence, the GOP header field 460 specifies image properties of the GOP such as the temporal filtering order. [162] FIG. 18 shows the detailed structure of the MV field 480 consisting of MV(1) through MV(n-1) fields.
[163] Referring to FIG. 18, each of the MV(1) through MV(n-1) fields specifies variable size block information, such as the size and position of each variable size block, and motion vector information (symbols representing motion vector components) for each layer.
Industrial Applicability
[164] The present invention reduces the size of an enhancement layer while minimizing an error in a base layer. The present invention also enables adaptive allocation of the amount of bits between motion information and texture information using motion scalability.
[165] In concluding the detailed description, those skilled in the art will appreciate that many variations and modifications can be made to the exemplary embodiments without substantially departing from the principles of the present invention. Therefore, the disclosed exemplary embodiments of the invention are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

[1] An apparatus for reconstructing a motion vector obtained at a predetermined pixel accuracy, the apparatus comprising: a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to a pixel accuracy of the base layer; and an enhancement layer determining module determining a motion vector component of an enhancement layer according to a pixel accuracy of the enhancement layer, so that a sum of the motion vector component of the enhancement layer and the motion vector component of the base layer is close to the obtained motion vector.
[2] The apparatus of claim 1, wherein the base layer determining module determines the motion vector component of the base layer that is close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
[3] The apparatus of claim 1, wherein in order to determine the motion vector component of the base layer according to the pixel accuracy of the base layer, the base layer determining module separates the obtained motion vector into an original sign and a magnitude, uses an unsigned value to represent the magnitude of the motion vector, and attaches the original sign to the unsigned value.
[4] The apparatus of claim 1, wherein the base layer determining module determines a value closest to the obtained motion vector as the motion vector component of the base layer according to the pixel accuracy of the base layer.
[5] The apparatus of claim 4, wherein the motion vector component of the base layer is determined using sign(x) * floor(|x| + 0.5), where sign(x) denotes a sign function that returns values of 1 and -1 when variable x is a positive value and a negative value, respectively, |x| denotes an absolute value function with respect to the variable x, and floor(·) denotes a function that gives a largest integer not exceeding its argument by stripping a decimal part.
[6] The apparatus of claim 4, further comprising a first compression module removing redundancy in a motion vector component of a first enhancement layer using a first relationship wherein a sign of the motion vector component of the first enhancement layer is the opposite to a sign of the motion vector component of the base layer when the motion vector component of the first enhancement layer is not 0.
[7] The apparatus of claim 6, further comprising a second compression module removing redundancy in a motion vector component of a second enhancement layer using a second relationship wherein the motion vector component of the second enhancement layer is always 0 when the motion vector component of the first enhancement layer is not 0.
[8] A video encoder using a motion vector consisting of multiple layers, the encoder comprising: a motion vector reconstruction module including a motion vector search module obtaining the motion vector with a predetermined pixel accuracy, a base layer determining module determining a motion vector component of a base layer using the obtained motion vector according to a pixel accuracy of the base layer; an enhancement layer determining module determining a motion vector component of an enhancement layer so that a sum of the motion vector component of the enhancement layer and the motion vector component of the base layer is close to the obtained motion vector according to a pixel accuracy of the enhancement layer; a temporal filtering module removing temporal redundancies by filtering frames in a direction of a temporal axis using the obtained motion vector; a spatial transform module removing spatial redundancies from the filtered frames from which the temporal redundancies have been removed and creating transform coefficients; and a quantization module performing quantization on the transform coefficients.
[9] An apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus comprising: a layer reconstruction module reconstructing a motion vector component of the base layer and a motion vector component of the at least one enhancement layer from a value of the base layer and a value of the at least one enhancement layer, respectively, the values of the base layer and the at least one enhancement layer being interpreted from an input bitstream; and a motion addition module adding the reconstructed motion vector components of the base layer and the at least one enhancement layer together and providing the motion vector.
[10] An apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus comprising: a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to a sign of a corresponding value of the base layer; a layer reconstruction module reconstructing a motion vector component of the base layer and a motion vector component of at least one enhancement layer other than the first enhancement layer from the corresponding value of the base layer and a value of the at least one enhancement layer other than the first enhancement layer, respectively, the corresponding value of the base layer and the value of the at least one enhancement layer other than the first enhancement layer being interpreted from the input bitstream; and a motion addition module adding the reconstructed motion vector components of the base layer, the first enhancement layer, and the at least one enhancement layer other than the first enhancement layer together and providing the motion vector.
[11] An apparatus for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the apparatus comprising: a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to a sign of a corresponding value of the base layer; a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0; a layer reconstruction module reconstructing a motion vector component of the base layer and a motion vector component of a third enhancement layer other than the first and the second enhancement layers from the corresponding value of the base layer and a value of the third enhancement layer, respectively, the corresponding value of the base layer and the value of the third enhancement layer being interpreted from the input bitstream; and a motion addition module adding the reconstructed motion vector component of the base layer and the reconstructed motion vector components of the first, the second, and the third enhancement layers together and providing the motion vector.
[12] A video decoder using a motion vector consisting of multiple layers, the decoder comprising: an entropy decoding module interpreting an input bitstream and extracting texture information and motion information from the bitstream; a motion vector reconstruction module reconstructing motion vector components of the multiple layers from corresponding values of the multiple layers contained in the extracted motion information and providing the motion vector after adding the motion vector components of the multiple layers together; an inverse quantization module applying inverse quantization to the texture information and outputting transform coefficients; an inverse spatial transform module inversely transforming the transform coefficients into transform coefficients in a spatial domain by performing an inverse of a spatial transform; and an inverse temporal filtering module performing inverse temporal filtering on the inversely transformed transform coefficients in the spatial domain using the provided motion vector and reconstructing frames in a video sequence.
[13] The decoder of claim 12, wherein the motion vector reconstruction module comprises: a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to a sign of a corresponding value of a base layer; a layer reconstruction module reconstructing a motion vector component of the base layer and a motion vector component of at least one enhancement layer other than the first enhancement layer from the corresponding value of the base layer and a value of the enhancement layer other than the first enhancement layer, respectively; and a motion addition module adding the reconstructed motion vector components of the base layer, the first enhancement layer, and the at least one enhancement layer other than the first enhancement layer together and providing the motion vector.
[14] The decoder of claim 12, wherein the motion vector reconstruction module comprises: a first reconstruction module reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer contained in the motion information, which is opposite to a sign of a corresponding value of a base layer; a second reconstruction module setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer contained in the motion information when the value of the first enhancement layer is 0; a layer reconstruction module reconstructing a motion vector component of the base layer and a motion vector component of at least one enhancement layer other than the first and second enhancement layers from the corresponding value of the base layer and a value of the at least one enhancement layer other than the first and the second enhancement layers contained in the motion information, respectively; and a motion addition module adding the reconstructed motion vector components of the base layer, the first enhancement layer, the second enhancement layer, and the at least one enhancement layer other than the first and the second enhancement layers together and providing the motion vector.
[15] A method for reconstructing a motion vector obtained at predetermined pixel accuracy, the method comprising: determining a motion vector component of a base layer using the obtained motion vector according to a pixel accuracy of the base layer; and determining a motion vector component of an enhancement layer so that a sum of the motion vector component of the enhancement layer and the motion vector component of the base layer is close to the obtained motion vector according to a pixel accuracy of the enhancement layer.
[16] The method of claim 15, wherein in the determining of the motion vector component of the base layer, the motion vector component of the base layer is determined to be close to a value predicted from motion vectors of neighboring blocks according to the pixel accuracy of the base layer.
[17] The method of claim 15, wherein in the determining of the motion vector component of the base layer, the motion vector component of the base layer is determined according to the pixel accuracy of the base layer by separating the obtained motion vector into an original sign and a magnitude, using an unsigned value to represent the magnitude of the motion vector, and attaching the original sign to the unsigned value.
[18] The method of claim 15, wherein in the determining of the motion vector component of the base layer, a value closest to the obtained motion vector is determined as the motion vector component of the base layer according to the pixel accuracy of the base layer.
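Claims 15 through 18 recite splitting a motion vector obtained at fine pixel accuracy into a base-layer component at coarser accuracy and an enhancement-layer residual. Purely as an illustrative sketch of the "closest value" variant of claim 18 (the 1/4-pel units, the integer-pel base-layer accuracy, and the function name are assumptions for illustration, not taken from the application), the decomposition might be modeled as:

```python
def decompose(mv_quarter_pel: int) -> tuple[int, int]:
    """Split one motion-vector component, given in 1/4-pel units, into a
    base-layer component at integer-pel accuracy and an enhancement-layer
    residual, per the approach of claims 15 and 18.

    Per claim 18, the base-layer component is the value closest to the
    obtained motion vector that is representable at the base layer's pixel
    accuracy; the enhancement layer then carries the residual so that the
    sum of the two components reproduces the original vector (claim 15).
    """
    # Nearest integer-pel value, expressed in 1/4-pel units.
    base = round(mv_quarter_pel / 4) * 4
    # Residual at 1/4-pel accuracy; base + enh restores the input exactly.
    enh = mv_quarter_pel - base
    return base, enh
```

For example, a component of 7 (i.e., 1.75 pel) would decompose into a base-layer value of 8 (2 pel) and an enhancement residual of -1, so a decoder that drops the enhancement layer still recovers an integer-pel approximation.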
[19] A method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method comprising: reconstructing a motion vector component of the base layer and a motion vector component of the at least one enhancement layer from a value of the base layer and a value of the at least one enhancement layer, respectively, the values of the base layer and the at least one enhancement layer being interpreted from an input bitstream; and adding the reconstructed motion vector components of the base layer and the at least one enhancement layer together and providing the motion vector.
[20] A method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method comprising: reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to a sign of a corresponding value of the base layer; reconstructing a motion vector component of the base layer and a motion vector component of at least one enhancement layer other than the first enhancement layer from the corresponding value of the base layer and a value of the at least one enhancement layer other than the first enhancement layer, respectively, the corresponding value of the base layer and the value of the at least one enhancement layer other than the first enhancement layer being interpreted from the input bitstream; and adding the reconstructed motion vector components of the base layer, the first enhancement layer, and the at least one enhancement layer other than the first enhancement layer together and providing the motion vector.
[21] A method for reconstructing a motion vector consisting of a base layer and at least one enhancement layer, the method comprising: reconstructing a motion vector component of a first enhancement layer by attaching a sign to a value of the first enhancement layer interpreted from an input bitstream, which is opposite to a sign of a corresponding value of the base layer; setting a motion vector component of a second enhancement layer to 0 when the value of the first enhancement layer is not 0 and reconstructing the motion vector component of the second enhancement layer from a value of the second enhancement layer interpreted from the input bitstream when the value of the first enhancement layer is 0; reconstructing a motion vector component of the base layer and a motion vector component of at least one enhancement layer other than the first and the second enhancement layers from the corresponding value of the base layer and a value of the at least one enhancement layer other than the first and the second enhancement layers, respectively, the corresponding value of the base layer and the value of the at least one enhancement layer other than the first and the second enhancement layers being interpreted from the input bitstream; and adding the reconstructed motion vector components of the base layer, the first enhancement layer, the second enhancement layer, and the at least one enhancement layer other than the first and the second enhancement layers together and providing the motion vector.
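The decoder-side rules of claims 19 through 21 can likewise be sketched in illustrative Python (the function name, the argument layout, and the handling of a zero-valued base layer's sign are assumptions made for this sketch; the claims themselves do not fix them): the first enhancement layer's sign is taken as the opposite of the base layer's sign, and a non-zero first enhancement layer implies a second enhancement layer of 0.

```python
def reconstruct(base_val: int, enh_vals: list[int]) -> int:
    """Rebuild one motion-vector component from its layered values, per
    the rules of claims 19-21.

    enh_vals[0] is the unsigned first-enhancement-layer value; its sign is
    taken to be opposite to the base layer's sign (claim 20). When it is
    non-zero, the second enhancement layer is set to 0; otherwise
    enh_vals[1] is used as interpreted from the bitstream (claim 21). Any
    remaining layers are treated here as plain signed residuals and summed
    in (claim 19).
    """
    # Opposite of the base-layer sign; a zero base value is treated as
    # non-positive here, which is an assumption of this sketch.
    sign = -1 if base_val > 0 else 1
    first = sign * enh_vals[0]
    # A non-zero first enhancement layer forces the second layer to 0.
    second = 0 if enh_vals[0] != 0 else enh_vals[1]
    return base_val + first + second + sum(enh_vals[2:])
```

Continuing the earlier example, a base-layer value of 8 with an unsigned first-enhancement value of 1 reconstructs to 8 - 1 = 7, recovering the original 1.75-pel component.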
PCT/KR2005/000968 2004-04-08 2005-04-01 Method and apparatus for implementing motion scalability WO2006004305A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP05789577A EP1741297A1 (en) 2004-04-08 2005-04-01 Method and apparatus for implementing motion scalability

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US56025004P 2004-04-08 2004-04-08
US60/560,250 2004-04-08
KR1020040032237A KR100587561B1 (en) 2004-04-08 2004-05-07 Method and apparatus for implementing motion scalability
KR10-2004-0032237 2004-05-07

Publications (1)

Publication Number Publication Date
WO2006004305A1 true WO2006004305A1 (en) 2006-01-12

Family

ID=35783075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/000968 WO2006004305A1 (en) 2004-04-08 2005-04-01 Method and apparatus for implementing motion scalability

Country Status (2)

Country Link
EP (1) EP1741297A1 (en)
WO (1) WO2006004305A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN P. ET AL: "Improvements to the MC-EZBC scalable video coder", PROCEEDINGS OF THE 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, vol. 2, 14 September 2003 (2003-09-14) - 17 September 2003 (2003-09-17), pages 81 - 84, XP010670746 *
LIN E. ET AL: "A hybrid embedded video codec using base layer information for enhancement layer coding", PROCEEDINGS OF THE 2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, vol. 2, 7 October 2001 (2001-10-07) - 10 October 2001 (2001-10-10), pages 1005 - 1008, XP010563936 *
LUO L. ET AL: "Layer-correlated motion estimation and motion vector coding for the 3D-wavelet video coding", PROCEEDINGS OF THE 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, vol. 2, September 2003 (2003-09-01), pages 791 - 794, XP010670584 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008034715A3 (en) * 2006-09-18 2008-05-22 Bosch Gmbh Robert Method for the compression of data in a video sequence
WO2008034715A2 (en) 2006-09-18 2008-03-27 Robert Bosch Gmbh Method for the compression of data in a video sequence
US9078024B2 (en) 2007-12-18 2015-07-07 Broadcom Corporation Video processing system with user customized graphics for use with layered video coding and methods for use therewith
US8311098B2 (en) 2007-12-19 2012-11-13 Broadcom Corporation Channel adaptive video transmission system for use with layered video coding and methods for use therewith
EP2076038A2 (en) * 2007-12-20 2009-07-01 Broadcom Corporation Video processing system with layered video coding and methods for use therewith
US9210480B2 (en) 2007-12-20 2015-12-08 Broadcom Corporation Video processing system with layered video coding and methods for use therewith
EP2076038A3 (en) * 2007-12-20 2012-03-14 Broadcom Corporation Video processing system with layered video coding and methods for use therewith
US8416848B2 (en) 2007-12-21 2013-04-09 Broadcom Corporation Device adaptive video transmission system for use with layered video coding and methods for use therewith
US9143731B2 (en) 2008-01-02 2015-09-22 Broadcom Corporation Mobile video device for use with layered video coding and methods for use therewith
US8594191B2 (en) 2008-01-03 2013-11-26 Broadcom Corporation Video processing system and transcoder for use with layered video coding and methods for use therewith
US8520737B2 (en) 2008-01-04 2013-08-27 Broadcom Corporation Video processing system for scrambling layered video streams and methods for use therewith
US8855196B2 (en) 2008-01-22 2014-10-07 Dolby Laboratories Licensing Corporation Adaptive motion information cost estimation with dynamic look-up table updating
JP2011510601A (en) * 2008-01-22 2011-03-31 ドルビー・ラボラトリーズ・ライセンシング・コーポレーション Adaptive motion information cost estimation
WO2009094349A1 (en) 2008-01-22 2009-07-30 Dolby Laboratories Licensing Corporation Adaptive motion information cost estimation with dynamic look-up table updating
EP2556674A2 (en) * 2010-04-05 2013-02-13 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding video
CN102934443A (en) * 2010-04-05 2013-02-13 三星电子株式会社 Method and apparatus for video encoding, and method and apparatus for video decoding
WO2011126278A2 (en) 2010-04-05 2011-10-13 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding video
EP2556674A4 (en) * 2010-04-05 2014-07-02 Samsung Electronics Co Ltd Method and apparatus for encoding and decoding video
CN102934443B (en) * 2010-04-05 2015-11-25 三星电子株式会社 For carrying out the method and apparatus of Code And Decode to video
EP3038363A1 (en) * 2010-04-05 2016-06-29 Samsung Electronics Co., Ltd Video coding with adaptive motion accuracy

Also Published As

Publication number Publication date
EP1741297A1 (en) 2007-01-10

Similar Documents

Publication Publication Date Title
US20050226334A1 (en) Method and apparatus for implementing motion scalability
US8031776B2 (en) Method and apparatus for predecoding and decoding bitstream including base layer
KR100679022B1 (en) Video coding and decoding method using inter-layer filtering, video ecoder and decoder
JP5014989B2 (en) Frame compression method, video coding method, frame restoration method, video decoding method, video encoder, video decoder, and recording medium using base layer
US8929436B2 (en) Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method
JP4891234B2 (en) Scalable video coding using grid motion estimation / compensation
US7889793B2 (en) Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
EP1589764A2 (en) Method and apparatus for supporting motion scalability
US20060013309A1 (en) Video encoding and decoding methods and video encoder and decoder
EP1741297A1 (en) Method and apparatus for implementing motion scalability
KR100679018B1 (en) Method for multi-layer video coding and decoding, multi-layer video encoder and decoder
US20050163217A1 (en) Method and apparatus for coding and decoding video bitstream
KR20050089721A (en) Method and system for video coding for video streaming service, and method and system for video decoding
WO2005086493A1 (en) Scalable video coding method supporting variable gop size and scalable video encoder
WO2006080662A1 (en) Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
JP2008515328A (en) Video coding and decoding method using inter-layer filtering, video encoder and decoder
WO2006006793A1 (en) Video encoding and decoding methods and video encoder and decoder
KR20050009639A (en) Interframe Wavelet Video Coding Method
EP1813114A1 (en) Method and apparatus for predecoding hybrid bitstream
JP2008512035A (en) Multi-layer video coding and decoding method, video encoder and decoder
WO2006080663A1 (en) Method and apparatus for effectively encoding multi-layered motion vectors

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2005789577

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 200580011913.2

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

WWP Wipo information: published in national office

Ref document number: 2005789577

Country of ref document: EP