US20060088101A1 - Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer - Google Patents

Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Info

Publication number
US20060088101A1
US20060088101A1 (application US11/254,051)
Authority
US
United States
Prior art keywords
motion vector
frame
enhancement layer
layer frame
base layer
Prior art date
Legal status
Granted
Application number
US11/254,051
Other versions
US7889793B2
Inventor
Woo-jin Han
Kyo-hyuk Lee
Jae-Young Lee
Sang-Chang Cha
Bae-keun Lee
Ho-Jin Ha
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to US11/254,051 (US7889793B2)
Assigned to SAMSUNG ELECTRONICS CO., LTD. Assignment of assignors' interest (see document for details). Assignors: HA, HO-JIN; CHA, SANG-CHANG; HAN, WOO-JIN; LEE, BAE-KEUN; LEE, JAE-YOUNG; LEE, KYO-HYUK
Publication of US20060088101A1
Priority to US13/005,990 (US8116578B2)
Application granted
Publication of US7889793B2
Priority to US13/355,619 (US8520962B2)
Legal status: Expired - Fee Related
Adjusted expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/31 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • H04N 19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 - Motion estimation or motion compensation
    • H04N 19/513 - Processing of motion vectors
    • H04N 19/517 - Processing of motion vectors by encoding
    • H04N 19/577 - Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Definitions

  • the present invention relates to a video compression method, and more particularly, to a method and apparatus of improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer in a video coding method using a multi-layer structure.
  • multimedia data requires storage media that have a large capacity and a wide bandwidth for transmission since the amount of multimedia data is usually large. Accordingly, a compression coding method is requisite for transmitting multimedia data including text, video, and audio.
  • a basic principle of data compression is removing data redundancy.
  • Data can be compressed by removing spatial redundancy in which the same color or object is repeated in an image, temporal redundancy in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or mental visual redundancy which takes into account human eyesight and its limited perception of high frequency.
  • temporal redundancy is removed by motion compensation based on motion estimation and compensation
  • spatial redundancy is removed by transform coding.
  • transmission media are necessary. Transmission performance is different depending on transmission media.
  • Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data of several tens of megabits per second while a mobile communication network has a transmission rate of 384 kilobits per second. Accordingly, to support transmission media having various speeds or to transmit multimedia at a data rate suitable to a transmission environment, data coding methods having scalability, such as wavelet video coding and subband video coding, may be suitable to a multimedia environment.
  • Scalability indicates the ability for a decoder part or a pre-decoder part to partially decode a single compressed bitstream according to conditions such as a bit rate, error rate, system resources or the like.
  • a decoder or a pre-decoder decompresses only a portion of a bitstream coded by scalable coding and plays back the same to be restored into multimedia sequences having different video quality/resolution levels or frame rates.
  • FIG. 1 is a schematic diagram of a typical scalable video coding system.
  • an encoder 50 codes an input video 51 , thereby generating a bitstream 52 .
  • a pre-decoder 60 can extract different bitstreams 53 by variously cutting the bitstream 52 received from the encoder 50 according to an extraction condition, such as a bit rate, a resolution, or a frame rate, and as related with an environment of communication with a decoder 70 or mechanical performance of the decoder 70 .
  • the pre-decoder 60 is implemented to be included in a video stream server providing variable video streams to an end-user in variable network environments.
  • the decoder 70 reconstructs an output video 54 from the extracted bitstream 53 . Extraction of a bit stream according to the extraction condition may be performed by the decoder 70 instead of the pre-decoder 60 or may be performed by both of the pre-decoder 60 and the decoder 70 .
  • MPEG-4 (Motion Picture Experts Group 4) Part 13 standardization for scalable video coding is under way. In particular, much effort is being made to implement scalability based on a multi-layered structure.
  • a bitstream may consist of multiple layers, i.e., base layer and first and second enhancement layers with different resolutions (QCIF, CIF, and 2CIF) or frame rates.
  • a motion vector is obtained for each of the multiple layers to remove temporal redundancy.
  • the motion vector MV may be separately searched for each layer (former approach) or a motion vector obtained by a motion vector search for one layer is used for another layer (without or after being upsampled/downsampled) (latter approach).
  • the former approach has the advantage of obtaining accurate motion vectors while suffering from overhead due to motion vectors generated for each layer. Thus, it is a very challenging task to efficiently reduce redundancy between motion vectors for each layer.
  • FIG. 2 shows an example of a scalable video codec using a multi-layered structure.
  • a base layer has a quarter common intermediate format (QCIF) resolution and a frame rate of 15 Hz
  • a first enhancement layer has a common intermediate format (CIF) resolution and a frame rate of 30 Hz
  • a second enhancement layer has a standard definition (SD) resolution and a frame rate of 60 Hz.
  • the enhancement layer bitstream of CIF@30 Hz and 0.7 Mbps may be truncated to meet the bit rate of 0.5 Mbps. In this way, it is possible to implement spatial, temporal, and SNR scalabilities.
  • motion prediction from the base layer is very important.
  • since the motion vector is used only for an inter-frame coded by referring to neighboring frames, it is not used for an intra-frame coded without reference to adjacent frames.
  • one of the currently used methods for efficiently representing a motion vector includes predicting a motion vector for a current layer from a motion vector for a lower layer and coding a difference between the predicted value and the actual motion vector.
  • FIG. 3 is a diagram for explaining a conventional method for efficiently representing a motion vector using motion prediction.
  • a motion vector in a lower layer having the same temporal position as a current layer has conventionally been used as a predicted motion vector for a current layer motion vector.
  • An encoder obtains motion vectors MV 0 , MV 1 , and MV 2 for a base layer, a first enhancement layer, and a second enhancement layer at predetermined accuracies and performs temporal transformation using the motion vectors MV 0 , MV 1 , and MV 2 to remove temporal redundancies in the respective layers.
  • the encoder sends the base layer motion vector MV 0 , a first enhancement layer motion vector component D 1 , and a second enhancement layer motion vector component D 2 to the pre-decoder (or video stream server).
  • the pre-decoder may transmit only the base layer motion vector, the base layer motion vector and the first enhancement layer motion vector component D 1 , or the base layer motion vector, the first enhancement layer motion vector component D 1 and the second enhancement layer motion vector component D 2 to a decoder to adapt to network situations.
  • the decoder uses the received data to reconstruct a motion vector for an appropriate layer. For example, when the decoder receives the base layer motion vector and the first enhancement layer motion vector component D 1 , the first enhancement layer motion vector component D 1 is added to the base layer motion vector MV 0 in order to reconstruct the first enhancement layer motion vector MV 1 . The reconstructed motion vector MV 1 is used to reconstruct texture data for the first enhancement layer.
  • a lower layer frame having the same temporal position as the current frame may not exist.
  • motion prediction through a lower layer motion vector cannot be performed. That is, since a motion vector in the frame 40 cannot be predicted, a motion vector in the first enhancement layer is inefficiently represented as a redundant motion vector.
  • the present invention provides a method for efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer.
  • the present invention also provides a method for efficiently predicting a motion vector even if there is no lower layer frame at the same temporal position as a current layer frame.
  • a method for efficiently compressing multi-layered motion vectors including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate, (c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, and (d) coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer.
  • a method for efficiently encoding multi-layered motion vectors including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate, (c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, (d) lossy coding texture data of the base layer frame using the motion vector of the base layer frame, (e) lossy coding texture data of the first enhancement layer frame using the motion vector of the first enhancement layer frame, and (f) losslessly coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, the motion vector in the base layer frame, the lossy coded result of step (d), and the lossy coded result of step (e).
  • a multi-layered video encoding method including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) generating a motion vector in a first enhancement layer frame having a second frame rate greater than the first frame rate by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, (c) lossy coding texture data of the base layer frame using the motion vector of the base layer frame, (d) lossy coding texture data of the first enhancement layer frame using the motion vector of the first enhancement layer frame, and (e) losslessly coding the motion vector in the base layer frame, the lossy coded result of step (c), and the lossy coded result of step (d).
  • a multi-layered video decoding method including (a) extracting base layer data and enhancement layer data from an input bitstream, (b) if there is no base layer frame at the same temporal position as a first enhancement layer frame, generating a predicted motion vector of the first enhancement layer frame by referring to the motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame, (c) reconstructing the motion vector of the enhancement layer using the generated predicted motion vector, and (d) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
  • a multi-layered video decoding method including (a) extracting base layer data and enhancement layer data from an input bitstream, (b) if there is no base layer frame at the same temporal position as an enhancement layer frame, reconstructing a motion vector in the enhancement layer frame by referring to at least one frame among base layer frames present immediately before and after the same temporal position as the enhancement layer frame, and (c) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
  • FIG. 1 shows an overall configuration of a general scalable video coding system
  • FIG. 2 is a diagram of video coding using a multi-layered structure
  • FIG. 3 is a diagram for explaining a conventional method for efficiently representing a motion vector using motion prediction
  • FIG. 4 is a diagram for explaining a method for efficiently representing a motion vector using motion prediction according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram for explaining a concept of the present invention.
  • FIG. 6 is a block diagram of a video encoder according to an exemplary embodiment of the present invention.
  • FIG. 7 is a flow diagram for explaining the detailed operation of a motion vector estimation unit of an exemplary embodiment of the invention.
  • FIG. 8 illustrates examples of motion fields of previous and next frames
  • FIG. 9 is a diagram for explaining a filtering process of an exemplary embodiment of the invention.
  • FIG. 10 is a diagram for explaining a method of an exemplary embodiment of the invention of obtaining a filtered motion vector when a resolution of a base layer and a resolution of an enhancement layer are different from each other;
  • FIG. 11 is a block diagram of a video decoder according to an embodiment of the present invention.
  • the motion vectors MV0 and MV1 demonstrate the same moving effect as vectors 11 and 13.
  • a process of generating the estimated motion vectors MV1p and MV2p should be performed just by reading motion information of a lower layer, without additional information.
  • the motion vectors referenced should be set to be substantially close to motion vectors in the current layer.
  • FIG. 5 is a schematic diagram for explaining a fundamental concept of the present invention. It is assumed in this example that a current layer Ln has a CIF resolution and a frame rate of 30 Hz, and a lower layer Ln-1 has a QCIF resolution and a frame rate of 15 Hz.
  • if there is a base layer frame at the same temporal position as a current layer frame, a predicted motion vector is generated by referring to a motion vector in that base layer frame.
  • otherwise, a predicted motion vector is generated using motion vectors in at least one of the base layer frames located closest to that temporal position.
  • motion vectors in current layer frames A0 and A2 are respectively predicted from motion vectors in lower layer frames B0 and B2 having the same temporal positions as the current layer frames A0 and A2.
  • here, motion prediction substantially means that a predicted motion vector is generated.
  • a predicted motion vector for a frame A1 having no corresponding lower layer frame at the same temporal position is generated using motion vectors in the frames B0 and B2 closest to that temporal position.
  • motion vectors in the frames B0 and B2 are interpolated to generate a virtual motion vector (a motion vector in a virtual frame B1) at the same temporal position as the frame A1, and the virtual motion vector is used to predict a motion vector for the frame A1.
  • in this way, forward and backward motion vectors can be efficiently predicted using the lower layer frames B0 and B2, respectively.
  • the predicted motion vector for the forward motion vector of the frame A1 can be calculated by multiplying the backward motion vector of the frame B0 by -1/2.
  • the predicted motion vector for the backward motion vector of the frame A1 can be calculated by multiplying the backward motion vector of the frame B0 by 1/2.
  • a value obtained by summing the backward motion vector of the frame B0 and the calculated forward motion vector of the frame A1 can also be used as the predicted motion vector for the backward motion vector of the frame A1.
  • the predicted motion vector for the forward motion vector of the frame A1 can be calculated by multiplying the forward motion vector of the frame B2 by 1/2.
  • the predicted motion vector for the backward motion vector of the frame A1 can be calculated by multiplying the forward motion vector of the frame B2 by -1/2.
  • a value obtained by subtracting the forward motion vector of the frame B2 from the calculated forward motion vector of the frame A1 can also be used as the predicted motion vector for the backward motion vector of the frame A1.
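  • For illustration only, the scaling rules above can be collected into a short sketch. The following Python fragment is a hedged example, not the patent's implementation; the function name, the representation of a motion vector as a (dx, dy) pair, and the assumption that A1 lies halfway between B0 and B2 are ours.

```python
# Sketch: predicting the motion vectors of enhancement layer frame A1 (which has no
# co-located base layer frame) from a neighboring base layer frame, following the
# +/-1/2 scaling described above. Vectors are (dx, dy) tuples; names are illustrative.

def predict_a1_vectors(b0_backward=None, b2_forward=None):
    """Return (predicted_forward, predicted_backward) for frame A1."""
    if b0_backward is not None:
        dx, dy = b0_backward
        fwd = (-0.5 * dx, -0.5 * dy)   # forward MV of A1 ~ -1/2 x backward MV of B0
        bwd = (0.5 * dx, 0.5 * dy)     # backward MV of A1 ~ +1/2 x backward MV of B0
        return fwd, bwd                # (equivalently, bwd = B0 backward + A1 forward)
    if b2_forward is not None:
        dx, dy = b2_forward
        fwd = (0.5 * dx, 0.5 * dy)     # forward MV of A1 ~ +1/2 x forward MV of B2
        bwd = (-0.5 * dx, -0.5 * dy)   # backward MV of A1 ~ -1/2 x forward MV of B2
        return fwd, bwd                # (equivalently, bwd = A1 forward - B2 forward)
    raise ValueError("at least one neighboring base layer motion vector is required")

# Example: if B0's backward vector is (8, -4), A1's predicted forward vector is (-4, 2)
# and its predicted backward vector is (4, -2).
print(predict_a1_vectors(b0_backward=(8, -4)))
```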
  • FIG. 6 is a block diagram of a video encoder 100 according to an embodiment of the present invention. While FIG. 6 shows the use of one base layer and one enhancement layer, it will be readily apparent to those skilled in the art that the present invention can be applied between a lower layer and an upper layer when two or more layers are used.
  • the video encoder 100 includes a downsampling unit 110 , motion estimation units 121 , 131 , lossy encoders 125 , 135 , a motion vector estimation unit 140 , and an entropy coder 150 .
  • the downsampling unit 110 downsamples an input video to a resolution and frame rate suitable for each layer.
  • when a QCIF@15 Hz base layer and a CIF@30 Hz enhancement layer are used as shown in FIG. 5, an original input video is downsampled to QCIF and CIF resolutions and to frame rates of 15 Hz and 30 Hz, respectively.
  • downsampling the resolution may be performed using an MPEG downsampler or a wavelet downsampler.
  • Downsampling the frame rate may be performed using frame skip or frame interpolation.
  • the enhancement layer may not have both a higher resolution and a higher frame rate than the base layer; it may instead have a higher resolution than and the same frame rate as the base layer, or the same resolution as and a higher frame rate than the base layer.
  • a motion estimation unit 121 performs motion estimation on a base layer frame to obtain motion vectors in the base layer frame.
  • motion estimation is the process of finding, in a reference frame, the block closest to a block in the current frame, i.e., the block with a minimum error.
  • Various techniques including fixed-size block matching and hierarchical variable size block matching (HVSBM) may be used in the motion estimation.
  • the motion estimation unit 131 performs motion estimation on an enhancement layer frame to obtain motion vectors in the enhancement layer frame.
  • the motion vectors in the base layer frame and the enhancement layer frame are obtained in this way to predict a motion vector in the enhancement layer frame using a virtual motion vector.
  • the motion vector estimation unit 140 generates a predicted motion vector in the enhancement layer using the motion vectors in the base layer and obtains a difference between the obtained motion vector in the enhancement layer frame and the predicted motion vector (such differences are hereinafter referred to as "motion vector components").
  • referring to FIG. 7, it is determined in step S10 whether there is a base layer frame at the same temporal position as the current frame in the enhancement layer. If so (YES in step S10), motion vectors in the base layer frame having a spatial correlation with the current frame are filtered in step S20. As a result, a filtered motion vector corresponding to one motion vector in the current frame is generated. Step S20 will be described below with reference to FIGS. 9 and 10.
  • in step S30, it is determined whether the resolutions of the enhancement layer and the base layer are the same. If they are the same (YES in step S30), the difference between the motion vector in the current frame and the motion vector resulting from the filtering is obtained in step S40, because the motion vector resulting from the filtering corresponds to the predicted motion vector when the resolutions are the same. If they are not the same (NO in step S30), the motion vector resulting from the filtering is upsampled to the resolution of the enhancement layer in step S45; for example, if the resolution of the enhancement layer is double that of the base layer, this upsampling means magnifying the motion vector generated by the filtering by two times. In this case, the upsampled motion vector is the predicted motion vector, and a difference between the motion vector in the current frame and the upsampled motion vector is obtained in step S50.
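  • A compact way to express steps S30 through S50 is sketched below. This is an illustrative example under the assumption that motion vectors are (dx, dy) pairs and that the filtered base layer vector is already available; the function name and the integer resolution ratio are not taken from the patent.

```python
# Sketch of steps S30-S50: derive the enhancement layer motion vector component from
# the filtered base layer vector, upsampling it first when the resolutions differ.

def motion_vector_component(current_mv, filtered_base_mv, resolution_ratio):
    """resolution_ratio = enhancement resolution / base resolution (e.g. 2 for CIF over QCIF)."""
    if resolution_ratio == 1:
        # Same resolution (step S40): the filtered vector itself is the predicted vector.
        predicted = filtered_base_mv
    else:
        # Different resolution (step S45): magnify the filtered vector by the ratio.
        predicted = (filtered_base_mv[0] * resolution_ratio,
                     filtered_base_mv[1] * resolution_ratio)
    # The transmitted component is the difference (step S40 or S50).
    return (current_mv[0] - predicted[0], current_mv[1] - predicted[1])

# Example: a CIF enhancement layer over a QCIF base layer doubles the filtered vector.
print(motion_vector_component(current_mv=(9, -3), filtered_base_mv=(4, -2),
                              resolution_ratio=2))   # -> (1, 1)
```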
  • if step S10 determines that there is no base layer frame at the same temporal position as the current frame in the enhancement layer (NO in step S10), the motion vectors in the base layer frames located immediately before and after that temporal position are filtered in steps S55 and S60.
  • in FIG. 5, the frames located immediately before and after the same temporal position as the current frame A1 are B0 and B2.
  • for the base layer frame immediately before that temporal position, a motion vector having a spatial correlation with a motion vector in the current frame is filtered to generate a filtered motion vector in step S55.
  • likewise, for the base layer frame immediately after that temporal position, a motion vector having a spatial correlation with a motion vector in the current frame is filtered to generate a filtered motion vector in step S60.
  • this filtering process is similar to that in step S20, which will be described later with reference to FIGS. 9 and 10.
  • a "virtual motion vector" at the same temporal position as the current frame is then interpolated using the filtered motion vector generated by the filtering in step S55 and the filtered motion vector generated by the filtering in step S60.
  • Interpolation methods include a simple averaging method, bi-linear interpolation, bi-cubic interpolation, and so on. If the distances from the frames present immediately before and after the temporal position are different, unlike in FIG. 5, interpolation is preferably performed such that the interpolation weighted factor increases in inverse proportion to the distance. As described above, if the distances are different, only the one frame closest to the current frame may also be used.
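  • As a hedged illustration of this distance-dependent weighting, the sketch below averages the two filtered vectors with weights inversely proportional to the temporal distances; equal distances reduce to simple averaging. The function name and vector representation are assumptions of the example.

```python
# Sketch: interpolate a "virtual" motion vector at the temporal position of the current
# enhancement layer frame from the filtered vectors of the base layer frames immediately
# before (prev) and after (next); the nearer frame contributes more.

def interpolate_virtual_mv(prev_mv, next_mv, dist_prev, dist_next):
    w_prev = 1.0 / dist_prev              # weight inversely proportional to distance
    w_next = 1.0 / dist_next
    total = w_prev + w_next
    return tuple((w_prev * p + w_next * n) / total for p, n in zip(prev_mv, next_mv))

print(interpolate_virtual_mv((4, 2), (6, -2), dist_prev=1, dist_next=1))  # (5.0, 0.0)
print(interpolate_virtual_mv((4, 2), (6, -2), dist_prev=1, dist_next=3))  # (4.5, 1.0)
```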
  • the weighted factors of the immediately previous and next frames may also be determined adaptively.
  • when the motion vectors in the immediately previous and next frames are displayed as in FIG. 8 (white blocks being portions whose motion vectors are skipped), the blocks having the same spatial position as a certain motion vector in the current frame are block 61 and block 62.
  • the motion vectors of the blocks in the vicinity of block 61 are substantially the same as the motion vector of block 61.
  • the motion vectors of the blocks in the vicinity of block 62 are mostly different from the motion vector of block 62.
  • in this case, the weighted factor of the motion vector in the immediately previous frame is preferably increased for more accurate motion estimation. That is to say, a sum of differences between the motion vector of block 61 and each of the motion vectors in the vicinity of block 61 is computed, and a sum of differences between the motion vector of block 62 and each of the motion vectors in the vicinity of block 62 is computed.
  • the weighted factor of the motion vector in each frame is in inverse proportion to the sum of differences. Since a decoder side receives motion vectors from an encoder side, the weighted factor can be computed in the same manner as in the encoder side without the need of being informed of such from the encoder side.
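  • The consistency-based weighting just described can be sketched as follows. The sum of absolute vector differences is used as the (inverse) reliability measure; the helper names and the absolute-difference metric are assumptions of this example, not terms from the patent.

```python
# Sketch: derive interpolation weights from motion-field consistency (cf. FIG. 8).
# The frame whose co-located block agrees with its neighbours gets the larger weight,
# the weight being inversely proportional to the sum of vector differences.

def vector_diff(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def consistency_cost(block_mv, neighbour_mvs):
    return sum(vector_diff(block_mv, n) for n in neighbour_mvs)

def frame_weights(prev_mv, prev_neighbours, next_mv, next_neighbours, eps=1e-6):
    w_prev = 1.0 / (consistency_cost(prev_mv, prev_neighbours) + eps)
    w_next = 1.0 / (consistency_cost(next_mv, next_neighbours) + eps)
    total = w_prev + w_next
    return w_prev / total, w_next / total

# Block 61 (previous frame) agrees with its neighbourhood, block 62 does not, so the
# previous frame receives most of the weight. The decoder can repeat this computation
# because it receives the same base layer motion vectors.
print(frame_weights((4, 2), [(4, 2), (5, 2), (4, 1)],
                    (4, 2), [(0, -3), (7, 5), (-2, 1)]))
```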
  • if the resolutions of the enhancement layer and the base layer are the same, the interpolated virtual motion vector is the predicted motion vector. Therefore, in order to effectively compress the motion vector, subtraction is performed on the motion vector in the current frame and the virtual motion vector in step S75. The subtraction result becomes the motion vector component in the enhancement layer.
  • if the resolutions are not the same, the interpolated virtual motion vector is upsampled to be as large as the motion vector in the enhancement layer in step S80.
  • since the upsampled motion vector is then the predicted motion vector, subtraction is performed on the motion vector in the current frame and the upsampled motion vector in step S85.
  • FIG. 9 is a diagram for explaining a filtering process.
  • here, filtering means a process of obtaining a filtered motion vector from the base layer motion vectors located at positions having a spatial correlation with a motion vector in the enhancement layer frame.
  • the position having a “spatial correlation” means a “directly corresponding position” (First Embodiment), or the directly corresponding position and a region including the vicinity enlarged from the position (Second Embodiment).
  • a motion vector having a spatial correlation with (that is, positioned directly corresponding to) the motion vector 65 is a motion vector 63 .
  • the motion vector 63 is a “filtered motion vector.”
  • a motion vector having a spatial correlation with (that is, positioned directly corresponding to) the motion vector 65 is a motion vector 64 .
  • the motion vector 64 is a “filtered motion vector.”
  • the motion vector 64 has a spatial correlation with motion vectors 66 , 67 , and 68 as well as the motion vector 65 .
  • in the second embodiment, filtering is performed in consideration of not only the motion vector positioned directly corresponding to a certain motion vector but also the motion vectors in the vicinity thereof.
  • the term “a position having a spatial correlation” used herein is meant to embrace a directly corresponding position and a region including the vicinity thereof. The reason for enlarging the region in such a manner is that motion vectors have a spatial similarity and taking adjacent motion vectors into consideration may be advantageous for motion prediction.
  • a filtered motion vector can be obtained by a linear combination of 9 motion vectors, including the motion vector 63 .
  • a relatively greater coefficient (that is, a greater weighted factor) is applied to the directly corresponding motion vector 63, while a relatively smaller coefficient is applied to the adjacent motion vectors.
  • a median filter, a bi-cubic filter, a quadratic filter, or other filters may also be used.
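  • A minimal sketch of this linear-combination filtering for the same-resolution case is given below. The 3x3 neighbourhood and the specific weights are assumptions chosen for the example; a median or bi-cubic filter could be substituted as noted above.

```python
# Sketch: a filtered motion vector as a weighted linear combination of the directly
# corresponding base layer vector and its eight neighbours (second embodiment,
# same-resolution case). The centre receives the larger weight.

def filter_mv_3x3(base_field, bx, by, centre_weight=0.5):
    """base_field[y][x] holds (dx, dy); (bx, by) is the directly corresponding block."""
    neighbour_weight = (1.0 - centre_weight) / 8.0
    fx = fy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = centre_weight if (dx, dy) == (0, 0) else neighbour_weight
            vx, vy = base_field[by + dy][bx + dx]
            fx += w * vx
            fy += w * vy
    return fx, fy

# Example 3x3 motion field around the directly corresponding block (2, 1):
field = [[(2, 0), (2, 1), (3, 1)],
         [(2, 1), (2, 1), (2, 2)],
         [(1, 1), (2, 1), (2, 1)]]
print(filter_mv_3x3(field, 1, 1))   # stays close to the centre vector
```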
  • a block in the base layer corresponds to four fixed blocks in the first enhancement layer.
  • a block f corresponds to a region consisting of blocks f5, f6, f7, and f8.
  • a motion vector of block f5 has a high spatial correlation with blocks b, e, and f. Since block f5 occupies a quarter of the region corresponding to block f, it is predictable that block f5 is considerably spatially correlated with block b, block e, and block f in the base layer.
  • motion vectors present in the region are filtered.
  • the weighted factor for the block f is greater than that for the block b or block e.
  • filters such as a median filter, a bi-cubic filter, or a quadratic filter can be used.
  • interpolation methods include a simple averaging method, bi-linear interpolation, bi-cubic interpolation, and so on.
  • the block b, block e, and block f may be included in the range of the reference block.
  • different weighted factors may be assigned to the respective blocks, for example, 25% to block b, 25% to block e, 10% to block a, and 40% to block f.
  • the region of the reference block may be set to include not only the immediately adjacent block but also alternate blocks.
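  • For the different-resolution case, the weighted filtering for a single enhancement layer block can be sketched as below, reusing the example weights quoted above (25% b, 25% e, 10% a, 40% f). The weights and function name are illustrative, not normative.

```python
# Sketch: filtered motion vector for enhancement layer block f5, which covers a quarter
# of base layer block f and borders base layer blocks b, e and a (FIG. 10).

def filtered_mv_for_f5(mv_a, mv_b, mv_e, mv_f):
    weights = {"a": 0.10, "b": 0.25, "e": 0.25, "f": 0.40}
    vectors = {"a": mv_a, "b": mv_b, "e": mv_e, "f": mv_f}
    fx = sum(weights[k] * vectors[k][0] for k in weights)
    fy = sum(weights[k] * vectors[k][1] for k in weights)
    # The result is still on the base layer scale; upsampling to the enhancement layer
    # scale (e.g. doubling for QCIF -> CIF) is performed in a separate step.
    return fx, fy

print(filtered_mv_for_f5(mv_a=(1, 0), mv_b=(2, 1), mv_e=(2, 0), mv_f=(3, 1)))
```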
  • the present invention may be implemented in a manner different from that specifically discussed in the present application. If the resolution of the base layer is different from the resolution of the enhancement layer, the scale of a filtered motion vector is different from the motion vector scale of the enhancement layer, because the filtered motion vector is obtained only by filtering the motion vectors in the base layer. In the present invention, upsampling is therefore performed as a separate step.
  • the lossy coder 125 performs lossy coding on a base layer frame using the motion vector obtained from the motion estimation unit 121.
  • the lossy coder 125 may include a temporal transform unit 122 , a spatial transform unit 123 , and a quantization unit 124 .
  • the temporal transform unit 122 constructs a predictive frame using the motion vector obtained from the motion estimation unit 121 and a frame positioned at a temporally different location from the current frame, and performs a subtraction on the current frame and the predictive frame, thereby reducing temporal redundancy. As a result, a residual frame is generated.
  • if the current frame is encoded without referencing another frame, that is, if the current frame is an intra-frame, it requires no motion vector, and the temporal transformation process using a predictive frame is skipped.
  • Motion Compensated Temporal Filtering (MCTF) or Unconstrained MCTF (UMCTF) may be used for the temporal transform.
  • the spatial transform unit 123 performs a spatial transform on residual frames generated by the temporal transform unit 122 or original input frames and generates transform coefficients.
  • a Discrete Cosine Transform (DCT), a wavelet transform, or the like may be used for the spatial transform. When the DCT is used, the transform coefficient is a DCT coefficient; when the wavelet transform is used, the transform coefficient is a wavelet coefficient.
  • the quantization unit 124 quantizes the transform coefficient generated by the spatial transform unit 123 .
  • Quantization means a process of dividing the transform coefficients, which are represented by arbitrary real values, into predetermined intervals to represent them as discrete values and matching the discrete values with indices from a predetermined quantization table.
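  • As a hedged illustration of the quantization and inverse quantization described here and used again by the decoder below, the sketch uses a plain uniform quantizer; a real codec would use a quantization table and rate control, and the step size is only an example.

```python
# Sketch of a uniform quantizer: real-valued transform coefficients are mapped to
# integer indices, and reconstruction from the indices is only approximate (lossy).

def quantize(coefficients, step=8.0):
    return [round(c / step) for c in coefficients]

def dequantize(indices, step=8.0):
    return [i * step for i in indices]

coeffs = [103.2, -15.7, 4.1, 0.4]
idx = quantize(coeffs)
print(idx)              # [13, -2, 1, 0]
print(dequantize(idx))  # [104.0, -16.0, 8.0, 0.0]
```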
  • a lossy coder 135 performs lossy coding on the enhancement layer frame using motion vectors in the enhancement layer frame obtained by the motion estimation unit 131 .
  • the lossy coder 135 includes a temporal transform unit 132 , a spatial transform unit 133 , and a quantization unit 134 . Because the lossy coder 135 performs the same operation as the lossy coder 125 , except that it performs lossy coding on the enhancement layer frame, a detailed explanation thereof will not be given.
  • the entropy coder 150 losslessly encodes (or entropy encodes) the quantization coefficients obtained by the quantization units 124 and 134 for the base layer and the enhancement layer, the base layer motion vectors generated by the motion estimation unit 121 for the base layer, and the enhancement layer motion vector components generated by the motion vector estimation unit 140 into an output bitstream.
  • Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
  • although FIG. 6 shows that the lossy coder 125 for the base layer is separated from the lossy coder 135 for the enhancement layer, it will be obvious to those skilled in the art that a single lossy coder can be used to process both the base layer and the enhancement layer.
  • FIG. 11 is a block diagram of a video decoder 200 according to an embodiment of the present invention.
  • the video decoder 200 includes entropy decoder 210 , lossy decoders 225 , 235 , and a motion vector reconstruction unit 240 .
  • the entropy decoder 210 performs the inverse operation of the entropy encoding and extracts motion vectors in a base layer frame, motion vector components of an enhancement layer frame, and texture data from the base layer frame and the enhancement layer frame from an input bitstream.
  • a motion vector reconstruction unit 240 reconstructs motion vectors in the enhancement layer using the motion vectors in the base layer and motion vector components in the enhancement layer, which will now be described in more detail.
  • the motion vector reconstruction process includes, if there is a base layer frame at the same temporal position as the first enhancement layer frame, generating a predicted motion vector by referring to the motion vector in the base layer frame, if not, generating a predicted motion vector by referring to a motion vector in at least one frame among base layer frames present immediately before and after the temporal position, and reconstructing a motion vector in the enhancement layer by adding the generated predicted motion vector and a motion vector component of the enhancement layer.
  • the motion vector reconstruction process is substantially the same as the motion vector estimation process (see FIG. 7), except that the decoder 200 performs an addition of the predicted motion vector and the motion vector component of the current frame (enhancement layer frame), whereas the encoder 100, as shown in FIG. 7, performs subtractions between the motion vectors in the current frame and the predicted motion vectors in steps S40, S50, S75, and S85.
  • a method of generating a predicted motion vector is the same and a repetitive explanation thereof will not be given.
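  • The decoder-side reconstruction therefore mirrors the encoder-side subtraction; a minimal sketch, with illustrative names, is:

```python
# Sketch: the decoder generates the predicted vector exactly as the encoder does
# (filtering/interpolation/upsampling of base layer vectors) and then adds the
# transmitted motion vector component instead of subtracting it.

def reconstruct_enhancement_mv(predicted_mv, mv_component):
    return (predicted_mv[0] + mv_component[0], predicted_mv[1] + mv_component[1])

predicted = (8, -4)      # derived from base layer motion vectors
component = (1, 1)       # received in the bitstream
print(reconstruct_enhancement_mv(predicted, component))   # (9, -3)
```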
  • a lossy decoder 235 performs the inverse operation of the lossy coder ( 135 of FIG. 6 ) to reconstruct a video sequence from the texture data of the enhancement layer frames using the reconstructed motion vectors in the enhancement layer frames.
  • the lossy decoder 235 includes an inverse quantization unit 231 , an inverse spatial transform unit 232 , and an inverse temporal transform unit 233 .
  • the inverse quantization unit 231 performs inverse quantization on the extracted texture data from the enhancement layer frames.
  • the inverse quantization is the process of reconstructing values from corresponding quantization indices created during a quantization process using a quantization table used during the quantization process.
  • the inverse spatial transform unit 232 performs inverse spatial transform on the inversely quantized result.
  • the inverse spatial transform is the inverse of spatial transform performed by the spatial transform unit 133 in the encoder 100 .
  • Inverse DCT and inverse wavelet transform are among techniques that may be used for the inverse spatial transform.
  • the inverse temporal transform unit 233 performs the inverse operation to the temporal transform unit 132 on the inversely spatially transformed result to reconstruct a video sequence. More specifically, the inverse temporal transform unit 233 uses motion vectors reconstructed by the motion vector reconstruction unit 240 to generate a predicted frame and adds the predicted frame to the inversely spatially transformed result in order to reconstruct a video sequence.
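  • A much simplified, single-reference sketch of this motion-compensated reconstruction is shown below; the actual embodiment may use MCTF/UMCTF with bidirectional references, so this is only a hedged illustration with integer-pel vectors and plain 2-D lists.

```python
# Sketch: reconstruct one block by fetching the motion-compensated prediction from a
# reference frame and adding the decoded residual.

def reconstruct_block(reference, residual, top, left, mv):
    dy, dx = mv
    out = []
    for y in range(len(residual)):
        row = []
        for x in range(len(residual[0])):
            pred = reference[top + y + dy][left + x + dx]   # motion-compensated sample
            row.append(pred + residual[y][x])               # add decoded residual
        out.append(row)
    return out

reference = [[10, 10, 20, 20],
             [10, 10, 20, 20],
             [30, 30, 40, 40],
             [30, 30, 40, 40]]
residual = [[1, -1], [0, 2]]
print(reconstruct_block(reference, residual, top=0, left=0, mv=(1, 2)))
# [[21, 19], [40, 42]]
```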
  • an intra-frame that is not temporally transformed at the encoder need not be subjected to the inverse temporal transform.
  • the encoder 100 may remove redundancies in the texture of an enhancement layer using a base layer during encoding.
  • in this case, the decoder 200 reconstructs a base layer frame using a lossy decoder 225 for the base layer, and then uses the reconstructed base layer frame and the texture data in the enhancement layer frame received from the entropy decoder 210 to reconstruct the enhancement layer frame.
  • the inverse temporal transform unit 233 uses the reconstructed motion vectors in enhancement layer frames to reconstruct a video sequence from the texture data in the enhancement layer frames (inversely spatially transformed result) and the reconstructed base layer frames.
  • although FIG. 11 shows that the lossy decoder 225 for the base layer is separated from the lossy decoder 235 for the enhancement layer, it will be obvious to those skilled in the art that a single lossy decoder can be used to process both the base layer and the enhancement layer.
  • alternatively, after a motion vector is obtained from one of the multiple layers and up-/down-sampled, the up-/down-sampled motion vector may be used as the motion vector in another layer. In this case, although bits for motion vectors can be saved, the motion vector accuracy in the enhancement layer may deteriorate.
  • in this alternative, a video encoding process includes obtaining a motion vector in a base layer frame having a first frame rate from an input frame, obtaining a motion vector in a first enhancement layer frame having a second frame rate greater than the first frame rate, lossy coding texture data of the base layer frame using the motion vector in the base layer frame, lossy coding texture data of the first enhancement layer frame using the motion vector in the first enhancement layer frame, and losslessly coding the motion vector in the base layer frame and the lossy coded results.
  • the obtaining of a motion vector in the enhancement layer frame includes (1) generating a motion vector in the enhancement layer frame by referring to the motion vector in the base layer frame if there is a base layer frame at the same temporal position as the enhancement layer frame, and (2) if not, generating a motion vector in the enhancement layer frame by referring to a motion vector in at least one frame among base layer frames present immediately before and after the temporal position.
  • the step (1) includes filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using a predetermined filter, and upsampling the motion vector resulting from the filtering such that the motion vector becomes the same as that of the enhancement layer, if the resolution of the base layer frame is not the same as the resolution of the enhancement layer.
  • the resultant motion vector is used as a motion vector in the enhancement layer not as a predicted motion vector.
  • the step (2) includes interpolating a virtual motion vector by referring to the motion vectors in the base layer frames present immediately before and after the temporal position if there is no base layer frame at the same temporal position as the enhancement layer frame, and, if the resolution of the base layer frame is not the same as the resolution of the enhancement layer, upsampling the interpolated virtual motion vector such that it is as large as the motion vector in the enhancement layer.
  • the resultant motion vector is used as the motion vector in the enhancement layer not as the predicted motion vector.
  • the filtering process, the upsampling process, and the interpolation process are the same as described above, and a detailed explanation thereof will not be given.
  • a video decoding method in which after obtaining a motion vector from one among multiple layers as described above, and up-/down-sampling the same, the up-/down-sampled motion vector is used as a motion vector in another layer, will now be described.
  • the video decoding process includes extracting a motion vector in a base layer frame and texture data of an enhancement layer frame from an input bitstream, reconstructing the motion vector in the enhancement layer frame using the extracted motion vector in the base layer frame, and reconstructing a video sequence from the texture data of the base layer frame and the texture data of the first enhancement layer frame.
  • the reconstructing of the motion vector in the enhancement layer includes reconstructing a motion vector in the enhancement layer by referring to the base layer frame if there is a base layer frame at the same temporal position as the enhancement layer frame, and, if not, reconstructing a motion vector in the enhancement layer frame by referring to a motion vector in at least one frame among base layer frames present immediately before and after the temporal position.
  • the reconstruction process of the motion vector in the enhancement layer frame using only the motion vector in the base layer is substantially the same as the motion vector encoding process for generating the motion vector in the enhancement layer frame using the motion vector in the base layer.
  • the present invention can be easily extended to more than two layers. If the multiple layers are composed of a base layer, a first enhancement layer, and a second enhancement layer, the algorithm used between the base layer and the first enhancement layer can also be applied between the first enhancement layer and the second enhancement layer.
  • Each of the respective components as shown in FIGS. 6 and 11 may be, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC), which performs certain tasks.
  • a module may advantageously be configured to reside on the addressable storage medium and configured to execute on one or more processors.
  • a module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • the components and modules may be implemented such that they execute on one or more computers in a communication system.
  • the compression efficiency of multi-layered motion vectors can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus of improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer in a video coding method using a multi-layer are provided. The method includes obtaining a motion vector in a base layer frame having a first frame rate from an input frame, obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate, generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, and coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims priority from Korean Patent Application No. 10-2004-0103059 filed on Dec. 8, 2004 in the Korean Intellectual Property Office, and U.S. Ser. No. 60/620,328 filed on Oct. 21, 2004 in the United States Patent and Trademark Office, the disclosures of which are incorporated herein in their entireties by reference.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a video compression method, and more particularly, to a method and apparatus of improving the compression efficiency of a motion vector by efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer in a video coding method using a multi-layer structure.
  • 2. Description of the Related Art
  • With the development of information communication technology, including the Internet, video communication as well as text and voice communication, has increased dramatically. Conventional text communication cannot satisfy users' various demands, and thus, multimedia services that can provide various types of information such as text, pictures, and music have increased. However, multimedia data requires storage media that have a large capacity and a wide bandwidth for transmission since the amount of multimedia data is usually large. Accordingly, a compression coding method is requisite for transmitting multimedia data including text, video, and audio.
  • A basic principle of data compression is removing data redundancy. Data can be compressed by removing spatial redundancy in which the same color or object is repeated in an image, temporal redundancy in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or mental visual redundancy which takes into account human eyesight and its limited perception of high frequency. In general video coding, temporal redundancy is removed by motion compensation based on motion estimation and compensation, and spatial redundancy is removed by transform coding.
  • To transmit multimedia generated after removing data redundancy, transmission media are necessary. Transmission performance is different depending on transmission media. Currently used transmission media have various transmission rates. For example, an ultrahigh-speed communication network can transmit data of several tens of megabits per second while a mobile communication network has a transmission rate of 384 kilobits per second. Accordingly, to support transmission media having various speeds or to transmit multimedia at a data rate suitable to a transmission environment, data coding methods having scalability, such as wavelet video coding and subband video coding, may be suitable to a multimedia environment.
  • Scalability indicates the ability for a decoder part or a pre-decoder part to partially decode a single compressed bitstream according to conditions such as a bit rate, error rate, system resources or the like. A decoder or a pre-decoder decompresses only a portion of a bitstream coded by scalable coding and plays back the same to be restored into multimedia sequences having different video quality/resolution levels or frame rates.
  • FIG. 1 is a schematic diagram of a typical scalable video coding system. First, an encoder 50 codes an input video 51, thereby generating a bitstream 52. A pre-decoder 60 can extract different bitstreams 53 by variously cutting the bitstream 52 received from the encoder 50 according to an extraction condition, such as a bit rate, a resolution, or a frame rate, and as related with an environment of communication with a decoder 70 or mechanical performance of the decoder 70. Typically, the pre-decoder 60 is implemented to be included in a video stream server providing variable video streams to an end-user in variable network environments.
  • The decoder 70 reconstructs an output video 54 from the extracted bitstream 53. Extraction of a bit stream according to the extraction condition may be performed by the decoder 70 instead of the pre-decoder 60 or may be performed by both of the pre-decoder 60 and the decoder 70.
  • MPEG-4 (Motion Picture Experts Group 4) Part 13 standardization for scalable video coding is under way. In particular, much effort is being made to implement scalability based on a multi-layered structure. For example, a bitstream may consist of multiple layers, i.e., base layer and first and second enhancement layers with different resolutions (QCIF, CIF, and 2CIF) or frame rates.
  • Like when a video is encoded into a single layer, when a video is encoded into multiple layers, a motion vector (MV) is obtained for each of the multiple layers to remove temporal redundancy. The motion vector MV may be separately searched for each layer (former approach) or a motion vector obtained by a motion vector search for one layer is used for another layer (without or after being upsampled/downsampled) (latter approach). The former approach has the advantage of obtaining accurate motion vectors while suffering from overhead due to motion vectors generated for each layer. Thus, it is a very challenging task to efficiently reduce redundancy between motion vectors for each layer.
  • FIG. 2 shows an example of a scalable video codec using a multi-layered structure. Referring to FIG. 2, a base layer has a quarter common intermediate format (QCIF) resolution and a frame rate of 15 Hz, a first enhancement layer has a common intermediate format (CIF) resolution and a frame rate of 30 Hz, and a second enhancement layer has a standard definition (SD) resolution and a frame rate of 60 Hz. For example, to obtain a stream of CIF and 0.5 Mbps, the enhancement layer bitstream of CIF@30 Hz and 0.7 Mbps may be truncated to meet the bit rate of 0.5 Mbps. In this way, it is possible to implement spatial, temporal, and SNR scalabilities. Because about twice as much overhead as that generated for a single-layer bitstream occurs due to an increase in the number of motion vectors as shown in FIG. 2, motion prediction from the base layer is very important. Of course, since the motion vector is used only for an inter-frame coded by referring to neighboring frames, it is not used for an intra-frame coded without reference to adjacent frames.
  • As shown in FIG. 2, frames 10, 20, and 30 in the respective layers having the same temporal position can be estimated to have similar images and thus similar motion vectors. Thus, one of the currently used methods for efficiently representing a motion vector includes predicting a motion vector for a current layer from a motion vector for a lower layer and coding a difference between the predicted value and the actual motion vector.
  • FIG. 3 is a diagram for explaining a conventional method for efficiently representing a motion vector using motion prediction. Referring to FIG. 3, a motion vector in a lower layer having the same temporal position as a current layer has conventionally been used as a predicted motion vector for a current layer motion vector.
  • An encoder obtains motion vectors MV0, MV1, and MV2 for a base layer, a first enhancement layer, and a second enhancement layer at predetermined accuracies and performs temporal transformation using the motion vectors MV0, MV1, and MV2 to remove temporal redundancies in the respective layers. However, the encoder sends the base layer motion vector MV0, a first enhancement layer motion vector component D1, and a second enhancement layer motion vector component D2 to the pre-decoder (or video stream server). The pre-decoder may transmit only the base layer motion vector, the base layer motion vector and the first enhancement layer motion vector component D1, or the base layer motion vector, the first enhancement layer motion vector component D1 and the second enhancement layer motion vector component D2 to a decoder to adapt to network situations.
  • The decoder then uses the received data to reconstruct a motion vector for an appropriate layer. For example, when the decoder receives the base layer motion vector and the first enhancement layer motion vector component D1, the first enhancement layer motion vector component D1 is added to the base layer motion vector MV0 in order to reconstruct the first enhancement layer motion vector MV1. The reconstructed motion vector MV1 is used to reconstruct texture data for the first enhancement layer.
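  • In effect, the conventional scheme of FIG. 3 transmits, per enhancement layer, only the difference from the (resolution-scaled) lower layer vector. The sketch below restates this numerically; the 2x QCIF-to-CIF scaling and the variable names are assumptions of the example.

```python
# Sketch of the conventional layered motion vector coding of FIG. 3: the co-located
# lower layer vector, upsampled to the current layer's resolution, is the prediction;
# only the difference D is transmitted for the enhancement layer.

def predict_from_lower(lower_mv, ratio):
    return (lower_mv[0] * ratio, lower_mv[1] * ratio)

mv0 = (3, -1)                                   # base layer motion vector (QCIF)
mv1 = (7, -1)                                   # first enhancement layer vector (CIF)
pred = predict_from_lower(mv0, 2)
d1 = (mv1[0] - pred[0], mv1[1] - pred[1])       # encoder sends MV0 and D1

mv1_rec = (pred[0] + d1[0], pred[1] + d1[1])    # decoder: MV1 = prediction + D1
print(d1, mv1_rec)                              # (1, 1) (7, -1)
```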
  • However, when the current layer has a different frame rate than the lower layer as shown in FIG. 2, a lower layer frame having the same temporal position as the current frame may not exist. For example, because no lower layer frame exists at the temporal position of a frame 40, motion prediction through a lower layer motion vector cannot be performed for it. That is, since a motion vector in the frame 40 cannot be predicted, a motion vector in the first enhancement layer is inefficiently represented as a redundant motion vector.
  • SUMMARY OF THE INVENTION
  • The present invention provides a method for efficiently predicting a motion vector in an enhancement layer from a motion vector in a base layer.
  • The present invention also provides a method for efficiently predicting a motion vector even if there is no lower layer frame at the same temporal position as a current layer frame.
  • According to an aspect of the present invention, there is provided a method for efficiently compressing multi-layered motion vectors, the method including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate, (c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, and (d) coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer.
  • According to another aspect of the present invention, there is provided a method for efficiently encoding multi-layered motion vectors, the method including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate, (c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, (d) lossy coding texture data of the base layer frame using the motion vector of the base layer frame, (e) coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer, and (f) losslessly coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, the motion vector in the base layer frame, the lossy coded result of step (d), and the lossy coded result of step (e).
  • According to still another aspect of the present invention, there is provided a multi-layered video encoding method including (a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame, (b) generating a motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame, (c) lossy coding texture data of the base layer frame using the motion vector of the base layer frame, (d) lossy coding texture data of the first enhancement layer frame using the motion vector of the first enhancement layer frame, and (e) losslessly coding the motion vector in the base layer frame, the lossy coded result of step (c), and the lossy coded result of step (d).
  • According to yet another aspect of the present invention, there is provided a multi-layered video decoding method including (a) extracting base layer data and enhancement layer data from an input bitstream, (b) if there is no base layer frame at the same temporal position as a first enhancement layer frame, generating a motion vector of the first enhancement layer frame by referring to the motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame, (c) reconstructing the motion vector of the enhancement layer using the generated predicted motion vector, and (d) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
  • According to a further aspect of the present invention, there is provided a multi-layered video decoding method including (a) extracting base layer data and enhancement layer data from an input bitstream, (b) if there is no base layer frame at the same temporal position as an enhancement layer frame, reconstructing a motion vector in the enhancement layer frame by referring to at least one frame among base layer frames present immediately before and after the same temporal position as the enhancement layer frame, and (c) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The above and other features and advantages of the present invention will become more apparent by describing in detail preferred embodiments thereof with reference to the attached drawings in which:
  • FIG. 1 shows an overall configuration of a general scalable video coding system;
  • FIG. 2 is a diagram of video coding using a multi-layered structure;
  • FIG. 3 is a diagram for explaining a conventional method for efficiently representing a motion vector using motion prediction;
  • FIG. 4 is a diagram for explaining a method for efficiently representing a motion vector using motion prediction according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram for explaining a concept of the present invention;
  • FIG. 6 is a block diagram of a video encoder according to an exemplary embodiment of the present invention;
  • FIG. 7 is a flow diagram for explaining the detailed operation of a motion vector estimation unit of an exemplary embodiment of the invention;
  • FIG. 8 illustrates examples of motion fields of previous and next frames;
  • FIG. 9 is a diagram for explaining a filtering process of an exemplary embodiment of the invention;
  • FIG. 10 is a diagram for explaining a method of an exemplary embodiment of the invention of obtaining a filtered motion vector when a resolution of a base layer and a resolution of an enhancement layer are different from each other; and
  • FIG. 11 is a block diagram of a video decoder according to an embodiment of the present invention.
  • DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
  • The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. Advantages and features of the present invention and methods of accomplishing the same may be understood more readily by reference to the following detailed description of exemplary embodiments and the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art, and the present invention will only be defined by the appended claims. Like reference numerals refer to like elements throughout the specification.
  • The more accurately the motion prediction is performed, the smaller the overhead due to motion vectors becomes. Unlike the case where the lower layer motion vectors MV0 and MV1 are used directly as the predicted motion vectors MV1p and MV2p, when a more accurate motion prediction method is employed, as shown in FIG. 4, the motion vectors MV0 and MV1 demonstrate the same motion effect as the vectors 11 and 13. That is, unlike the conventional art in which a differential value 12 between the motion vector in a current layer and the motion vector in its lower layer and a differential value 14 between the motion vector in the second enhancement layer and the motion vector in the current layer are transmitted, when the more accurately estimated motion vectors MV1p and MV2p are used, only the smaller values D1 and D2 need to be stored. Accordingly, the number of bits needed for motion vectors can be reduced, and the saved bits can be allocated to texture data, thereby enhancing picture quality.
  • To achieve this, first, the process of generating the predicted motion vectors MV1p and MV2p should be performed using only the motion information already available in the lower layer, without any additional information. Second, the referenced motion vectors should be substantially close to the motion vectors in the current layer.
  • FIG. 5 is a schematic diagram for explaining a fundamental concept of the present invention. It is assumed in this example that a current layer Ln has a CIF resolution and a frame rate of 30 Hz, and a lower layer Ln-1 has a QCIF resolution and a frame rate of 15 Hz.
  • In the present invention, if there is a base layer frame at the same temporal position as the current layer frame, a predicted motion vector is generated by referring to the motion vector in that base layer frame. On the other hand, if there is no base layer frame corresponding to the current layer frame, a predicted motion vector is generated using the motion vectors of at least one of the base layer frames located closest to that temporal position. Referring to FIG. 5, the motion vectors in current layer frames A0 and A2 are predicted from the motion vectors in lower layer frames B0 and B2, respectively, which have the same temporal positions as the frames A0 and A2. Here, "motion prediction" substantially means that a predicted motion vector is generated.
  • On the other hand, a predicted motion vector for a frame A1, which has no corresponding lower layer frame at the same temporal position, is generated using the motion vectors in the frames B0 and B2 closest to that temporal position. To achieve this, the motion vectors in the frames B0 and B2 are interpolated to generate a virtual motion vector (a motion vector of a virtual frame B1) at the same temporal position as the frame A1, and the virtual motion vector is used to predict the motion vector for the frame A1.
  • For example, assuming that a frame A1 of the current layer is predicted bi-directionally using the current layer frames A0 and A2 as reference frames, its forward and backward motion vectors can be efficiently predicted using the lower layer frames B0 and B2, respectively.
  • If the lower layer frame B0 has a backward motion vector (having B2 as a reference frame), the temporal distance covered by that backward motion vector is twice that of a current layer motion vector. Thus, when the referenced distance and direction are taken into consideration, the predicted motion vector for the forward motion vector of the frame A1 can be calculated by multiplying the backward motion vector of the frame B0 by −½. Likewise, the predicted motion vector for the backward motion vector of the frame A1 can be calculated by multiplying the backward motion vector of the frame B0 by ½. In order to reduce computational error, the value obtained by summing the backward motion vector of the frame B0 and the calculated forward motion vector of the frame A1 can also be used as the predicted motion vector for the backward motion vector of the frame A1.
  • Meanwhile, if the lower layer frame B2 has a forward motion vector (having B0 as a reference frame), the temporal distance covered by that forward motion vector is twice that of a current layer motion vector. Thus, when the referenced distance and direction are taken into consideration, the predicted motion vector for the forward motion vector of the frame A1 can be calculated by multiplying the forward motion vector of the frame B2 by ½. In addition, the predicted motion vector for the backward motion vector of the frame A1 can be calculated by multiplying the forward motion vector of the frame B2 by −½. In order to reduce computational error, the value obtained by subtracting the forward motion vector of the frame B2 from the calculated forward motion vector of the frame A1 can also be used as the predicted motion vector for the backward motion vector of the frame A1.
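  • The scaling rules of the two preceding paragraphs can be summarized in the following sketch; it assumes the frame spacing of FIG. 5 (the frame A1 lying halfway between the base layer frames B0 and B2) and uses illustrative function names not taken from the patent.

```python
# Sketch of predicting the bi-directional motion vectors of frame A1 from an
# adjacent base layer frame, assuming A1 lies halfway between B0 and B2.

def predict_from_b0_backward(mv_b0_bwd):
    """B0 has a backward vector referencing B2 (twice the temporal distance)."""
    fwd_a1 = (-0.5 * mv_b0_bwd[0], -0.5 * mv_b0_bwd[1])   # opposite direction, half the span
    bwd_a1 = (0.5 * mv_b0_bwd[0], 0.5 * mv_b0_bwd[1])     # same direction, half the span
    # Alternative from the text: bwd_a1 may also be taken as mv_b0_bwd + fwd_a1.
    return fwd_a1, bwd_a1

def predict_from_b2_forward(mv_b2_fwd):
    """B2 has a forward vector referencing B0 (twice the temporal distance)."""
    fwd_a1 = (0.5 * mv_b2_fwd[0], 0.5 * mv_b2_fwd[1])
    bwd_a1 = (-0.5 * mv_b2_fwd[0], -0.5 * mv_b2_fwd[1])
    # Alternative from the text: bwd_a1 may also be taken as fwd_a1 - mv_b2_fwd.
    return fwd_a1, bwd_a1
```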
  • FIG. 6 is a block diagram of a video encoder 100 according to an embodiment of the present invention. While FIG. 6 shows the use of one base layer and one enhancement layer, it will be readily apparent to those skilled in the art that the present invention can be applied between a lower layer and an upper layer when two or more layers are used.
  • Referring to FIG. 6, the video encoder 100 includes a downsampling unit 110, motion estimation units 121 and 131, lossy encoders 125 and 135, a motion vector estimation unit 140, and an entropy coder 150.
  • The downsampling unit 110 downsamples an input video to the resolution and frame rate suitable for each layer. When a QCIF@15 Hz base layer and a CIF@30 Hz enhancement layer are used as shown in FIG. 5, the original input video is downsampled to CIF and QCIF resolutions and to frame rates of 30 Hz and 15 Hz, respectively. Downsampling in resolution may be performed using an MPEG downsampler or a wavelet downsampler. Downsampling in frame rate may be performed using frame skipping or frame interpolation.
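  • As a minimal illustration of frame-rate downsampling by frame skipping (assuming a 30 Hz input halved to 15 Hz), the following sketch keeps every second frame; spatial downsampling by an MPEG or wavelet downsampler is not shown.

```python
# Minimal sketch of frame-rate downsampling by frame skipping: keeping every
# second frame halves a 30 Hz sequence to 15 Hz.

def skip_frames(frames, keep_every=2):
    """Keep every `keep_every`-th frame, starting with the first one."""
    return frames[::keep_every]

frames_30hz = ["f0", "f1", "f2", "f3", "f4", "f5"]
frames_15hz = skip_frames(frames_30hz)   # ["f0", "f2", "f4"]
```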
  • As described above, it will be apparent to one skilled in the art that the enhancement layer need not have both a higher resolution and a higher frame rate than the base layer; it may instead have a higher resolution than, and the same frame rate as, the base layer, or the same resolution as, and a higher frame rate than, the base layer.
  • A motion estimation unit 121 performs motion estimation on a base layer frame to obtain motion vectors in the base layer frame. The motion estimation is the process of finding the closest block to a block in a current frame, i.e., a block with a minimum error. Various techniques including fixed-size block matching and hierarchical variable size block matching (HVSBM) may be used in the motion estimation. In the same manner, the motion estimation unit 131 performs motion estimation on an enhancement layer frame to obtain motion vectors in the enhancement layer frame. The motion vectors in the base layer frame and the enhancement layer frame are obtained in this way to predict a motion vector in the enhancement layer frame using a virtual motion vector.
  • The motion vector prediction unit 140 generates a predicted motion vector for the enhancement layer using the motion vectors in the base layer and obtains the difference between the obtained motion vector in the enhancement layer frame and the predicted motion vector (hereinafter referred to as a "motion vector component").
  • The operation performed by the motion vector prediction unit 140 will now be described in more detail with reference to FIG. 7. First, it is determined whether there is a base layer frame at the same temporal position as the current frame in the enhancement layer in step S10. If so (YES in step S10), the motion vectors in the base layer frame having a spatial correlation with the current frame are filtered in step S20. As a result, a filtered motion vector corresponding to one motion vector in the current frame is generated. Step S20 will be described below with reference to FIGS. 9 and 10.
  • In step S30, it is determined whether the resolutions of the enhancement layer and the base layer are the same. If they are the same (YES in step S30), the filtered motion vector is subtracted from the motion vector in the current frame in step S40, because the filtered motion vector corresponds to the predicted motion vector when the resolutions are the same. If they are not the same (NO in step S30), the filtered motion vector is upsampled to the resolution of the enhancement layer in step S45. For example, if the resolution of the enhancement layer is double that of the base layer, this upsampling means scaling the filtered motion vector by a factor of two. In this case, the upsampled motion vector is the predicted motion vector, and the difference between the motion vector in the current frame and the upsampled motion vector is obtained in step S50.
  • Meanwhile, if it is determined in step S10 that there is no base layer frame at the same temporal position as the current frame in the enhancement layer (NO in step S10), the motion vectors in the base layer frames located immediately before and after that temporal position (that is, the closest frames before and after the temporal position) are filtered in steps S55 and S60. For example, in FIG. 5, the frames located immediately before and after the same temporal position as the current frame A1 are B0 and B2. In other words, among the motion vectors in the frame present immediately before the temporal position, e.g., B0, the motion vectors having a spatial correlation with a motion vector in the current frame are filtered to generate a filtered motion vector in step S55. Then, among the motion vectors in the frame present immediately after the temporal position, e.g., B2, the motion vectors having a spatial correlation with a motion vector in the current frame are filtered to generate a filtered motion vector in step S60. This filtering process is similar to that of step S20 and will be described later with reference to FIGS. 9 and 10.
  • Next, a "virtual motion vector" at the same temporal position as the current frame is interpolated using the filtered motion vector generated in step S55 and the filtered motion vector generated in step S60. Usable examples of interpolation include simple averaging, bi-linear interpolation, bi-cubic interpolation, and so on. If, unlike in FIG. 5, the distances to the frames present immediately before and after the temporal position are different, interpolation is preferably performed with an interpolation weighted factor that increases in inverse proportion to the distance. Alternatively, when the distances are different, only the single frame closest to the current frame may be used.
  • Rather than the simple interpolation methods mentioned above, the weighted factors of the immediately previous and next frames may be determined in consideration of the characteristics of the motion vector fields. FIG. 8 shows the motion vectors in the immediately previous and next frames (the white blocks are blocks whose motion vectors are skipped); the blocks at the same spatial position as a certain motion vector in the current frame are block 61 and block 62. In the immediately previous frame, the motion vectors in the vicinity of the block 61 are substantially the same as the motion vector of the block 61 (allowing for enlargement or reduction). In the immediately next frame, on the other hand, the motion vectors in the vicinity of the block 62 are mostly different from the motion vector of the block 62. Thus, in generating predicted motion vectors for the motion vectors in the current frame, the weighted factor of the motion vector in the immediately previous frame is preferably increased for more accurate motion prediction. That is to say, a sum of differences between the motion vector of the block 61 and each of the motion vectors in its vicinity is computed, and a sum of differences between the motion vector of the block 62 and each of the motion vectors in its vicinity is computed; the weighted factor for the motion vector in each frame is made inversely proportional to its sum of differences. Since the decoder side receives the motion vectors from the encoder side, it can compute the weighted factors in the same manner as the encoder side without any additional signaling from the encoder side.
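  • One possible realization of this weighting rule is sketched below: the weight of each reference frame is taken to be inversely proportional to the sum of absolute differences between the co-located motion vector and its neighbors. The function names and the choice of a 4-neighborhood are assumptions for illustration, not requirements of the text.

```python
# Sketch: interpolate a virtual motion vector for the current block from the
# co-located vectors in the previous and next base layer frames, weighting
# each frame inversely to how much its motion field varies around that block.

def neighbourhood_difference(field, x, y):
    """Sum of absolute differences between the vector at (x, y) and its neighbors."""
    mvx, mvy = field[y][x]
    total = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # assumed 4-neighborhood
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(field) and 0 <= nx < len(field[0]):
            nvx, nvy = field[ny][nx]
            total += abs(mvx - nvx) + abs(mvy - nvy)
    return total

def interpolate_virtual_mv(prev_field, next_field, x, y, eps=1e-6):
    """Weighted average of the co-located vectors; weight ~ 1 / non-uniformity."""
    w_prev = 1.0 / (neighbourhood_difference(prev_field, x, y) + eps)
    w_next = 1.0 / (neighbourhood_difference(next_field, x, y) + eps)
    px, py = prev_field[y][x]
    qx, qy = next_field[y][x]
    s = w_prev + w_next
    return ((w_prev * px + w_next * qx) / s, (w_prev * py + w_next * qy) / s)
```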
  • Next, if the resolution of the enhancement layer and the resolution of the base layer are the same (YES in step S70), the interpolated virtual motion vector is a predicted motion vector. Therefore, in order to effectively compress a motion vector, subtraction is performed on the motion vector in the current frame and the virtual motion vector in step S75. The subtraction result becomes the motion vector component in the enhancement layer.
  • On the other hand, if the resolution of the enhancement layer and the resolution of the base layer are not the same (NO in step S70), the interpolated virtual motion vector is upsampled to be as large as the motion vector in the enhancement layer in step S80. As described above, since the upsampled motion vector is a predicted motion vector, subtraction is performed on the motion vector in the current frame and the upsampled motion vector in step S85.
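  • The overall flow of FIG. 7 (steps S10 through S85) may be summarized as in the sketch below; filter_mv, interpolate_virtual, and upsample are placeholders for the filtering, interpolation, and upsampling operations described above, passed in as arguments rather than defined here.

```python
# Rough sketch of the predicted-motion-vector flow of FIG. 7.

def motion_vector_component(enh_mv, base_frames, t, scale,
                            filter_mv, interpolate_virtual, upsample):
    """Difference between an enhancement layer vector and its predicted vector."""
    if t in base_frames:                                    # S10: co-located base frame exists
        predicted = filter_mv(base_frames[t], enh_mv)       # S20: filter correlated vectors
    else:
        prev_t = max(k for k in base_frames if k < t)       # closest base frame before t
        next_t = min(k for k in base_frames if k > t)       # closest base frame after t
        mv_prev = filter_mv(base_frames[prev_t], enh_mv)    # S55
        mv_next = filter_mv(base_frames[next_t], enh_mv)    # S60
        predicted = interpolate_virtual(mv_prev, mv_next)   # virtual motion vector at t
    if scale != 1:                                          # S30 / S70: resolutions differ
        predicted = upsample(predicted, scale)              # S45 / S80
    return (enh_mv[0] - predicted[0], enh_mv[1] - predicted[1])   # S40 / S50 / S75 / S85
```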
  • FIG. 9 is a diagram for explaining a filtering process. Here, "filtering" means the process of obtaining, for a motion vector in an enhancement layer frame, a filtered motion vector from the base layer motion vectors that have a spatial correlation with it. Here, a position having a "spatial correlation" means a "directly corresponding position" (First Embodiment), or the directly corresponding position together with a region covering its vicinity (Second Embodiment).
  • The first embodiment will first be described. Referring to FIG. 9, if the resolution of the enhancement layer and the resolution of the base layer are the same, a motion vector having a spatial correlation with (that is, positioned directly corresponding to) the motion vector 65 is a motion vector 63. In this case, the motion vector 63 is a “filtered motion vector.”
  • If the resolution of the enhancement layer and the resolution of the base layer are not the same, a motion vector having a spatial correlation with (that is, positioned directly corresponding to) the motion vector 65 is a motion vector 64. In this case, the motion vector 64 is a “filtered motion vector.” Of course, the motion vector 64 has a spatial correlation with motion vectors 66, 67, and 68 as well as the motion vector 65.
  • Next, the second embodiment will be described, in which filtering is performed in consideration of not only motion vectors positioned directly corresponding to a certain motion vector but also motion vectors in the vicinity thereof. In this case, the term “a position having a spatial correlation” used herein is meant to embrace a directly corresponding position and a region including the vicinity thereof. The reason for enlarging the region in such a manner is that motion vectors have a spatial similarity and taking adjacent motion vectors into consideration may be advantageous for motion prediction.
  • If the resolution of the enhancement layer and the resolution of the base layer are the same, as shown in FIG. 9, while the motion vector 63 directly corresponds to the motion vector 65, filtering is performed in consideration of not only the motion vector 63 but also the motion vectors in the vicinity of the motion vector 63. For example, assuming that the "vicinity" of the motion vector 63 means the 8 motion vectors around the motion vector 63, a filtered motion vector can be obtained by a linear combination of the 9 motion vectors, including the motion vector 63. In this case, a relatively greater coefficient (that is, a greater weighted factor) is applied to the motion vector 63 and relatively smaller coefficients are applied to the adjacent motion vectors. Of course, different weighted factors may be applied to the adjacent motion vectors according to their positions (sides or corners). In order to obtain a filtered motion vector from a plurality of motion vectors, a median filter, a bi-cubic filter, a quadratic filter, or other filters may be used.
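  • For the same-resolution case, the filtering could look like the following weighted linear combination over a 3×3 neighborhood; the particular weight mask is an assumed example, since the text only requires the center vector to receive the largest coefficient.

```python
# Sketch of filtering the base layer motion field at the position directly
# corresponding to an enhancement layer vector, same-resolution case.
# The 3x3 weight mask is an assumed example; a median or bi-cubic filter
# could equally be used, as the text notes.

WEIGHTS = [[0.05, 0.10, 0.05],
           [0.10, 0.40, 0.10],     # the center vector gets the largest coefficient
           [0.05, 0.10, 0.05]]

def filtered_motion_vector(field, x, y):
    """Weighted linear combination of the co-located vector and its 8 neighbors."""
    acc_x = acc_y = acc_w = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= ny < len(field) and 0 <= nx < len(field[0]):
                w = WEIGHTS[dy + 1][dx + 1]
                acc_x += w * field[ny][nx][0]
                acc_y += w * field[ny][nx][1]
                acc_w += w
    return (acc_x / acc_w, acc_y / acc_w)   # renormalize at frame borders
```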
  • Meanwhile, if the resolution of the enhancement layer and the resolution of the base layer are not the same, while the motion vector 64 directly corresponds to the motion vector 65, filtering is performed in consideration of not only the motion vector 64 but also the motion vectors in the vicinity of the motion vector 64. A process of obtaining a filtered motion vector will be described with reference to FIG. 10 in a case where the resolution of the enhancement layer and the resolution of the base layer are not the same.
  • First, it is assumed that a block in the base layer corresponds to 4 fixed blocks in the first enhancement layer. For example, a block f corresponds to a region consisting of blocks f5, f6, f7, and f8. To apply a predetermined interpolation method to obtain a reference motion vector, it is necessary to determine a regional range having a spatial correlation in the base layer and then to determine the weighted factors for the motion vectors within that range.
  • For example, a motion vector of block f5 has a high spatial correlation with blocks b, e, and f. Since the block f5 occupies a quarter of a region corresponding to a block f, it is predictable that the block f5 is considerably spatially correlated with block b, block e, and block f in the base layer.
  • As described above, after the regional range having a spatial correlation is determined, motion vectors present in the region are filtered. In this case, it is preferred that the weighted factor for the block f is greater than that for the block b or block e. In addition, a variety of filters such as a median filter, a bi-cubic filter, or a quadratic filter can be used. Usable examples of interpolation include a simple averaging method, bi-linear interpolation, bi-cubic interpolation, and so on.
  • Alternatively, not only the block b, block e, and block f but also the block a may be included in the range of the reference blocks. In addition, different weighted factors may be assigned to the respective blocks, for example, 25% to block b, 25% to block e, 10% to block a, and 40% to block f. Still alternatively, the region of the reference blocks may be set to include not only the immediately adjacent blocks but also alternate blocks. One skilled in the art will recognize that the present invention may be implemented in a manner different from that specifically discussed in the present application. If the resolution of the base layer is different from the resolution of the enhancement layer, the scale of a filtered motion vector is different from the motion vector scale of the enhancement layer, because the filtered motion vector is obtained by filtering only the motion vectors in the base layer. In the present invention, the upsampling is therefore performed as a separate step.
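  • Using the example weights quoted above (25% for block b, 25% for block e, 10% for block a, and 40% for block f), a filtered motion vector for the block f5 might be formed as in the following sketch; the upsampling factor of two corresponds to the QCIF-to-CIF case, and the function names are illustrative.

```python
# Sketch: filtered motion vector for enhancement layer block f5 from the
# spatially correlated base layer blocks, followed by upsampling to the
# enhancement layer scale (factor 2 for QCIF -> CIF). The weights follow the
# example quoted in the text and are only one possible choice.

def filtered_mv_for_f5(mv_a, mv_b, mv_e, mv_f):
    weights = {"a": 0.10, "b": 0.25, "e": 0.25, "f": 0.40}
    x = (weights["a"] * mv_a[0] + weights["b"] * mv_b[0]
         + weights["e"] * mv_e[0] + weights["f"] * mv_f[0])
    y = (weights["a"] * mv_a[1] + weights["b"] * mv_b[1]
         + weights["e"] * mv_e[1] + weights["f"] * mv_f[1])
    return (x, y)

def upsample_mv(mv, factor=2):
    """Scale a base layer motion vector to the enhancement layer resolution."""
    return (mv[0] * factor, mv[1] * factor)

predicted_f5 = upsample_mv(filtered_mv_for_f5((1, 0), (2, 1), (2, 0), (3, 1)))
```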
  • While fixed blocks have been used in the second embodiment of the present invention, appropriately filtering according to the extent of a spatial correlation can be sufficiently implemented using variable blocks.
  • Referring back to FIG. 6, the lossy coder 125 performs lossy coding on a base layer frame using the motion vector obtained from the motion estimation unit 121. The lossy coder 125 may include a temporal transform unit 122, a spatial transform unit 123, and a quantization unit 124.
  • The temporal transform unit 122 constructs a predictive frame using the motion vector obtained from the motion estimation unit 121 and a frame located at a temporal position different from that of the current frame, and performs a subtraction on the current frame and the predictive frame, thereby reducing temporal redundancy. As a result, a residual frame is generated. Of course, if the current frame is encoded without referencing another frame, that is, if the current frame is an intra frame, no motion vector is required, and the temporal transformation processes using a predictive frame are skipped. Among the temporal transformation processes, to support temporal scalability, MCTF (Motion Compensated Temporal Filtering) or UMCTF (Unconstrained MCTF) may be used.
  • The spatial transform unit 123 performs a spatial transform on residual frames generated by the temporal transform unit 122 or original input frames and generates transform coefficients. For spatial transform, DCT (Discrete Cosine Transform), wavelet transform, or the like, may be used. In the case of employing the DCT, the transform coefficient is a DCT coefficient, and in the case of employing the wavelet transform, the transform coefficient is a wavelet coefficient.
  • The quantization unit 124 quantizes the transform coefficients generated by the spatial transform unit 123. Quantization is the process of dividing the transform coefficients, which are represented by arbitrary real values, into predetermined intervals so as to represent them as discrete values, and matching those discrete values with indices from a predetermined quantization table.
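  • As a simplified illustration of this step, a uniform scalar quantizer with a fixed step size could be written as below; an actual codec would use quantization tables and per-band step sizes, which this sketch omits.

```python
# Minimal sketch of scalar quantization of transform coefficients:
# real-valued coefficients are mapped to integer indices via a step size,
# and the decoder later reconstructs approximate values from those indices.

def quantize(coefficients, step=10.0):
    return [round(c / step) for c in coefficients]

def dequantize(indices, step=10.0):
    return [q * step for q in indices]

indices = quantize([103.7, -12.4, 0.9, 5.1])   # -> [10, -1, 0, 1]
approx = dequantize(indices)                   # -> [100.0, -10.0, 0.0, 10.0]
```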
  • On the other hand, a lossy coder 135 performs lossy coding on the enhancement layer frame using motion vectors in the enhancement layer frame obtained by the motion estimation unit 131. The lossy coder 135 includes a temporal transform unit 132, a spatial transform unit 133, and a quantization unit 134. Because the lossy coder 135 performs the same operation as the lossy coder 125, except that it performs lossy coding on the enhancement layer frame, a detailed explanation thereof will not be given.
  • The entropy coder 150 losslessly encodes (or entropy encodes) the quantization coefficients obtained by the quantization units 124 and 134 for the base layer and the enhancement layer, the base layer motion vectors generated by the motion estimation unit 121 for the base layer, and the enhancement layer motion vector components generated by the motion vector estimation unit 140 into an output bitstream. Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
  • While FIG. 6 shows the lossy coder 125 for the base layer as separate from the lossy coder 135 for the enhancement layer, it will be obvious to those skilled in the art that a single lossy coder can be used to process both the base layer and the enhancement layer.
  • FIG. 11 is a block diagram of a video decoder 200 according to an embodiment of the present invention.
  • Referring to FIG. 11, the video decoder 200 includes entropy decoder 210, lossy decoders 225, 235, and a motion vector reconstruction unit 240.
  • The entropy decoder 210 performs the inverse operation of the entropy encoding and extracts motion vectors in a base layer frame, motion vector components of an enhancement layer frame, and texture data from the base layer frame and the enhancement layer frame from an input bitstream.
  • A motion vector reconstruction unit 240 reconstructs motion vectors in the enhancement layer using the motion vectors in the base layer and motion vector components in the enhancement layer, which will now be described in more detail. The motion vector reconstruction process includes, if there is a base layer frame at the same temporal position as the first enhancement layer frame, generating a predicted motion vector by referring to the motion vector in the base layer frame, if not, generating a predicted motion vector by referring to a motion vector in at least one frame among base layer frames present immediately before and after the temporal position, and reconstructing a motion vector in the enhancement layer by adding the generated predicted motion vector and a motion vector component of the enhancement layer.
  • The motion vector reconstruction process is substantially the same as the motion vector estimation process (see FIG. 7), except that the decoder 200 performs an addition of the predicted motion vector and the motion vector component of the current frame (enhancement layer frame), whereas the encoder 100, as shown in FIG. 7, performs subtractions of the motion vectors in the current frame and the predicted motion vectors in steps S40, S50, S75, and S85. The method of generating a predicted motion vector is the same, however, and a repetitive explanation thereof will not be given.
  • A lossy decoder 235 performs the inverse operation of the lossy coder (135 of FIG. 6) to reconstruct a video sequence from the texture data of the enhancement layer frames using the reconstructed motion vectors in the enhancement layer frames. The lossy decoder 235 includes an inverse quantization unit 231, an inverse spatial transform unit 232, and an inverse temporal transform unit 233.
  • The inverse quantization unit 231 performs inverse quantization on the extracted texture data from the enhancement layer frames. The inverse quantization is the process of reconstructing values from corresponding quantization indices created during a quantization process using a quantization table used during the quantization process.
  • The inverse spatial transform unit 232 performs inverse spatial transform on the inversely quantized result. The inverse spatial transform is the inverse of spatial transform performed by the spatial transform unit 133 in the encoder 100. Inverse DCT and inverse wavelet transform are among techniques that may be used for the inverse spatial transform.
  • The inverse temporal transform unit 233 performs the inverse operation to the temporal transform unit 132 on the inversely spatially transformed result to reconstruct a video sequence. More specifically, the inverse temporal transform unit 233 uses motion vectors reconstructed by the motion vector reconstruction unit 240 to generate a predicted frame and adds the predicted frame to the inversely spatially transformed result in order to reconstruct a video sequence. Of course, an intra-frame that is not temporally transformed at an encoder is not necessarily subjected to an inverse temporal transform.
  • The encoder 100 may remove redundancies in the texture of an enhancement layer using a base layer during encoding. In this case, because the decoder 200 reconstructs a base layer frame and uses the reconstructed base layer frame and the texture data in the enhancement layer frame received from the entropy decoder 210 to reconstruct the enhancement layer frame, a lossy decoder 225 for the base layer is used.
  • In this case, the inverse temporal transform unit 233 uses the reconstructed motion vectors in enhancement layer frames to reconstruct a video sequence from the texture data in the enhancement layer frames (inversely spatially transformed result) and the reconstructed base layer frames.
  • While FIG. 11 shows the lossy decoder 225 for the base layer is separated from the lossy decoder 235 for the enhancement layer, it will be obvious to those skilled in the art that a single lossy decoder can be used to process both the base layer and the enhancement layer.
  • A method for effectively compressing and transmitting the motion vectors obtained in each layer has been described above. In another embodiment, a motion vector is obtained from only one of the multiple layers and, after being up- or down-sampled if necessary, the up-/down-sampled motion vector is used as the motion vector of another layer. In this case, the bits spent on motion vectors can be saved, although the motion vector accuracy in the enhancement layer may deteriorate.
  • In this case, the video encoding process includes obtaining a motion vector in a base layer frame having a first frame rate from an input frame, obtaining a motion vector in an enhancement layer frame having a second frame rate greater than the first frame rate, lossy coding texture data of the base layer frame using the motion vector in the base layer frame, lossy coding texture data of the enhancement layer frame using the motion vector in the enhancement layer frame, and losslessly coding the motion vector in the base layer frame and the lossy coded results.
  • Here, the obtaining of a motion vector in the enhancement layer frame includes (1) generating a motion vector in the enhancement layer frame by referring to the motion vector in the base layer frame if there is a base layer frame at the same temporal position as the enhancement layer frame, and (2) if not, generating a motion vector in the enhancement layer frame by referring to a motion vector in at least one frame among base layer frames present immediately before and after the temporal position.
  • The step (1) includes filtering the motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using a predetermined filter, and, if the resolution of the base layer frame is not the same as the resolution of the enhancement layer, upsampling the motion vector resulting from the filtering such that it is as large as the motion vector of the enhancement layer. The resultant motion vector is used as the motion vector in the enhancement layer, not as a predicted motion vector.
  • The step (2) includes interpolating a virtual motion vector for the base layer by referring to the motion vectors in the base layer frames present immediately before and after the temporal position if there is no base layer frame at the same temporal position as the enhancement layer frame, and, if the resolution of the base layer frame is not the same as the resolution of the enhancement layer, upsampling the interpolated virtual motion vector such that it is as large as the motion vector in the enhancement layer. The resultant motion vector is used as the motion vector in the enhancement layer, not as a predicted motion vector.
  • The filtering process, the upsampling process, and the interpolation process are the same as described above, and a detailed explanation thereof will not be given.
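  • A minimal sketch of this variant follows, reusing the same placeholder operations as the earlier sketch; the filtered (and, where needed, interpolated and upsampled) base layer vector is returned directly as the enhancement layer motion vector, with no residual to code.

```python
# Sketch of the variant where the enhancement layer reuses the filtered (and,
# if needed, interpolated and upsampled) base layer motion vector directly,
# so no motion vector residual is coded for the enhancement layer. The helper
# functions are placeholders, as in the earlier sketch.

def enhancement_layer_mv(base_frames, t, block, scale,
                         filter_mv, interpolate_virtual, upsample):
    if t in base_frames:                                    # step (1): co-located base frame
        mv = filter_mv(base_frames[t], block)
    else:                                                   # step (2): interpolate a virtual vector
        prev_t = max(k for k in base_frames if k < t)
        next_t = min(k for k in base_frames if k > t)
        mv = interpolate_virtual(filter_mv(base_frames[prev_t], block),
                                 filter_mv(base_frames[next_t], block))
    return upsample(mv, scale) if scale != 1 else mv        # used as the actual motion vector
```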
  • A video decoding method according to an embodiment of the present invention, in which after obtaining a motion vector from one among multiple layers as described above, and up-/down-sampling the same, the up-/down-sampled motion vector is used as a motion vector in another layer, will now be described.
  • The video decoding process includes extracting a motion vector in a base layer frame and texture data of an enhancement layer frame from an input bitstream, reconstructing the motion vector in the enhancement layer frame using the extracted motion vector in the base layer frame, and reconstructing a video sequence from the texture data of the base layer frame and the texture data of the enhancement layer frame.
  • The reconstructing of the motion vector in the enhancement layer includes reconstructing the motion vector in the enhancement layer by referring to the base layer frame if there is a base layer frame at the same temporal position as the enhancement layer frame, and, if not, reconstructing the motion vector in the enhancement layer frame by referring to a motion vector in at least one frame among the base layer frames present immediately before and after the temporal position.
  • As described above, the process of reconstructing the motion vector in the enhancement layer frame using only the motion vector in the base layer is substantially the same as the encoding-side process of generating the motion vector in the enhancement layer frame using the motion vector in the base layer.
  • While the above-mentioned embodiments have been described for the case of one base layer and one enhancement layer by way of example, it is obvious to one skilled in the art that the present invention can easily be implemented for multiple layers. If the multiple layers consist of a base layer, a first enhancement layer, and a second enhancement layer, the algorithm used between the base layer and the first enhancement layer can also be applied between the first enhancement layer and the second enhancement layer.
  • Each of the respective components shown in FIGS. 6 and 11 may be, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC), which performs certain tasks. A module may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors. Thus, a module may include, by way of example, components such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules. In addition, the components and modules may be implemented such that they execute on one or more computers in a communication system.
  • According to the present invention, the compression efficiency of multi-layered motion vectors can be improved.
  • In addition, at the same bit-rate, the quality of an image coded in multiple layers can be enhanced.
  • While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. Therefore, it is to be understood that the above-described embodiments have been provided only in a descriptive sense and will not be construed as placing any limitation on the scope of the invention.

Claims (21)

1. A method for efficiently compressing multi-layered motion vectors, the method comprising:
(a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame;
(b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate;
(c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame; and
(d) coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer.
2. The method of claim 1, wherein operation (c) comprises:
(c1) generating a predicted motion vector by referring to the motion vector in the base layer frame if there is a base layer frame at the same temporal position as the first enhancement layer frame.
3. The method of claim 2, further comprising:
(e) obtaining a motion vector in a second enhancement layer frame having a third frame rate from the input frame, the third frame rate being greater than the first frame rate;
(f) generating a predicted motion vector for the second enhancement layer frame, comprising (f1) generating a predicted motion vector by referring to the motion vector in the first enhancement layer frame if there is the first enhancement layer frame at the same temporal position as the second enhancement layer frame, (f2) if not, generating a predicted motion vector for the second enhancement layer frame by referring to a motion vector for at least one frame among first enhancement layer frames present immediately before and after the same temporal position as the second enhancement layer frame; and
(g) coding a difference between the motion vector in the second enhancement layer frame and the predicted motion vector in the generated second enhancement layer frame.
4. The method of claim 2, wherein operation (c1) comprises:
(c11) filtering the motion vectors in the base layer frame having a spatial correlation with the motion vector in the first enhancement layer frame using a predetermined filter; and
(c12) if the resolution of the base layer frame is not the same as the resolution of the first enhancement layer, upsampling the motion vector resulting from the filtering such that the motion vector becomes as large as the motion vector in the first enhancement layer.
5. The method of claim 4, wherein the filtering is performed with different weighted factors assigned to the respective motion vectors according to the spatial correlation.
6. The method of claim 1, wherein operation (c) comprises:
(c21) interpolating a virtual motion vector in the base layer frame by referring to the motion vector in the base layer frame present immediately before and after the temporal position; and
(c22) if the resolution of the base layer frame is not the same as the resolution of the first enhancement layer, generating the predicted motion vector by upsampling the interpolated virtual motion vector to be as large as the motion vector in the first enhancement layer.
7. The method of claim 6, wherein operation (c21) comprises interpolating the virtual motion vector in the base layer frame by assigning a high referring ratio of a base layer frame having a relatively large motion vector uniformity among base layer frames present immediately before and after the temporal position.
8. The method of claim 6, wherein operation (c21) comprises:
(c211) among motion vectors in the base layer frames present immediately before the same temporal position as the enhancement layer frame, filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using a predetermined filter;
(c212) among motion vectors in the base layer frames present immediately after the same temporal position as the enhancement layer frame, filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using the predetermined filter; and
(c213) interpolating the virtual motion vector by applying a predetermined algorithm to the filtering results of operations (c211) and (c212).
9. A method for efficiently encoding multi-layered motion vectors, the method comprising:
(a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame;
(b) obtaining a motion vector in a first enhancement layer frame having a second frame rate from the input frame, the second frame rate being greater than the first frame rate;
(c) generating a predicted motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame;
(d) lossy coding texture data of the base layer frame using the motion vector of the base layer frame;
(e) coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, and the obtained motion vector in the base layer; and
(f) losslessly coding a difference between the motion vector in the first enhancement layer frame and the generated predicted motion vector, the motion vector in the base layer frame, the lossy coded result of operation (d), and the lossy coded result of operation (e).
10. The method of claim 9, wherein operation (c) comprises:
(c1) generating a predicted motion vector by referring to the motion vector in the base layer frame if there is a base layer frame at the same temporal position as the first enhancement layer frame.
11. A multi-layered video encoding method comprising:
(a) obtaining a motion vector in a base layer frame having a first frame rate from an input frame;
(b) generating a motion vector by referring to a motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as a first enhancement layer frame if there is no base layer frame at the same temporal position as the first enhancement layer frame;
(c) lossy coding texture data of the base layer frame using the motion vector of the base layer frame;
(d) lossy coding texture data of the first enhancement layer frame using the motion vector of the first enhancement layer frame; and (e) losslessly coding the motion vector in the base layer frame, the lossy coded result of operation (c), and the lossy coded result of operation (d).
12. A multi-layered video decoding method comprising:
(a) extracting base layer data and enhancement layer data from an input bitstream;
(b) if there is no base layer frame at the same temporal position as a first enhancement layer frame, generating a motion vector of the first enhancement layer frame by referring to the motion vector for at least one frame among base layer frames present immediately before and after the same temporal position as the first enhancement layer frame;
(c) reconstructing the motion vector of the enhancement layer using the generated predicted motion vector; and
(d) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
13. The method of claim 12, wherein operation (b) comprises:
(b1) if there is a base layer frame at the same temporal position as the enhancement layer frame, generating a predicted motion vector by referring to the motion vector of the base layer frame.
14. The method of claim 13, wherein operation (b1) comprises:
(b11) filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using a predetermined filter; and
(b12) if the resolution of the base layer frame is not the same as the resolution of the first enhancement layer, generating the predicted motion vector by upsampling the motion vector generated resulting from the filtering to be as large as the motion vector in the enhancement layer.
15. The method of claim 12, wherein the filtering is performed with different weighted factors assigned to the respective motion vectors according to the spatial correlation.
16. The method of claim 12, wherein operation (b) comprises:
(b21) interpolating the virtual motion vector of the base layer frame by referring to motion vectors in base layer frames present immediately before and after the same temporal position as the first enhancement layer frame; and
(b22) if the resolution of the base layer frame is not the same as the resolution of the first enhancement layer, generating the predicted motion vector by upsampling the interpolated virtual motion vector to be as large as the motion vector of the enhancement layer.
17. The method of claim 16, wherein operation (b21) comprises interpolating the virtual motion vector in the base layer frame by assigning a high referring ratio of a base layer frame having a relatively large motion vector uniformity among base layer frames present immediately before and after the temporal position.
18. The method of claim 16, wherein operation (b21) comprises:
(b211) among motion vectors in the base layer frames present immediately before the same temporal position as the enhancement layer frame, filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using a predetermined filter;
(b212) among motion vectors in the base layer frames present immediately after the same temporal position as the enhancement layer frame, filtering motion vectors in the base layer frame having a spatial correlation with a motion vector in the enhancement layer frame using the predetermined filter; and
(b213) interpolating the virtual motion vector by applying a predetermined algorithm to the filtering results of operations (b211) and (b212).
19. A multi-layered video decoding method comprising:
(a) extracting base layer data and enhancement layer data from an input bitstream;
(b) if there is no base layer frame at the same temporal position as an enhancement layer frame, reconstructing a motion vector in the enhancement layer frame by referring to at least one frame among base layer frames present immediately before and after the same temporal position as the enhancement layer frame; and
(c) reconstructing a video sequence from texture data of the enhancement layer using the reconstructed motion vector of the enhancement layer.
20. The method of claim 19, wherein operation (b) comprises reconstructing the motion vector in the enhancement layer by referring to the base layer frame if there is a base layer frame at the same temporal position as the enhancement layer frame.
21. A recording medium having a computer readable program recorded therein, the program for executing the method of claim 1.
US11/254,051 2004-10-21 2005-10-20 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer Expired - Fee Related US7889793B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US11/254,051 US7889793B2 (en) 2004-10-21 2005-10-20 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US13/005,990 US8116578B2 (en) 2004-10-21 2011-01-13 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US13/355,619 US8520962B2 (en) 2004-10-21 2012-01-23 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US62032804P 2004-10-21 2004-10-21
KR10-2004-0103059 2004-12-08
KR1020040103059A KR100664929B1 (en) 2004-10-21 2004-12-08 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US11/254,051 US7889793B2 (en) 2004-10-21 2005-10-20 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/005,990 Continuation US8116578B2 (en) 2004-10-21 2011-01-13 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Publications (2)

Publication Number Publication Date
US20060088101A1 true US20060088101A1 (en) 2006-04-27
US7889793B2 US7889793B2 (en) 2011-02-15

Family

ID=36748182

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/254,051 Expired - Fee Related US7889793B2 (en) 2004-10-21 2005-10-20 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US13/005,990 Expired - Fee Related US8116578B2 (en) 2004-10-21 2011-01-13 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US13/355,619 Expired - Fee Related US8520962B2 (en) 2004-10-21 2012-01-23 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Family Applications After (2)

Application Number Title Priority Date Filing Date
US13/005,990 Expired - Fee Related US8116578B2 (en) 2004-10-21 2011-01-13 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US13/355,619 Expired - Fee Related US8520962B2 (en) 2004-10-21 2012-01-23 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Country Status (3)

Country Link
US (3) US7889793B2 (en)
KR (1) KR100664929B1 (en)
CN (1) CN1764280B (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080095238A1 (en) * 2006-10-18 2008-04-24 Apple Inc. Scalable video coding with filtering of lower layers
US20090175333A1 (en) * 2008-01-09 2009-07-09 Motorola Inc Method and apparatus for highly scalable intraframe video coding
US20090220004A1 (en) * 2006-01-11 2009-09-03 Mitsubishi Electric Corporation Error Concealment for Scalable Video Coding
US20100158398A1 (en) * 2007-09-10 2010-06-24 Fujifilm Corporation Image processing apparatus, image processing method, and computer readable medium
US20100183074A1 (en) * 2007-07-19 2010-07-22 Olympus Corporation Image processing method, image processing apparatus and computer readable storage medium
US20100303154A1 (en) * 2007-08-31 2010-12-02 Canon Kabushiki Kaisha method and device for video sequence decoding with error concealment
US7889793B2 (en) * 2004-10-21 2011-02-15 Samsung Electronics Co., Ltd. Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US20110103445A1 (en) * 2008-07-16 2011-05-05 Peter Jax Method and apparatus for synchronizing highly compressed enhancement layer data
US20110286661A1 (en) * 2010-05-20 2011-11-24 Samsung Electronics Co., Ltd. Method and apparatus for temporally interpolating three-dimensional depth image
US20120189060A1 (en) * 2011-01-20 2012-07-26 Industry-Academic Cooperation Foundation, Yonsei University Apparatus and method for encoding and decoding motion information and disparity information
JP2013021629A (en) * 2011-07-14 2013-01-31 Sony Corp Image processing apparatus and image processing method
US8428364B2 (en) 2010-01-15 2013-04-23 Dolby Laboratories Licensing Corporation Edge enhancement for temporal scaling with metadata
WO2013068825A1 (en) * 2011-11-10 2013-05-16 Luca Rossato Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
US20130121568A1 (en) * 2011-11-15 2013-05-16 At&T Intellectual Property I, L.P. System and Method of Image Upsampling
US20130188718A1 (en) * 2012-01-20 2013-07-25 Qualcomm Incorporated Motion prediction in svc without including a temporally neighboring block motion vector in a candidate list
US20130222539A1 (en) * 2010-10-08 2013-08-29 Dolby Laboratories Licensing Corporation Scalable frame compatible multiview encoding and decoding methods
US20130322537A1 (en) * 2012-05-14 2013-12-05 Luca Rossato Estimation, encoding and decoding of motion information in multidimensional signals through motion zones, and auxiliary information through auxiliary zones
CN103597836A (en) * 2011-06-14 2014-02-19 索尼公司 Image processing device and method
US20140064374A1 (en) * 2012-08-29 2014-03-06 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
GB2505728A (en) * 2012-08-30 2014-03-12 Canon Kk Inter-layer Temporal Prediction in Scalable Video Coding
US20140098220A1 (en) * 2012-10-04 2014-04-10 Cognex Corporation Symbology reader with multi-core processor
US20140146883A1 (en) * 2012-11-29 2014-05-29 Ati Technologies Ulc Bandwidth saving architecture for scalable video coding spatial mode
US20140185680A1 (en) * 2012-12-28 2014-07-03 Qualcomm Incorporated Device and method for scalable and multiview/3d coding of video information
US20140247878A1 (en) * 2012-09-21 2014-09-04 Lidong Xu Cross-layer motion vector prediction
US20150103899A1 (en) * 2012-04-27 2015-04-16 Canon Kabushiki Kaisha Scalable encoding and decoding
JP2016042717A (en) * 2015-10-29 2016-03-31 ソニー株式会社 Image processor and image processing method
JP2017060184A (en) * 2016-11-22 2017-03-23 ソニー株式会社 Image processing system and image processing method
US20170133027A1 (en) * 2014-06-27 2017-05-11 Orange Resampling of an Audio Signal by Interpolation for Low-Delay Encoding/Decoding
US9774882B2 (en) 2009-07-04 2017-09-26 Dolby Laboratories Licensing Corporation Encoding and decoding architectures for format compatible 3D video delivery
US10832379B1 (en) * 2015-04-01 2020-11-10 Pixelworks, Inc. Temporal stability for single frame super resolution
US20220279204A1 (en) * 2021-02-26 2022-09-01 Qualcomm Incorporated Efficient video encoder architecture
US20220295071A1 (en) * 2019-12-02 2022-09-15 Huawei Technologies Co., Ltd. Video encoding method, video decoding method, and corresponding apparatus

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100703746B1 (en) * 2005-01-21 2007-04-05 삼성전자주식회사 Video coding method and apparatus for predicting effectively unsynchronized frame
KR100703745B1 (en) * 2005-01-21 2007-04-05 삼성전자주식회사 Video coding method and apparatus for predicting effectively unsynchronized frame
KR100714689B1 (en) * 2005-01-21 2007-05-04 삼성전자주식회사 Method for multi-layer based scalable video coding and decoding, and apparatus for the same
KR100704626B1 (en) * 2005-02-07 2007-04-09 삼성전자주식회사 Method and apparatus for compressing multi-layered motion vectors
CN101258754B (en) * 2005-04-08 2010-08-11 新加坡科技研究局 Method for encoding at least one digital picture and the encoder
US20090161762A1 (en) * 2005-11-15 2009-06-25 Dong-San Jun Method of scalable video coding for varying spatial scalability of bitstream in real time and a codec using the same
US8189934B2 (en) * 2006-03-27 2012-05-29 Panasonic Corporation Image coding apparatus and image decoding apparatus
KR100791299B1 (en) 2006-04-11 2008-01-04 삼성전자주식회사 Multi-layer based video encoding method and apparatus thereof
KR100759722B1 (en) * 2006-05-12 2007-09-20 경희대학교 산학협력단 H.264-based scalable encoding method for performing motion-compensation using different prediction structure according to frame size based layers
US8422555B2 (en) * 2006-07-11 2013-04-16 Nokia Corporation Scalable video coding
US8548056B2 (en) * 2007-01-08 2013-10-01 Qualcomm Incorporated Extended inter-layer coding for spatial scalability
US8199812B2 (en) * 2007-01-09 2012-06-12 Qualcomm Incorporated Adaptive upsampling for scalable video coding
JP2011530222A (en) * 2008-08-01 2011-12-15 ゾラン コーポレイション Video encoder with integrated temporal filter for noise removal
US8325796B2 (en) 2008-09-11 2012-12-04 Google Inc. System and method for video coding using adaptive segmentation
US20100208795A1 (en) * 2009-02-19 2010-08-19 Motorola, Inc. Reducing aliasing in spatial scalable video coding
US10104391B2 (en) 2010-10-01 2018-10-16 Dolby International Ab System for nested entropy encoding
US20120082228A1 (en) * 2010-10-01 2012-04-05 Yeping Su Nested entropy encoding
US9154799B2 (en) 2011-04-07 2015-10-06 Google Inc. Encoding and decoding motion via image segmentation
US8422540B1 (en) 2012-06-21 2013-04-16 CBF Networks, Inc. Intelligent backhaul radio with zero division duplexing
US8467363B2 (en) 2011-08-17 2013-06-18 CBF Networks, Inc. Intelligent backhaul radio and antenna system
JP5950541B2 (en) * 2011-11-07 2016-07-13 キヤノン株式会社 Motion vector encoding device, motion vector encoding method and program, motion vector decoding device, motion vector decoding method and program
US9262670B2 (en) 2012-02-10 2016-02-16 Google Inc. Adaptive region of interest
WO2013137588A1 (en) * 2012-03-12 2013-09-19 엘지전자 주식회사 Scalable video decoding/encoding method, and scalable video decoding/encoding device using same
CN103716629B (en) * 2012-09-29 2017-02-22 华为技术有限公司 Image processing method, device, coder and decoder
CN108540809A (en) * 2012-10-09 2018-09-14 英迪股份有限公司 For the decoding apparatus of multi-layer video, code device and inter-layer prediction method
CN103237213B (en) * 2013-04-08 2016-03-30 华为技术有限公司 Method for video coding and video encoding/decoding method and relevant apparatus
US9392272B1 (en) 2014-06-02 2016-07-12 Google Inc. Video coding using adaptive source variance based partitioning
US9578324B1 (en) 2014-06-27 2017-02-21 Google Inc. Video coding using statistical-based spatially differentiated partitioning
US10798396B2 (en) 2015-12-08 2020-10-06 Samsung Display Co., Ltd. System and method for temporal differencing with variable complexity
CN108347612B (en) * 2018-01-30 2020-09-15 东华大学 Monitoring video compression and reconstruction method based on visual attention mechanism
CN110446072B (en) * 2019-08-14 2021-11-23 咪咕视讯科技有限公司 Video stream switching method, electronic device and storage medium
CN114424552A (en) * 2019-09-29 2022-04-29 华为技术有限公司 Low-delay source-channel joint coding method and related equipment
JP2024147359A (en) * 2023-04-03 2024-10-16 キヤノン株式会社 Image processing device, system, image processing method, and device

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5418571A (en) * 1991-02-01 1995-05-23 British Telecommunications Public Limited Company Decoding of double layer video signals with interpolation replacement on missing data from enhancement layer
US6339618B1 (en) * 1997-01-08 2002-01-15 At&T Corp. Mesh node motion coding to enable object based functionalities within a motion compensated transform video coder
US6427027B1 (en) * 1996-09-09 2002-07-30 Sony Corporation Picture encoding and/or decoding apparatus and method for providing scalability of a video object whose position changes with time and a recording medium having the same recorded thereon
US20020159518A1 (en) * 1999-12-28 2002-10-31 Vincent Bottreau SNR scalable video encoding method and corresponding decoding method
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement
US20040022318A1 (en) * 2002-05-29 2004-02-05 Diego Garrido Video interpolation coding
US20040131121A1 (en) * 2003-01-08 2004-07-08 Adriana Dumitras Method and apparatus for improved coding mode selection
US20050002458A1 (en) * 2001-10-26 2005-01-06 Bruls Wilhelmus Hendrikus Alfonsus Spatial scalable compression
US6873655B2 (en) * 2001-01-09 2005-03-29 Thomson Licensing A.A. Codec system and method for spatially scalable video data
US6907070B2 (en) * 2000-12-15 2005-06-14 Microsoft Corporation Drifting reduction and macroblock-based control in progressive fine granularity scalable video coding
US6940905B2 (en) * 2000-09-22 2005-09-06 Koninklijke Philips Electronics N.V. Double-loop motion-compensation fine granular scalability
US20050195896A1 (en) * 2004-03-08 2005-09-08 National Chiao Tung University Architecture for stack robust fine granularity scalability
US20050226334A1 (en) * 2004-04-08 2005-10-13 Samsung Electronics Co., Ltd. Method and apparatus for implementing motion scalability
US7062096B2 (en) * 2002-07-29 2006-06-13 Matsushita Electric Industrial Co., Ltd. Apparatus and method for performing bitplane coding with reordering in a fine granularity scalability coding system
US7072394B2 (en) * 2002-08-27 2006-07-04 National Chiao Tung University Architecture and method for fine granularity scalable video coding
US20070147492A1 (en) * 2003-03-03 2007-06-28 Gwenaelle Marquant Scalable encoding and decoding of interlaced digital video data

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3788823B2 (en) * 1995-10-27 2006-06-21 株式会社東芝 Moving picture encoding apparatus and moving picture decoding apparatus
JP2000209580A (en) 1999-01-13 2000-07-28 Canon Inc Picture processor and its method
US20020118743A1 (en) * 2001-02-28 2002-08-29 Hong Jiang Method, apparatus and system for multiple-layer scalable video coding
US6996173B2 (en) * 2002-01-25 2006-02-07 Microsoft Corporation Seamless switching of scalable video bitstreams
AU2003286339A1 (en) 2002-12-17 2004-07-09 Koninklijke Philips Electronics N.V. Method of coding video streams for low-cost multiple description at gateways
US20060133475A1 (en) 2003-02-17 2006-06-22 Bruls Wilhelmus H A Video coding
US7860161B2 (en) * 2003-12-15 2010-12-28 Microsoft Corporation Enhancement layer transcoding of fine-granular scalable video bitstreams
US20050201468A1 (en) * 2004-03-11 2005-09-15 National Chiao Tung University Method and apparatus for interframe wavelet video coding
US20060012719A1 (en) * 2004-07-12 2006-01-19 Nokia Corporation System and method for motion prediction in scalable video coding
KR100664929B1 (en) * 2004-10-21 2007-01-04 삼성전자주식회사 Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5418571A (en) * 1991-02-01 1995-05-23 British Telecommunications Public Limited Company Decoding of double layer video signals with interpolation replacement on missing data from enhancement layer
US6427027B1 (en) * 1996-09-09 2002-07-30 Sony Corporation Picture encoding and/or decoding apparatus and method for providing scalability of a video object whose position changes with time and a recording medium having the same recorded thereon
US6339618B1 (en) * 1997-01-08 2002-01-15 At&T Corp. Mesh node motion coding to enable object based functionalities within a motion compensated transform video coder
US20020159518A1 (en) * 1999-12-28 2002-10-31 Vincent Bottreau SNR scalable video encoding method and corresponding decoding method
US6510177B1 (en) * 2000-03-24 2003-01-21 Microsoft Corporation System and method for layered video coding enhancement
US6940905B2 (en) * 2000-09-22 2005-09-06 Koninklijke Philips Electronics N.V. Double-loop motion-compensation fine granular scalability
US6907070B2 (en) * 2000-12-15 2005-06-14 Microsoft Corporation Drifting reduction and macroblock-based control in progressive fine granularity scalable video coding
US6873655B2 (en) * 2001-01-09 2005-03-29 Thomson Licensing A.A. Codec system and method for spatially scalable video data
US20050002458A1 (en) * 2001-10-26 2005-01-06 Bruls Wilhelmus Hendrikus Alfonsus Spatial scalable compression
US7146056B2 (en) * 2001-10-26 2006-12-05 Koninklijke Philips Electronics N.V. Efficient spatial scalable compression schemes
US20040022318A1 (en) * 2002-05-29 2004-02-05 Diego Garrido Video interpolation coding
US7062096B2 (en) * 2002-07-29 2006-06-13 Matsushita Electric Industrial Co., Ltd. Apparatus and method for performing bitplane coding with reordering in a fine granularity scalability coding system
US7072394B2 (en) * 2002-08-27 2006-07-04 National Chiao Tung University Architecture and method for fine granularity scalable video coding
US20040131121A1 (en) * 2003-01-08 2004-07-08 Adriana Dumitras Method and apparatus for improved coding mode selection
US20070147492A1 (en) * 2003-03-03 2007-06-28 Gwenaelle Marquant Scalable encoding and decoding of interlaced digital video data
US20050195896A1 (en) * 2004-03-08 2005-09-08 National Chiao Tung University Architecture for stack robust fine granularity scalability
US20050226334A1 (en) * 2004-04-08 2005-10-13 Samsung Electronics Co., Ltd. Method and apparatus for implementing motion scalability

Cited By (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8116578B2 (en) 2004-10-21 2012-02-14 Samsung Electronics Co., Ltd. Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US7889793B2 (en) * 2004-10-21 2011-02-15 Samsung Electronics Co., Ltd. Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US8520962B2 (en) 2004-10-21 2013-08-27 Samsung Electronics Co., Ltd. Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
US20090220004A1 (en) * 2006-01-11 2009-09-03 Mitsubishi Electric Corporation Error Concealment for Scalable Video Coding
US20080095238A1 (en) * 2006-10-18 2008-04-24 Apple Inc. Scalable video coding with filtering of lower layers
US8964843B2 (en) * 2007-07-19 2015-02-24 Olympus Corporation Image processing method, image processing apparatus and computer readable storage medium
US20100183074A1 (en) * 2007-07-19 2010-07-22 Olympus Corporation Image processing method, image processing apparatus and computer readable storage medium
US20100303154A1 (en) * 2007-08-31 2010-12-02 Canon Kabushiki Kaisha Method and device for video sequence decoding with error concealment
US8498483B2 (en) 2007-09-10 2013-07-30 Fujifilm Corporation Image processing apparatus, image processing method, and computer readable medium
US20100158398A1 (en) * 2007-09-10 2010-06-24 Fujifilm Corporation Image processing apparatus, image processing method, and computer readable medium
US20090175333A1 (en) * 2008-01-09 2009-07-09 Motorola Inc Method and apparatus for highly scalable intraframe video coding
US8126054B2 (en) * 2008-01-09 2012-02-28 Motorola Mobility, Inc. Method and apparatus for highly scalable intraframe video coding
TWI449032B (en) * 2008-07-16 2014-08-11 Thomson Licensing Method and apparatus for synchronizing highly compressed enhancement layer data
US8995348B2 (en) 2008-07-16 2015-03-31 Thomson Licensing Method and apparatus for synchronizing highly compressed enhancement layer data
US8462702B2 (en) * 2008-07-16 2013-06-11 Thomson Licensing Method and apparatus for synchronizing highly compressed enhancement layer data
US20110103445A1 (en) * 2008-07-16 2011-05-05 Peter Jax Method and apparatus for synchronizing highly compressed enhancement layer data
US10798412B2 (en) 2009-07-04 2020-10-06 Dolby Laboratories Licensing Corporation Encoding and decoding architectures for format compatible 3D video delivery
US9774882B2 (en) 2009-07-04 2017-09-26 Dolby Laboratories Licensing Corporation Encoding and decoding architectures for format compatible 3D video delivery
US10038916B2 (en) 2009-07-04 2018-07-31 Dolby Laboratories Licensing Corporation Encoding and decoding architectures for format compatible 3D video delivery
US8428364B2 (en) 2010-01-15 2013-04-23 Dolby Laboratories Licensing Corporation Edge enhancement for temporal scaling with metadata
US9210398B2 (en) * 2010-05-20 2015-12-08 Samsung Electronics Co., Ltd. Method and apparatus for temporally interpolating three-dimensional depth image
US20110286661A1 (en) * 2010-05-20 2011-11-24 Samsung Electronics Co., Ltd. Method and apparatus for temporally interpolating three-dimensional depth image
US20130222539A1 (en) * 2010-10-08 2013-08-29 Dolby Laboratories Licensing Corporation Scalable frame compatible multiview encoding and decoding methods
US20120189060A1 (en) * 2011-01-20 2012-07-26 Industry-Academic Cooperation Foundation, Yonsei University Apparatus and method for encoding and decoding motion information and disparity information
CN103597836A (en) * 2011-06-14 2014-02-19 索尼公司 Image processing device and method
US20140072055A1 (en) * 2011-06-14 2014-03-13 Sony Corporation Image processing apparatus and image processing method
US10623761B2 (en) 2011-07-14 2020-04-14 Sony Corporation Image processing apparatus and image processing method
JP2013021629A (en) * 2011-07-14 2013-01-31 Sony Corp Image processing apparatus and image processing method
US9749625B2 (en) 2011-07-14 2017-08-29 Sony Corporation Image processing apparatus and image processing method utilizing a correlation of motion between layers for encoding an image
US9300980B2 (en) 2011-11-10 2016-03-29 Luca Rossato Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
KR20140092898A (en) * 2011-11-10 2014-07-24 루카 로사토 Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
US9967568B2 (en) * 2011-11-10 2018-05-08 V-Nova International Limited Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
WO2013068825A1 (en) * 2011-11-10 2013-05-16 Luca Rossato Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
US9628817B2 (en) * 2011-11-10 2017-04-18 V-Nova International Limited Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
KR102263625B1 (en) 2011-11-10 2021-06-10 루카 로사토 Upsampling and downsampling of motion maps and other auxiliary maps in a tiered signal quality hierarchy
US8571309B2 (en) * 2011-11-15 2013-10-29 At&T Intellectual Property I, L.P. System and method of image upsampling
US20130121568A1 (en) * 2011-11-15 2013-05-16 At&T Intellectual Property I, L.P. System and Method of Image Upsampling
US20130188718A1 (en) * 2012-01-20 2013-07-25 Qualcomm Incorporated Motion prediction in svc without including a temporally neighboring block motion vector in a candidate list
US20150103899A1 (en) * 2012-04-27 2015-04-16 Canon Kabushiki Kaisha Scalable encoding and decoding
US9686558B2 (en) * 2012-04-27 2017-06-20 Canon Kabushiki Kaisha Scalable encoding and decoding
US11595653B2 (en) 2012-05-14 2023-02-28 V-Nova International Limited Processing of motion information in multidimensional signals through motion zones and auxiliary information through auxiliary zones
US20130322537A1 (en) * 2012-05-14 2013-12-05 Luca Rossato Estimation, encoding and decoding of motion information in multidimensional signals through motion zones, and auxiliary information through auxiliary zones
US9706206B2 (en) * 2012-05-14 2017-07-11 V-Nova International Limited Estimation, encoding and decoding of motion information in multidimensional signals through motion zones, and auxiliary information through auxiliary zones
US10750178B2 (en) * 2012-05-14 2020-08-18 V-Nova International Limited Processing of motion information in multidimensional signals through motion zones and auxiliary information through auxiliary zones
US20170310968A1 (en) * 2012-05-14 2017-10-26 Luca Rossato Processing of motion information in multidimensional signals through motion zones and auxiliary information through auxiliary zones
US11343519B2 (en) 2012-08-29 2022-05-24 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
US10939130B2 (en) 2012-08-29 2021-03-02 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
US20140064374A1 (en) * 2012-08-29 2014-03-06 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
US9900593B2 (en) * 2012-08-29 2018-02-20 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
GB2505728A (en) * 2012-08-30 2014-03-12 Canon Kk Inter-layer Temporal Prediction in Scalable Video Coding
GB2505728B (en) * 2012-08-30 2015-10-21 Canon Kk Method and device for improving prediction information for encoding or decoding at least part of an image
US20140247878A1 (en) * 2012-09-21 2014-09-04 Lidong Xu Cross-layer motion vector prediction
EP2898671A4 (en) * 2012-09-21 2016-03-09 Intel Corp Cross-layer motion vector prediction
US10154177B2 (en) * 2012-10-04 2018-12-11 Cognex Corporation Symbology reader with multi-core processor
US11606483B2 (en) 2012-10-04 2023-03-14 Cognex Corporation Symbology reader with multi-core processor
US20140098220A1 (en) * 2012-10-04 2014-04-10 Cognex Corporation Symbology reader with multi-core processor
US11095910B2 (en) * 2012-11-29 2021-08-17 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding
US10085017B2 (en) * 2012-11-29 2018-09-25 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding spatial mode
US20190028725A1 (en) * 2012-11-29 2019-01-24 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding spatial mode
US11863769B2 (en) * 2012-11-29 2024-01-02 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding
US20200112731A1 (en) * 2012-11-29 2020-04-09 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding
US20140146883A1 (en) * 2012-11-29 2014-05-29 Ati Technologies Ulc Bandwidth saving architecture for scalable video coding spatial mode
US10659796B2 (en) * 2012-11-29 2020-05-19 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding spatial mode
US20210377552A1 (en) * 2012-11-29 2021-12-02 Advanced Micro Devices, Inc. Bandwidth saving architecture for scalable video coding
US9357211B2 (en) * 2012-12-28 2016-05-31 Qualcomm Incorporated Device and method for scalable and multiview/3D coding of video information
US20140185680A1 (en) * 2012-12-28 2014-07-03 Qualcomm Incorporated Device and method for scalable and multiview/3d coding of video information
US20170133027A1 (en) * 2014-06-27 2017-05-11 Orange Resampling of an Audio Signal by Interpolation for Low-Delay Encoding/Decoding
US10510357B2 (en) * 2014-06-27 2019-12-17 Orange Resampling of an audio signal by interpolation for low-delay encoding/decoding
US10832379B1 (en) * 2015-04-01 2020-11-10 Pixelworks, Inc. Temporal stability for single frame super resolution
JP2016042717A (en) * 2015-10-29 2016-03-31 ソニー株式会社 Image processor and image processing method
JP2017060184A (en) * 2016-11-22 2017-03-23 ソニー株式会社 Image processing system and image processing method
US20220295071A1 (en) * 2019-12-02 2022-09-15 Huawei Technologies Co., Ltd. Video encoding method, video decoding method, and corresponding apparatus
US20220279204A1 (en) * 2021-02-26 2022-09-01 Qualcomm Incorporated Efficient video encoder architecture

Also Published As

Publication number Publication date
US20120189061A1 (en) 2012-07-26
US20110110432A1 (en) 2011-05-12
US7889793B2 (en) 2011-02-15
KR100664929B1 (en) 2007-01-04
KR20060035542A (en) 2006-04-26
US8116578B2 (en) 2012-02-14
CN1764280A (en) 2006-04-26
CN1764280B (en) 2010-09-08
US8520962B2 (en) 2013-08-27

Similar Documents

Publication Publication Date Title
US7889793B2 (en) Method and apparatus for effectively compressing motion vectors in video coder based on multi-layer
KR100621581B1 (en) Method for pre-decoding, decoding bit-stream including base-layer, and apparatus thereof
JP5014989B2 (en) Frame compression method, video coding method, frame restoration method, video decoding method, video encoder, video decoder, and recording medium using base layer
KR100631777B1 (en) Method and apparatus for effectively compressing motion vectors in multi-layer
EP1659797A2 (en) Method and apparatus for compressing motion vectors in video coder based on multi-layer
KR100703740B1 (en) Method and apparatus for effectively encoding multi-layered motion vectors
KR101033548B1 (en) Video encoding method, video decoding method, video encoder, and video decoder, which use smoothing prediction
KR100763181B1 (en) Method and apparatus for improving coding rate by coding prediction information from base layer and enhancement layer
KR100703724B1 (en) Apparatus and method for adjusting bit-rate of scalable bit-stream coded on multi-layer base
US7944975B2 (en) Inter-frame prediction method in video coding, video encoder, video decoding method, and video decoder
KR100703734B1 (en) Method and apparatus for encoding/decoding multi-layer video using DCT upsampling
KR100919885B1 (en) Multi-view video scalable coding and decoding
KR100704626B1 (en) Method and apparatus for compressing multi-layered motion vectors
CA2543947A1 (en) Method and apparatus for adaptively selecting context model for entropy coding
KR100621584B1 (en) Video decoding method using smoothing filter, and video decoder thereof
KR100703751B1 (en) Method and apparatus for encoding and decoding referencing virtual area image

Legal Events

Date Code Title Description
AS Assignment

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HAN, WOO-JIN;LEE, KYO-HYUK;LEE, JAE-YOUNG;AND OTHERS;SIGNING DATES FROM 20051010 TO 20051014;REEL/FRAME:017123/0047

Owner name: SAMSUNG ELECTRONICS CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HAN, WOO-JIN;LEE, KYO-HYUK;LEE, JAE-YOUNG;AND OTHERS;REEL/FRAME:017123/0047;SIGNING DATES FROM 20051010 TO 20051014

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20190215