WO2006043754A1 - Video coding method and apparatus supporting temporal scalability - Google Patents
Video coding method and apparatus supporting temporal scalability Download PDFInfo
- Publication number
- WO2006043754A1 WO2006043754A1 PCT/KR2005/003031 KR2005003031W WO2006043754A1 WO 2006043754 A1 WO2006043754 A1 WO 2006043754A1 KR 2005003031 W KR2005003031 W KR 2005003031W WO 2006043754 A1 WO2006043754 A1 WO 2006043754A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- frames
- pass
- mctf
- temporal
- Prior art date
Links
- 230000002123 temporal effect Effects 0.000 title claims abstract description 157
- 238000000034 method Methods 0.000 title claims abstract description 68
- 238000001914 filtration Methods 0.000 claims abstract description 20
- 239000013598 vector Substances 0.000 claims description 29
- 238000013139 quantization Methods 0.000 claims description 17
- 239000000284 extract Substances 0.000 claims 1
- 239000000872 buffer Substances 0.000 description 16
- 238000007906 compression Methods 0.000 description 14
- 230000006835 compression Effects 0.000 description 14
- 238000010586 diagram Methods 0.000 description 11
- 238000004891 communication Methods 0.000 description 10
- 230000005540 biological transmission Effects 0.000 description 7
- 238000010276 construction Methods 0.000 description 4
- 238000013144 data compression Methods 0.000 description 3
- 238000013459 approach Methods 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000007796 conventional method Methods 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 238000011084 recovery Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 230000015556 catabolic process Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 230000001934 delay Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000004438 eyesight Effects 0.000 description 1
- 230000003340 mental effect Effects 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/553—Motion estimation dealing with occlusions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- Apparatuses and methods consistent with the present invention relate to video coding, and more particularly, to improving video coding efficiency by combining Motion-Compensated Temporal Filtering (MCTF) with closed-loop coding.
- Multimedia data requires a large capacity storage medium and a wide bandwidth for transmission since the amount of multimedia data is usually large.
- For example, a 24-bit true color image having a resolution of 640*480 needs a capacity of 640*480*24 bits, i.e., about 7.37 Mbits, per frame.
- When such an image is transmitted at 30 frames per second, a bandwidth of 221 Mbits/sec is required.
- When a 90-minute movie based on such images is stored, a storage space of about 1200 Gbits is required.
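- As a rough check of these figures, a minimal sketch of the arithmetic is given below; the 30 frames-per-second rate is an assumed value used only to reproduce the numbers above.

```python
# Back-of-the-envelope arithmetic for uncompressed 640x480 true-color video.
width, height, bits_per_pixel = 640, 480, 24
fps = 30                    # assumed frame rate consistent with the figures above
movie_minutes = 90

bits_per_frame = width * height * bits_per_pixel      # ~7.37 Mbits per frame
bandwidth_bps = bits_per_frame * fps                   # ~221 Mbits/sec
storage_bits = bandwidth_bps * movie_minutes * 60      # ~1200 Gbits for 90 minutes

print(f"per frame : {bits_per_frame / 1e6:.2f} Mbits")
print(f"bandwidth : {bandwidth_bps / 1e6:.0f} Mbits/sec")
print(f"90-minute : {storage_bits / 1e9:.0f} Gbits")
```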
- a compression coding method is a requisite for transmitting multimedia data including text, video, and audio.
- a basic principle of data compression is removing data redundancy.
- Data can be compressed by removing spatial redundancy in which the same color or object is repeated in an image, temporal redundancy in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio, or psychovisual redundancy taking into account human eyesight and limited perception of high frequency signals.
- Data compression can be classified into lossy/lossless compression according to whether source data is lost, intraframe/interframe compression according to whether individual frames are compressed independently, and symmetric/ asymmetric compression according to whether time required for compression is the same as time required for recovery.
- Data compression is defined as real-time compression when a compression/recovery time delay does not exceed 50 ms and as scalable compression when frames have different resolutions.
- For text or medical data, lossless compression is usually used.
- For multimedia data, lossy compression is usually used.
- Intraframe compression is usually used to remove spatial redundancy,
- while interframe compression is usually used to remove temporal redundancy.
- an ultrahigh-speed communication network can transmit data of several tens of megabits per second while a mobile communication network has a transmission rate of 384 kilobits per second.
- MPEG Motion Picture Experts Group
- In video coding standards such as MPEG-2, H.263, and H.264, temporal redundancy is removed by motion compensation based on motion estimation and compensation, and
- spatial redundancy is removed by transform coding.
- Scalability indicates the ability to partially decode a single compressed bitstream.
- Scalability includes spatial scalability indicating a video resolution, Signal to Noise Ratio (SNR) scalability indicating a video quality level, and temporal scalability indicating a frame rate.
- MCTF, which was introduced by Ohm and improved by Choi and Woods, is an essential technique for removing temporal redundancy and for video coding with flexible temporal scalability.
- In MCTF, coding is performed on a group of pictures (GOP), and a pair of a current frame and a reference frame is temporally filtered in a motion direction.
- FIG. 1 shows a conventional encoding process using 5/3 MCTF.
- a high-pass frame is shaded in gray and a low-pass frame is indicated by white.
- a video sequence is subjected to a plurality of levels of temporal decompositions, thereby achieving temporal scalability.
- a video sequence is decomposed into low-pass and high-pass frames.
- Temporal prediction, i.e., both forward and backward prediction, is performed on three adjacent input frames to generate a high-pass frame.
- Two adjacent high-pass frames are used to perform temporal update on an input frame.
- temporal prediction and temporal update are performed again on the updated low-pass frames.
- one low-pass frame and one high-pass frame are obtained at the highest temporal level.
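- A minimal sketch of this dyadic decomposition structure is shown below; it tracks only which frame positions become high-pass and low-pass frames at each temporal level, with prediction, update, and motion compensation omitted, and assumes an 8-frame GOP purely for illustration.

```python
# Sketch of the dyadic MCTF decomposition structure for one GOP.
# Only the index bookkeeping is shown; the actual filtering is omitted.
def mctf_levels(gop_size: int):
    low_pass = list(range(gop_size))     # temporal level 0: all input frames
    levels = []
    level = 1
    while len(low_pass) > 1:
        high = low_pass[1::2]            # odd positions become high-pass frames
        low = low_pass[0::2]             # even positions become updated low-pass frames
        levels.append({"level": level, "high_pass": high, "low_pass": low})
        low_pass = low                   # the next level filters the surviving low-pass frames
        level += 1
    return levels

for lv in mctf_levels(8):
    print(lv)
# For 8 frames, one low-pass and one high-pass frame remain at the highest level.
# Dropping all high-pass frames above a chosen level halves the frame rate per level,
# which is the temporal scalability the decomposition provides.
```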
- MCTF involves a temporal update step following a temporal prediction step in order to reduce drifting error caused due to a mismatch between an encoder and a decoder.
- the update step allows a drifting error to be uniformly distributed across a group of pictures (GOP), thereby preventing the error from periodically increasing or decreasing.
- One of the proposed approaches to achieve low time delay in an MCTF structure is to omit forward prediction and update steps for frames at temporal levels higher than a specific temporal level.
Disclosure of Invention
- FIG. 2 illustrates a conventional method of limiting time delay in MCTF.
- When the maximum time delay is four, forward update and prediction steps are omitted for frames being updated at temporal level 2 and for frames at higher temporal levels.
- Here, a time delay of 1 refers to one frame interval.
- The minimum time delay required to generate the high-pass frame 15 is four because there is a delay of one frame interval before the encoder receives the input frame 10.
- No forward update is performed for the update step at temporal level 2 because performing a forward update for the low-pass frame 20 would introduce a time delay of six, whereas the maximum time delay is four.
- However, skipping forward prediction and update steps in the MCTF structure makes it difficult to uniformly distribute the drifting error, thereby resulting in significant degradation of coding efficiency or visual quality.
- Illustrative, non-limiting embodiments of the present invention overcome the above disadvantages and other disadvantages not described above. Also, the present invention is not required to overcome the disadvantages described above, and an illustrative, non-limiting embodiment of the present invention may not overcome any of the problems described above.
- the present invention provides a method for solving a time delay problem in an MCTF structure.
- the present invention also provides a method of combining advantages of both MCTF and hierarchical closed-loop coding.
- a video encoding method supporting temporal scalability including the steps of: performing Motion-Compensated Temporal Filtering (MCTF) on input frames up to a first temporal level; performing hierarchical closed-loop coding on frames up to a second temporal level higher than the first temporal level, the frames being generated by the MCTF; performing spatial transform on frames generated using the hierarchical closed-loop coding to create transform coefficients; and quantizing the transform coefficients.
- a video decoding method supporting temporal scalability including extracting texture data and motion data from an input bitstream, performing inverse quantization on the texture data to output transform coefficients, using the transform coefficients to generate frames in a spatial domain, using an intra-frame and an inter-frame among the frames in the spatial domain to reconstruct low-pass frames at a specific temporal level, and performing inverse MCTF on high-pass frames among the frames in the spatial domain and the reconstructed low-pass frames to reconstruct video frames.
- FIG. 1 illustrates a conventional encoding process using 5/3 MCTF
- FIG. 2 illustrates a conventional method for limiting time delay in MCTF
- FIG. 3 is a block diagram of a video encoder according to an exemplary embodiment of the present invention.
- FIG. 4 illustrates a method of referencing a frame in a MPEG coding scheme
- FIG. 5 is a block diagram showing the detailed construction of the video encoder of FIG. 3;
- FIG. 6 is a diagram for explaining an unconnected pixel
- FIG. 7 illustrates an example of an encoding process including prediction and update steps for temporal levels 1 and 2 performed by a MCTF coding unit and those for higher temporal levels performed by a closed-loop coding unit;
- FIG. 8 illustrates another example of an encoding process in which a MCTF coding unit performs up to a prediction step for a specific temporal level
- FIG. 9 illustrates an example of an encoding process in which closed-loop coding is applied to a Successive Temporal Approximation and Referencing (STAR) algorithm;
- FIG. 10 shows an example of an encoding process using both forward and backward prediction for all temporal levels without considering time delay
- FIG. 11 shows an example of an encoding process using another group of pictures
- FIG. 12 is a block diagram of a video decoder according to an exemplary embodiment of the present invention.
- FIG. 13 is a block diagram showing the detailed construction of the video decoder of FIG. 12;
- FIG. 14 illustrates a decoding process including hierarchical closed-loop decoding and MCTF decoding performed in reverse order of the encoding process illustrated in FIG. 7;
- FIG. 15 is a block diagram of a system for performing encoding and decoding processes according to an exemplary embodiment of the present invention.
- An exemplary embodiment of the present invention proposes a method for improving Motion-Compensated Temporal Filtering (MCTF) by applying closed-loop coding for a specific temporal level and higher.
- a closed-loop coding method has better coding efficiency than an open-loop method when it does not include a forward update step.
- the proposed method involves determining which temporal level to apply hierarchical closed-loop coding to, and replacing all frames at the determined temporal level with decoded frames that are then used as reference frames during prediction. The method reduces the mismatch in high-pass frames between the encoder and the decoder, thereby improving the overall coding efficiency.
- This concept can be implemented using a hybrid coding scheme combining MCTF and closed-loop coding.
- FIG. 3 is a block diagram of a video encoder 100 according to an exemplary embodiment of the present invention.
- the video encoder 100 includes an MCTF coding unit 110, a closed- loop coding unit 120, a spatial transformer 130, a quantizer 140, and an entropy coding unit 150.
- the MCTF coding unit 110 performs MCTF up to a temporal prediction step or temporal update step for a specific temporal level.
- the MCTF includes temporal prediction and temporal update steps for a plurality of temporal levels.
- the MCTF coding unit 110 can determine up to which temporal level MCTF is performed according to various conditions, in particular, maximum time delay. High-pass frames generated by the operation of the MCTF coding unit 110 are sent directly to the spatial transformer 130 while the remaining low-pass frames are sent to the closed-loop coding unit 120 for closed-loop coding.
- the closed-loop coding unit 120 performs hierarchical closed-loop coding on a low-pass frame for a specific temporal level received from the MCTF coding unit 110.
- In closed-loop coding typically used in MPEG-based codecs or H.264 codecs, as shown in FIG. 4, temporal prediction is performed on a B or P frame using a decoded frame (I or P frame) as a reference frame instead of an original input frame.
- Although the closed-loop coding unit 120 uses a decoded frame for temporal prediction as in FIG. 4, it performs closed-loop coding on frames having a hierarchical structure to achieve temporal scalability, unlike in FIG. 4.
- the closed-loop coding uses only a previous frame as a reference (i.e., forward prediction).
- the closed-loop coding unit 120 performs temporal prediction on a low-pass frame received from the MCTF coding unit 110 to generate an inter-frame. Temporal prediction is iteratively performed on the remaining low-pass frames at temporal levels up to the highest temporal level to produce inter-frames. If the number of low-pass frames received from the MCTF coding unit 110 is N, the closed-loop coding unit 120 produces one intra-frame and N-1 inter-frames. Alternatively, in a case where the highest temporal level is determined in a different way, closed-loop coding may be performed up to a temporal level for which two or more intra-frames are produced.
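- A structural sketch of this behavior is given below; it assumes simple forward prediction (each inter-frame referencing the most recently decoded frame, a simplification of the hierarchical referencing actually used) and uses a decode() stand-in for the quantization and inverse-quantization path described later.

```python
import numpy as np

def decode(coded):
    """Stand-in for the actual decode path (inverse quantization + inverse transform)."""
    return coded.copy()

def closed_loop_encode(low_pass_frames):
    """N low-pass frames in -> 1 intra residual + (N-1) inter residuals out."""
    coded, reference = [], None
    for frame in low_pass_frames:
        if reference is None:
            residual = frame                          # intra-frame: no reference used
            new_reference = decode(residual)
        else:
            residual = frame - reference              # inter-frame: predict from a decoded frame
            new_reference = decode(residual) + reference
        coded.append(residual)
        reference = new_reference                     # closed loop: reference is the decoded frame
    return coded

frames = [np.full((2, 2), 100.0 + k) for k in range(4)]
residuals = closed_loop_encode(frames)                # 1 intra + 3 inter residuals
```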
- 'Low-pass frame' and 'high-pass frame' respectively refer to frames generated by an update step and a temporal prediction step in MCTF.
- 'Intra-frame' and 'inter-frame' respectively denote a frame encoded without reference to any other frame and a frame encoded with reference to another frame among frames generated by closed-loop coding.
- Although closed-loop filtering uses input low-pass frames (updated with reference to another frame) to generate an intra-frame and an inter-frame,
- a frame encoded without reference to any other frame during closed-loop filtering may also be called an intra-frame.
- the closed-loop coding uses a decoded version of a low-pass frame as a reference for temporal prediction. Because the closed-loop coding, unlike the MCTF coding, does not include the step of updating an intra-frame, an intra-frame does not change according to a temporal level.
- the spatial transformer 130 performs spatial transform on a high-pass frame generated by the MCTF coding unit 110 and an inter-frame and an intra-frame generated by the closed-loop coding unit 120 in order to create transform coefficients.
- Discrete Cosine Transform (DCT) or wavelet transform techniques may be used for spatial transform.
- a DCT coefficient is created when DCT is used for spatial transform while a wavelet coefficient is produced when wavelet transform is used.
- the quantizer 140 performs quantization on the transform coefficients obtained by the spatial transformer 130. Quantization is the process of converting real-valued DCT coefficients into discrete values by dividing the range of coefficients into a limited number of intervals and mapping the real-valued coefficients into quantization indices according to a predetermined quantization table.
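- A minimal sketch of such an index mapping, assuming a plain uniform quantizer with a fixed step size (actual codecs use a predetermined quantization table), is shown below.

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Map real-valued transform coefficients to integer quantization indices."""
    return np.round(coeffs / step).astype(np.int32)

def dequantize(indices: np.ndarray, step: float) -> np.ndarray:
    """Reconstruct approximate coefficients from the indices (used by the decoder
    and by the closed-loop reference path in the encoder)."""
    return indices.astype(np.float64) * step

coeffs = np.array([[-13.7, 4.2], [0.9, 27.5]])
idx = quantize(coeffs, step=4.0)
approx = dequantize(idx, step=4.0)   # quantization error bounded by step / 2
```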
- the entropy coding unit 150 losslessly encodes the coefficients quantized by the quantizer 140 and the motion data (motion vectors and block information) obtained for temporal prediction by the MCTF coding unit 110 and the closed-loop coding unit 120 into an output bitstream.
- Various coding schemes such as Huffman Coding, Arithmetic Coding, and Variable Length Coding may be employed for lossless coding.
- FIG. 5 is a block diagram showing the detailed construction of the video encoder of FIG. 3.
- the MCTF coding unit 110 includes a separator 111, a temporal predictor 112, a motion estimator 113, frame buffers 114 and 115, and an updater 116.
- the separator 111 separates input frames into frames at high-pass frame (H) positions and frames at low-pass frame (L) positions.
- a high-pass frame and a low-pass frame are located at an odd-numbered ((2i+1)-th) position and an even-numbered (2i-th) position, respectively, where i is an index denoting a frame number and has an integer value greater than or equal to 0.
- the motion estimator 113 performs motion estimation on a current frame at an H position using adjacent frames as a reference to obtain motion vectors.
- the adjacent frames refer to at least one of two frames nearest to a frame at a certain temporal level.
- a block matching algorithm (BMA) has been widely used in motion estimation.
- pixels in a current block are compared with pixels of a search area in a reference frame and a displacement with a minimum error is determined as a motion vector.
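- A minimal full-search block-matching sketch along these lines, using the sum of absolute differences (SAD) as the matching error, is shown below; the block size and search range are illustrative assumptions.

```python
import numpy as np

def block_match(cur_block, ref_frame, top, left, search=8):
    """Full-search BMA: return the displacement (dy, dx) minimizing the SAD
    between the current block and a candidate block in the reference frame."""
    h, w = cur_block.shape
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue                              # skip candidates outside the frame
            sad = np.abs(cur_block - ref_frame[y:y + h, x:x + w]).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

cur = np.random.rand(16, 16)
ref = np.random.rand(64, 64)
mv = block_match(cur, ref, top=24, left=24)           # motion vector for one 16x16 block
```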
- HVSBM (hierarchical variable size block matching)
- the temporal predictor 112 reconstructs a reference frame using the obtained motion vectors to generate a predicted frame and calculates a difference between the current frame and the predicted frame to generate a high-pass frame at the current frame position.
- the high-pass frame H_{2i+1} may be defined by Equation (1) when I_{2i+1} is a current frame:
- H_{2i+1} = I_{2i+1} - P(I_{2i+1}) ... (1)
- P(I_{2i+1}) can be defined by Equation (2):
- P(I_{2i+1}) = (MC(I_{2i}, MV_{2i+1→2i}) + MC(I_{2i+2}, MV_{2i+1→2i+2})) / 2 ... (2)
- MV_{2i+1→2i} and MV_{2i+1→2i+2} respectively denote a motion vector directing from the (2i+1)-th frame to the (2i)-th frame and a motion vector directing from the (2i+1)-th frame to the (2i+2)-th frame, and MC(·) denotes a motion-compensated frame obtained using the motion vector.
- the high-pass frames generated using the above-mentioned process are stored in the frame buffer 115 and provided to the spatial transformer 130.
- the updater 116 updates a current frame among frames located at low-pass frame (L) positions using the generated high-pass frames.
- As shown in Equation (3), the update is performed using the two high-pass frames preceding and following the current frame:
- L_{2i} = I_{2i} + U(I_{2i}) ... (3)
- U(I_{2i}) is a frame added to the current frame for the update and can be defined by Equation (4):
- U(I_{2i}) = (MC(H_{2i-1}, MV_{2i→2i-1}) + MC(H_{2i+1}, MV_{2i→2i+1})) / 4 ... (4)
- Equation (4) may be modified into Equation (5).
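- As a minimal illustration of Equations (1) through (4), the sketch below performs the prediction and update steps with motion compensation replaced by a zero-motion identity mapping; real MCTF would use the motion vectors obtained by the motion estimator 113.

```python
import numpy as np

def mc(frame, mv=(0, 0)):
    """Placeholder for MC(.); a zero-motion (integer-shift) stand-in for this sketch."""
    return np.roll(frame, shift=mv, axis=(0, 1))

def predict_high_pass(i_prev, i_cur, i_next):
    """Equations (1)-(2): H_{2i+1} = I_{2i+1} - (MC(I_{2i}) + MC(I_{2i+2})) / 2."""
    return i_cur - (mc(i_prev) + mc(i_next)) / 2.0

def update_low_pass(i_cur, h_prev, h_next):
    """Equations (3)-(4): L_{2i} = I_{2i} + (MC(H_{2i-1}) + MC(H_{2i+1})) / 4."""
    return i_cur + (mc(h_prev) + mc(h_next)) / 4.0

frames = [np.random.rand(8, 8) for _ in range(3)]
h = predict_high_pass(frames[0], frames[1], frames[2])   # high-pass frame at an odd position
```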
- the low-pass frame stored in the frame buffer is again fed into the separator 111 to perform temporal prediction and temporal update steps for the next temporal level.
- a low-pass frame at the last temporal level processed by the MCTF coding unit 110 is fed into the closed-loop coding unit 120.
- FIG. 7 illustrates an example of an encoding process including prediction and update steps for temporal levels 1 and 2 performed by the MCTF coding unit (110 of FIG. 3) and those for higher temporal levels performed by the closed-loop coding unit 120.
- the MCTF coding unit 110 can perform MCTF up to temporal level 2.
- the closed-loop coding unit 120 performs closed-loop coding on the last four low-pass frames 30 through 33 at temporal level 2 updated by the MCTF coding unit 110.
- a predicted frame (inversely predicted frame) of a current frame is formed using the previous frame as a reference and the predicted frame is then subtracted from the current frame.
- the previous frame is not a low-pass frame input from the MCTF coding unit 110 but a decoded frame (indicated by a dotted line) obtained by quantizing and inversely quantizing the low-pass frame.
- in other words, the closed-loop coding uses a decoded version of a previously encoded frame, rather than the original frame, as a reference in encoding another frame.
- FIG. 7 shows that the MCTF coding unit 110 performs MCTF up to temporal level 2
- MCTF may be performed up to a temporal prediction step at a specific temporal level.
- FIG. 8 illustrates another example of an encoding process in which the MCTF coding unit 110 performs MCTF up to a prediction step for a specific temporal level.
- When the maximum time delay is 4, an update step cannot be performed for temporal level 2.
- four updated low-pass frames at positions in a first temporal level corresponding to those at temporal level 2 are fed to the closed-loop coding unit 120 for hierarchical closed-loop coding.
- FIG. 9 illustrates an example of an encoding process in which closed-loop coding is applied to a Successive Temporal Approximation and Referencing (STAR) algorithm. More information about the STAR algorithm is presented in the paper titled "Successive Temporal Approximation and Referencing (STAR) for Improving MCTF in Low End-to-end Delay Scalable Video Coding" (ISO/IEC JTC 1/SC 29/WG 11, MPEG2003/M10308, Hawaii, USA, Dec. 2003). Unlike the technique used for the closed-loop coding shown in FIG. 7 or 8, the STAR algorithm is a hierarchical encoding method in which the encoding process is performed in the same way as the decoding process.
- a decoder that receives some frames in a group of pictures (GOP) can reconstruct a video at a low frame rate.
- the closed-loop coding unit 120 may encode the low-pass frames received from the MCTF coding unit 110 using a STAR algorithm.
- the STAR algorithm used here differs from the conventional STAR algorithm (an open-loop technique) in that a decoded image is used as a reference frame instead of an original image.
- the closed-loop coding unit 120 includes a motion estimator 121, a motion compensator 122, a frame buffer 123, a subtractor 124, an adder 125, an inverse quantizer 126, and an inverse spatial transformer 127.
- the frame buffer 123 temporarily stores a low-pass frame L input from the MCTF coding unit 110 and a decoded frame D that will be used as a reference frame.
- the initial frame 30 shown in FIG. 7 is fed into the frame buffer 123 and then to the spatial transformer 130. Because there is no predicted frame to be subtracted from the initial frame 30, it bypasses the subtractor 124 and is fed directly into the spatial transformer 130. The initial frame 30 is then subjected to spatial transform, quantization, inverse quantization, and inverse spatial transform and stored in the frame buffer 123 for use as a reference in encoding subsequent frames.
- the subsequent frames are converted into high-pass frames that are then subjected to the same processes (spatial transform, quantization, inverse quantization, and inverse spatial transform), added to predicted frames P, and stored in the frame buffer 123 for use as a reference in encoding other frames.
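- A sketch of this encode-then-reconstruct reference path is given below, with a coarse rounding standing in for the spatial transform, quantization, inverse quantization, and inverse spatial transform chain; the step size and frame contents are illustrative assumptions.

```python
import numpy as np

def transform_quantize_and_back(residual, step=8.0):
    """Stand-in for spatial transform + quantization followed by inverse quantization
    + inverse spatial transform, i.e., what the decoder (and the encoder loop) will see."""
    return np.round(residual / step) * step

def encode_with_reference_loop(frames):
    frame_buffer = []                                   # plays the role of frame buffer 123
    bitstream = []
    for i, frame in enumerate(frames):
        predicted = frame_buffer[-1] if i > 0 else np.zeros_like(frame)
        residual = frame - predicted                    # subtractor 124 (no-op for the intra-frame)
        reconstructed_residual = transform_quantize_and_back(residual)
        bitstream.append(reconstructed_residual)
        decoded = reconstructed_residual + predicted    # adder 125
        frame_buffer.append(decoded)                    # stored for use as a reference
    return bitstream, frame_buffer

frames = [np.full((4, 4), 100.0 + 5 * k) for k in range(4)]
bitstream, references = encode_with_reference_loop(frames)
```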
- the motion estimator 121 performs motion estimation on the current frame using a decoded frame stored for use as a reference to obtain motion vectors.
- a BMA has been widely used in this motion estimation.
- the motion compensator 122 uses the motion vectors to reconstruct a reference frame and generates a predicted frame P.
- the subtractor 124 calculates a difference between the current frame L and the predicted frame P to generate an inter-frame for the current frame L, which is then sent to the spatial transformer 130.
- the intra-frame bypasses the subtractor 124 and is fed directly to the spatial transformer 130.
- the inverse quantizer 126 inversely quantizes the result obtained by the quantizer 140.
- the inverse spatial transformer 127 performs inverse spatial transform on the transform coefficient to reconstruct a temporal residual frame.
- the adder 125 adds the temporal residual frame to the predicted frame P to obtain a decoded frame D.
- the initial frame 30 is intra-coded (encoded without reference to any other frame).
- a next frame 31 is then inter-coded (encoded with reference to another frame) using a decoded version of the intra-coded frame as a reference.
- a next frame 32 is inter-coded using the decoded version of the intra-coded frame as a reference.
- the last frame 33 is inter-coded using a decoded version of the frame obtained after inter-coding the frame 32 as a reference.
- Combining MCTF with hierarchical closed-loop coding offers better coding efficiency than when MCTF or hierarchical closed-loop coding is separately used.
- When MCTF or hierarchical closed-loop coding is individually applied, hierarchical closed-loop coding exhibits better coding efficiency than MCTF.
- Although MCTF has proved to be an efficient coding tool for temporal prediction at a low temporal level, i.e., filtering between adjacent frames, it suffers a significant decrease in coding efficiency for filtering at a high temporal level because the temporal interval between frames increases as the temporal level increases. Since frames with a larger temporal interval typically have lower temporal correlation, update performance is significantly degraded.
- FIG. 12 is a block diagram of a video decoder 200 according to an exemplary embodiment of the present invention.
- the video decoder 200 includes an entropy decoding unit 210, an inverse quantizer 220, an inverse spatial transformer 230, a closed-loop decoding unit 240, and a MCTF decoding unit 250.
- the entropy decoding unit 210 interprets an input bitstream and performs the inverse of entropy coding to obtain texture data and motion data.
- the motion data may contain motion vectors and additional information such as block information (block size, block mode, etc).
- the entropy decoding unit 210 may obtain information about a temporal level contained in a bitstream.
- the temporal level information contains information about up to which temporal level MCTF coding, more specifically, a temporal prediction step is applied. When the temporal level is predetermined between the encoder 100 and decoder 200, the information may not be contained in the bitstream.
- the inverse quantizer 220 performs inverse quantization on the texture data to output transform coefficients.
- the inverse quantization is the process of reconstructing quantization coefficients from matched quantization indices created at the encoder 100.
- a matching table between the indices and quantization coefficients may be received from the encoder 100 or predetermined between the encoder and the decoder.
- the inverse spatial transformer 230 performs inverse spatial transform on the transform coefficients to generate frames in a spatial domain.
- If the frame in the spatial domain is an inter-frame, it will be a reconstructed temporal residual frame.
- An inverse DCT or inverse wavelet transform may be used in inverse spatial transform according to the technique used at the encoder 100.
- the inverse spatial transformer 230 sends an intra-frame and an inter-frame to the closed-loop decoding unit 240 while providing a high-pass frame to the MCTF decoding unit 250.
- the closed-loop decoding unit 240 uses the intra-frame and the inter-frame received from the inverse spatial transformer 230 to reconstruct low-pass frames at the specific temporal level. The reconstructed low-pass frames are then sent to the MCTF decoding unit 250.
- the MCTF decoding unit 250 performs inverse MCTF on the low-pass frames received from the closed-loop decoding unit 240 and the high-pass frames received from the inverse spatial transformer 230 to reconstruct entire video frames.
- FIG. 13 is a block diagram showing the detailed construction of the video decoder of FIG. 12.
- the closed-loop decoding unit 240 includes an adder 241, a motion compensator 242, and a frame buffer 243. An intra-frame and an inter-frame at a temporal level higher than the specific temporal level are sequentially fed to the adder 241.
- the intra-frame is fed to the adder 241 and temporarily stored in the frame buffer 243. In this case, since no frame is received from the motion compensator 242, no data is added to the intra-frame.
- the intra-frame is one of the low-pass frames.
- an inter-frame at the highest temporal level is fed to the adder 241 and added to a frame motion-compensated using the stored intra-frame to reconstruct a low-pass frame at the specific temporal level.
- the reconstructed low-pass frame is again stored in the frame buffer 243.
- the motion-compensated frame is generated by the motion compensator 242 using the motion data (motion vectors, block information, etc) received from the entropy decoding unit 210.
- the low-pass frames stored in the frame buffer 243 are sent to the MCTF decoding unit 250.
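- A minimal sketch of this closed-loop decoding path is given below: the intra-frame passes through unchanged and each inter-frame is added to an already-decoded reference; motion compensation is omitted (zero motion) and the per-frame reference indices are an assumption used only for illustration.

```python
import numpy as np

def closed_loop_decode(intra_frame, inter_frames, refs):
    """refs[k] is the index (into the decoded list) of the reference for inter-frame k."""
    decoded = [intra_frame]                     # frame buffer 243 analogue
    for k, residual in enumerate(inter_frames):
        predicted = decoded[refs[k]]            # motion compensation omitted in this sketch
        decoded.append(residual + predicted)    # adder 241 analogue
    return decoded                              # low-pass frames at the specific temporal level

intra = np.full((4, 4), 100.0)
inters = [np.full((4, 4), 2.0), np.full((4, 4), -1.0), np.full((4, 4), 3.0)]
low_pass = closed_loop_decode(intra, inters, refs=[0, 0, 1])
```

- For the FIG. 14 example described below, decoding the inter-frames in the order 42, 41, 43 with refs = [0, 0, 1] reproduces the described referencing (45 from 40, 44 from 40, and 46 from 45).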
- the MCTF decoding unit 250 includes a frame buffer 251, a motion compensator 252, and an inverse filtering unit 253.
- the frame buffer 251 temporarily stores the high-pass frames received from the inverse spatial transformer 230, the low-pass frames received from the closed-loop decoding unit 240, and frames subjected to inverse filtering by the inverse filtering unit 253.
- the motion compensator 252 provides a motion-compensated frame required for inverse filtering in the inverse filtering unit 253.
- the motion-compensated frame is obtained using the motion data received from the entropy decoding unit 210.
- the inverse filtering unit 253 performs inverse temporal update and temporal prediction steps at a certain temporal level to reconstruct low-pass frames at a lower temporal level.
- reconstructed low-pass frames I_{2i} and I_{2i+1} are defined by Equation (6):
- I_{2i} = L_{2i} - (MC(H_{2i-1}, MV_{2i→2i-1}) + MC(H_{2i+1}, MV_{2i→2i+1})) / 4
- I_{2i+1} = H_{2i+1} + (MC(I_{2i}, MV_{2i+1→2i}) + MC(I_{2i+2}, MV_{2i+1→2i+2})) / 2 ... (6)
- In the case of a connected pixel and a multi-connected pixel, Equation (6) is satisfied. Of course, the decoder 200 reconstructs the low-pass frames I_{2i} and I_{2i+1} using Equation (6).
- inverse filtering is performed using a 5/3 filter
- the decoder 200 may instead perform inverse filtering using a Haar filter, or a 7/5 or 9/7 filter with longer taps, in place of the 5/3 filter, in the same way as the MCTF at the encoder 100.
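- A pixel-level sketch of the inverse filtering of Equation (6) is shown below, again with motion compensation reduced to an identity mapping purely for illustration.

```python
import numpy as np

def mc(frame, mv=(0, 0)):
    """Placeholder for MC(.); zero-motion identity for this sketch."""
    return np.roll(frame, shift=mv, axis=(0, 1))

def inverse_update(l_cur, h_prev, h_next):
    """First line of Equation (6): I_{2i} = L_{2i} - (MC(H_{2i-1}) + MC(H_{2i+1})) / 4."""
    return l_cur - (mc(h_prev) + mc(h_next)) / 4.0

def inverse_predict(h_cur, i_prev, i_next):
    """Second line of Equation (6): I_{2i+1} = H_{2i+1} + (MC(I_{2i}) + MC(I_{2i+2})) / 2."""
    return h_cur + (mc(i_prev) + mc(i_next)) / 2.0
```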
- FIG. 14 illustrates a decoding process including hierarchical closed-loop decoding and MCTF decoding when an encoding process is performed as shown in FIG. 7.
- One intra-frame 40 and 15 inter-frames or high-pass frames are generated by the inverse spatial transformer 230.
- the intra-frame 40 and three inter-frames 41, 42, and 43 at a temporal level higher than a specific temporal level, i.e., temporal level 2, are sent to the closed-loop decoding unit 240.
- the remaining 12 high- pass frames are sent to the MCTF decoding unit 250.
- the closed-loop decoding unit 240 first reconstructs a low-pass frame 45 from the inter-frame 42 at temporal level 4 using the intra-frame 40 as a reference frame. Similarly, a low-pass frame 44 is reconstructed from the inter-frame 41 using the intra-frame 40 as a reference frame. Lastly, a low-pass frame 46 is reconstructed from the inter-frame 43 using the reconstructed low-pass frame 45 as a reference frame. As a result, all low-pass frames 40, 44, 45, and 46 at temporal level 2 are reconstructed.
- the MCTF decoding unit 250 uses the reconstructed low-pass frames and the four high-pass frames at temporal level 2 to reconstruct eight low-pass frames at the first temporal level.
- the MCTF decoding unit 250 then uses the reconstructed 8 low-pass frames and the 8 high-pass frames at the first temporal level to reconstruct 16 video frames.
- FIG. 15 is a block diagram of a system for performing an encoding or decoding process according to an exemplary embodiment of the present invention.
- the system may represent a television, a set-top box, a desktop, laptop or palmtop computer, a personal digital assistant (PDA), a video storage device such as a video cassette recorder (VCR), a digital video recorder (DVR), etc., as well as portions or combinations of these and other devices.
- the system includes at least one video source 510, at least one input/output device 540, a processor 520, a memory 550, and a display 530.
- the video source 510 may represent, e.g., a television receiver, a VCR or other video/image storage device.
- the source 510 may alternatively represent one or more network connections for receiving video from a server or servers over, e.g., a global computer communications network such as the Internet, a wide area network, a metropolitan area network, a local area network, a terrestrial broadcast system, a cable network, a satellite network, a wireless network, or a telephone network, as well as portions or combinations of these and other types of networks.
- the input/output devices 540, the processor 520 and the memory 550 may communicate over a communication medium 560.
- the communication medium 560 may represent, e.g., a bus, a communication network, one or more internal connections of a circuit, circuit card or other device, as well as portions and combinations of these and other communication media.
- Input video data from the source 510 is processed in accordance with one or more software programs stored in the memory 550 and executed by the processor 520 in order to generate output video/images supplied to the display device 530.
- the codec may be stored in the memory 550, read from a storage medium such as CD-ROM or floppy disk, or downloaded from a server via various networks.
- alternatively, the codec may be implemented as a hardware circuit, or as a combination of software and hardware circuits, in place of the software program.
- According to exemplary embodiments of the present invention, an MCTF structure is combined with hierarchical closed-loop coding, making it possible to solve a time delay problem that may occur when temporal scalability is implemented.
- the present invention exploits advantages of both MCTF structure and hierarchical closed-loop coding, thereby improving the video compression efficiency.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62032104P | 2004-10-21 | 2004-10-21 | |
US60/620,321 | 2004-10-21 | ||
KR1020040103076A KR100664930B1 (en) | 2004-10-21 | 2004-12-08 | Video coding method supporting temporal scalability and apparatus thereof |
KR10-2004-0103076 | 2004-12-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006043754A1 true WO2006043754A1 (en) | 2006-04-27 |
Family
ID=36203155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2005/003031 WO2006043754A1 (en) | 2004-10-21 | 2005-09-13 | Video coding method and apparatus supporting temporal scalability |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2006043754A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20020026254A (en) * | 2000-06-14 | 2002-04-06 | 요트.게.아. 롤페즈 | Color video encoding and decoding method |
KR20020077884A (en) * | 2000-11-17 | 2002-10-14 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Video coding method using a block matching process |
KR20040069209A (en) * | 2001-12-28 | 2004-08-04 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Video encoding method |
EP1455534A1 (en) * | 2003-03-03 | 2004-09-08 | Thomson Licensing S.A. | Scalable encoding and decoding of interlaced digital video data |
-
2005
- 2005-09-13 WO PCT/KR2005/003031 patent/WO2006043754A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20020026254A (en) * | 2000-06-14 | 2002-04-06 | 요트.게.아. 롤페즈 | Color video encoding and decoding method |
KR20020077884A (en) * | 2000-11-17 | 2002-10-14 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Video coding method using a block matching process |
KR20040069209A (en) * | 2001-12-28 | 2004-08-04 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Video encoding method |
EP1455534A1 (en) * | 2003-03-03 | 2004-09-08 | Thomson Licensing S.A. | Scalable encoding and decoding of interlaced digital video data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR100714696B1 (en) | Method and apparatus for coding video using weighted prediction based on multi-layer | |
KR100703760B1 (en) | Video encoding/decoding method using motion prediction between temporal levels and apparatus thereof | |
US8817872B2 (en) | Method and apparatus for encoding/decoding multi-layer video using weighted prediction | |
US20050169379A1 (en) | Apparatus and method for scalable video coding providing scalability in encoder part | |
KR100763179B1 (en) | Method for compressing/Reconstructing motion vector of unsynchronized picture and apparatus thereof | |
KR20060135992A (en) | Method and apparatus for coding video using weighted prediction based on multi-layer | |
US20050169371A1 (en) | Video coding apparatus and method for inserting key frame adaptively | |
US20050157794A1 (en) | Scalable video encoding method and apparatus supporting closed-loop optimization | |
US20050158026A1 (en) | Method and apparatus for reproducing scalable video streams | |
US20030202597A1 (en) | Wavelet based coding using motion compensated filtering based on both single and multiple reference frames | |
US20060250520A1 (en) | Video coding method and apparatus for reducing mismatch between encoder and decoder | |
WO2006006764A1 (en) | Video decoding method using smoothing filter and video decoder therefor | |
US20070014356A1 (en) | Video coding method and apparatus for reducing mismatch between encoder and decoder | |
US20060159173A1 (en) | Video coding in an overcomplete wavelet domain | |
EP1709811A1 (en) | Device and method for playing back scalable video streams | |
WO2006118384A1 (en) | Method and apparatus for encoding/decoding multi-layer video using weighted prediction | |
US20060088100A1 (en) | Video coding method and apparatus supporting temporal scalability | |
EP1889487A1 (en) | Multilayer-based video encoding method, decoding method, video encoder, and video decoder using smoothing prediction | |
WO2007027012A1 (en) | Video coding method and apparatus for reducing mismatch between encoder and decoder | |
WO2006043754A1 (en) | Video coding method and apparatus supporting temporal scalability | |
WO2006098586A1 (en) | Video encoding/decoding method and apparatus using motion prediction between temporal levels | |
WO2006109989A1 (en) | Video coding method and apparatus for reducing mismatch between encoder and decoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 05808647 Country of ref document: EP Kind code of ref document: A1 |