EP1787473A1 - Multi-layer video coding and decoding methods, and multi-layer video encoder and decoder
- Publication number
- EP1787473A1 (EP05780534A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frame
- video
- resolution
- encoded
- decoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
- H04N19/64—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- Apparatuses and methods consistent with the present invention relate to a multi-layer video coding algorithm, and more particularly, to a multi-layer video coding algorithm designed to encode a predetermined resolution layer using a plurality of coding algorithms.
- Multimedia data requires a large capacity storage medium and a wide bandwidth for transmission since the amount of multimedia data is usually large.
- For example, a 24-bit true color image having a resolution of 640x480 needs a capacity of 640x480x24 bits, i.e., about 7.37 Mbits of data, per frame.
- When such images are transmitted at 30 frames per second, a bandwidth of 221 Mbits/sec is required.
- When a 90-minute movie based on such an image is stored, a storage space of about 1200 Gbits is required.
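- The figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration; the 30 frames-per-second rate is an assumption inferred from the ratio of the 221 Mbits/sec and 7.37 Mbits figures, and the variable names are hypothetical.

```python
# Raw (uncompressed) video requirements for the 640x480, 24-bit example.
WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 24
FRAMES_PER_SECOND = 30          # assumption: rate implied by 221 / 7.37
MOVIE_SECONDS = 90 * 60         # a 90-minute movie

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAMES_PER_SECOND
bits_per_movie = bits_per_second * MOVIE_SECONDS

print(f"{bits_per_frame / 1e6:.2f} Mbits per frame")
print(f"{bits_per_second / 1e6:.0f} Mbits/sec")
print(f"{bits_per_movie / 1e9:.0f} Gbits for 90 minutes")
```

The computed storage figure is about 1194 Gbits, which the text rounds to about 1200 Gbits.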
- A compression coding method is therefore a requisite for transmitting multimedia data including text, video, and audio.
- A basic principle of multimedia data compression is removing data redundancy.
- Video data can be compressed by removing spatial redundancy, in which the same color or object is repeated in an image; temporal redundancy, in which there is little change between adjacent frames in a moving image or the same sound is repeated in audio; or psychovisual redundancy, which takes into account human eyesight and its limited perception of high frequencies.
- FIG. 1 shows an environment in which video compression is applied.
- Video data is compressed by a video encoder 110.
- Discrete Cosine Transform (DCT)-based video compression algorithms include MPEG-2, MPEG-4, H.263, and H.264.
- Compressed video data is sent to a video decoder 130 via a network 120.
- The video decoder 130 decodes the compressed video data to reconstruct the original video data.
- The video encoder 110 compresses the original video data so that it does not exceed the available bandwidth of the network 120, allowing the video decoder 130 to decode the compressed data.
- Communication bandwidth may vary depending on the type of the network 120.
- For example, the available communication bandwidth of an Ethernet network differs from that of a wireless local area network (WLAN).
- A cellular communication network may have a very narrow bandwidth.
- Scalable video coding is a video compression technique that allows video data to provide scalability. Scalability is the ability to generate video sequences at different resolutions, frame rates, and qualities from the same compressed bitstream.
- Temporal scalability can be provided using Motion Compensated Temporal Filtering (MCTF), Unconstrained MCTF (UMCTF), or the Successive Temporal Approximation and Referencing (STAR) algorithm. Spatial scalability can be achieved by a wavelet transform algorithm or by multi-layer coding, which has been actively studied in recent years.
- SNR Signal-to-Noise Ratio
- EZW Embedded ZeroTrees Wavelet
- SPIHT Set Partitioning in Hierarchical Trees
- EZBC Embedded ZeroBlock Coding
- EBCOT Embedded Block Coding with Optimized Truncation
- FIGS. 2 and 3 illustrate examples of multi-layer bitstream structures.
- A multi-layer video encoder encodes each layer using an MPEG-4 Advanced Video Coding (AVC) algorithm offering the highest coding efficiency currently available.
- The MPEG-4 AVC algorithm removes temporal redundancies between frames and uses DCT to transform the resulting frames for quantization.
- Each layer differs from the others in at least one of resolution, frame rate, and bit-rate.
- a base layer frame having the lowest resolution, lowest frame rate, and lowest bit-rate is encoded and then an enhancement layer is encoded using the encoded base layer frame.
- the AVC-based multi-layer video coding scheme uses an AVC-based technique for encoding each layer, providing high coding efficiency.
- intra prediction and deblocking techniques used in an AVC algorithm effectively remove most artifacts caused by block-based coding.
- each layer is optimized with respect to rate-distortion.
- the generated bitstream does not have a flexible scalability.
- FGS fine grain scalability
- When video data is encoded into many layers, the multi-layer coding scheme shown in FIG. 2 performs AVC encoding on all layers.
- A layer having the highest resolution, highest frame rate, and highest quality is encoded by wavelet coding using the encoded base layer.
- The bitstream shown in FIG. 2 is optimized for each layer with respect to rate-distortion but has weak scalability.
- The bitstream shown in FIG. 3 has excellent scalability but low video quality, since all layers except the lowest-resolution AVC-coded layer are reconstructed from one wavelet-coded layer.
- the present invention provides multi-layer video encoding and decoding methods that can offer high coding efficiency and scalability, and multi-layer video encoders and decoders.
- a multi-layer video coding method including encoding a video frame having a predetermined resolution using a first video coding scheme, encoding the video frame with the same resolution as the predetermined resolution using a second video coding scheme with a reference to the frame encoded by the first video coding scheme, and generating a bitstream containing the frames encoded by the first and second video coding schemes.
- a multi-layer video coding method including generating a lower-resolution video frame by downsampling a video frame, encoding the lower-resolution video frame, encoding the video frame using the encoded lower-resolution video frame as a reference, and generating a bitstream containing the encoded lower-resolution video frame and the video frame, wherein the encoding the lower-resolution video frame comprises encoding the lower-resolution video frame using a first video coding scheme, and encoding the lower-resolution video frame using a second video coding scheme with reference to the lower-resolution frame encoded by the first video coding scheme.
- a multi-layer video coding method including (a) encoding a video frame having a predetermined resolution using a first video coding scheme, (b) encoding the video frame with the same resolution as the predetermined resolution using a second video coding scheme with a reference to the frame encoded by the first video coding scheme, and (c) generating a bitstream containing encoded frames of all resolution layers, wherein the step (a) and the step (b) are performed recursively on all resolution layers in the order from a lower-resolution layer to a higher-resolution layer.
- a multi-layer video encoder including a downsampler downsampling a higher-resolution video frame to generate a lower-resolution video frame, a lower-resolution video encoding unit encoding the lower-resolution video frame, a higher-resolution video encoding unit encoding the higher-resolution video frame using the encoded lower-resolution video frame as a reference, and a bitstream generator generating a bitstream containing the encoded lower-resolution frame and the encoded higher-resolution video frame, wherein the lower-resolution video encoding unit encodes the lower-resolution video frame using a first video coding scheme and uses the lower-resolution frame encoded by the first video coding scheme to encode the lower-resolution video frame with a second video coding scheme, thereby generating the encoded lower-resolution frame.
- a multi-layer decoding method including extracting a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme from a bitstream, decoding the frame encoded by the first video coding scheme using a first video decoding scheme to reconstruct a first frame, and decoding the frame encoded by the second video coding scheme with a reference to the reconstructed first frame with the same resolution as the reconstructed first frame using a second video decoding scheme to reconstruct a second frame.
- a multi-layer decoding method including extracting a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme from a bitstream, decoding the frame encoded by the first video coding scheme using a first video decoding scheme to reconstruct a first frame, decoding the frame encoded by the second video coding scheme with the same resolution as the reconstructed first frame using a second video decoding scheme to reconstruct a second frame, and adding the reconstructed second frame to the reconstructed first frame to reconstruct a video frame.
- a multi-layer video decoding method including extracting an encoded lower-resolution layer frame and an encoded higher-resolution layer frame from a bitstream, decoding the encoded lower-resolution layer frame to reconstruct a lower-resolution layer frame, and decoding the encoded higher-resolution layer frame to reconstruct a higher-resolution layer frame with reference to the reconstructed lower-resolution layer frame, wherein the encoded lower-resolution layer frame includes a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme, and wherein the decoding the lower-resolution layer frame comprises decoding the frame encoded by the first video coding scheme using a first video decoding scheme to reconstruct a first frame, and decoding the frame encoded by the second video coding scheme using a second video decoding scheme with reference to the reconstructed first frame to reconstruct a second frame.
- a multi-layer video decoding method including extracting an encoded lower-resolution layer frame and an encoded higher-resolution layer frame from a bitstream, decoding the encoded lower-resolution layer frame to reconstruct a lower-resolution layer frame, and decoding the encoded higher-resolution layer frame with reference to the reconstructed lower-resolution layer frame to reconstruct a higher-resolution layer frame, wherein the encoded lower-resolution layer frame includes a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme, and wherein the decoding the lower-resolution layer frame comprises decoding the frame encoded by the first video coding scheme using a first video decoding scheme to reconstruct a first frame, decoding the frame encoded by the second video coding scheme using a second video decoding scheme to reconstruct a second frame, and adding the reconstructed second frame to the reconstructed first frame to reconstruct a lower-resolution layer video frame.
- a multi-layer video decoding method including extracting an encoded lower-resolution layer frame and an encoded higher-resolution layer frame from a bitstream, decoding the encoded lower-resolution layer frame to reconstruct a lower-resolution layer frame, and decoding the encoded higher-resolution layer frame with reference to the reconstructed lower-resolution layer frame to reconstruct a higher-resolution layer frame, wherein the encoded lower-resolution layer frame includes a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme, and wherein the decoding the lower-resolution layer frame comprises decoding the frame encoded by the first video coding scheme using a first video decoding scheme to reconstruct a first frame, decoding the frame encoded by the second video coding scheme using a second video decoding scheme to reconstruct a second frame, and adding the reconstructed second frame to the reconstructed first frame to reconstruct a lower-resolution layer video frame.
- a multi-layer video decoding method including extracting an encoded lower-resolution layer frame and an encoded higher-resolution layer frame from a bitstream and decoding the encoded lower-resolution layer frame and the encoded higher-resolution layer frame to reconstruct a video frame, wherein an encoded frame of each resolution layer includes a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme, the method comprising decoding the frame encoded by the first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct a first frame, and decoding the frame encoded by the second video coding scheme for the resolution layer using a second video decoding scheme with reference to the reconstructed first frame to reconstruct the second frame, and wherein the decoding the frame encoded by the first video coding scheme and the decoding the frame encoded by the second video coding scheme are performed recursively on all resolution layers in the order from a lower resolution layer to a higher
- a multi-layer video decoding method including extracting an encoded lower-resolution layer frame and an encoded higher-resolution layer frame from a bitstream and decoding the encoded lower-resolution layer frame and the encoded higher-resolution layer frame to reconstruct a video frame, wherein an encoded video frame of each resolution layer includes a frame encoded by a first video coding scheme and a frame encoded by a second video coding scheme, the method comprising decoding the frame encoded by a first video coding scheme for a predetermined resolution layer using a first video decoding scheme to reconstruct a first frame, decoding the frame encoded by the second video coding scheme for the resolution layer using a second video decoding scheme to reconstruct a second frame, and adding the reconstructed second frame to the reconstructed first frame to thereby reconstruct a video frame in the resolution layer, wherein the decoding the frame encoded by a first video coding scheme, the decoding the frame encoded by the second video coding scheme, and the
- a multi-layer video decoder including a bitstream interpreter interpreting a bitstream to extract an encoded lower-resolution layer frame and an encoded higher-resolution layer frame, a lower-resolution video decoding unit decoding the encoded lower-resolution layer frame, and a higher-resolution video decoding unit decoding the encoded higher-resolution layer frame using the reconstructed lower-resolution layer frame as a reference, wherein the lower-resolution video decoding unit decodes a frame encoded by a first video coding scheme using a first video decoding scheme to reconstruct a first frame and uses the first frame to decode a frame encoded by a second video coding scheme using a second video decoding scheme, thereby reconstructing the lower-resolution layer frame.
- FIG. 1 shows an environment in which video compression is applied.
- FIGS. 2 and 3 show examples of multi-layer video bitstream structures.
- FIG. 4 shows the structure of a multi-layer video bitstream according to an exemplary embodiment of the present invention.
- FIG. 5 is a block diagram of a multi-layer video encoder according to an exemplary embodiment of the present invention.
- FIG. 6 is a flowchart illustrating a multi-layer video coding process according to an exemplary embodiment of the present invention.
- FIGS. 7 and 8 illustrate detailed multi-layer video coding processes according to exemplary embodiments of the present invention.
- FIG. 9 illustrates a process of allocating a bit-rate for each layer in a multi-layer video coding process according to an exemplary embodiment of the present invention.
- FIGS. 10 and 11 show structures of multi-layer video bitstreams according to exemplary embodiments of the present invention.
- FIG. 12 is a block diagram of a multi-layer video decoder according to an exemplary embodiment of the present invention.
- FIG. 13 is a flowchart illustrating a multi-layer video decoding process according to an exemplary embodiment of the present invention.
- FIG. 4 shows the structure of a multi-layer video bitstream according to an exemplary embodiment of the present invention.
- A bitstream generated by multi-layer video coding has two layers for each resolution.
- One layer is encoded using Advanced Video Coding (AVC) while the other layer is encoded using wavelet coding.
- AVC coding or an AVC layer refers to coding or a layer adopting the Discrete Cosine Transform (DCT) and quantization in an AVC algorithm.
- Wavelet coding or a wavelet layer refers to coding or a layer adopting a wavelet transform and embedded quantization.
- The AVC coding and wavelet coding schemes each employ an MCTF, UMCTF, or STAR algorithm to provide temporal scalability.
- The AVC layer for each resolution ensures coding efficiency at the level of spatio-temporal quality, while the wavelet layer ensures fine grain scalability (FGS).
- A predecoder simply truncates a part of a wavelet layer bitstream to produce a bitstream having a quality between that of the AVC layer and that of the wavelet layer. The same truncation scenario applies to multiple layers.
- For example, the predecoder may produce a bitstream with QCIF resolution and 32 to 64 kbps quality from the bitstream shown in FIG. 4. To accomplish this, the predecoder truncates all CIF and SD resolution layers and all or part of each QCIF resolution wavelet layer.
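- The truncation performed by the predecoder can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Layer` class, the `RESOLUTION_ORDER` list, and the `predecode` function are hypothetical names, the wavelet payload is assumed to be an embedded stream in which any byte prefix remains decodable, and layers below the target resolution are assumed to be kept whole.

```python
from dataclasses import dataclass

RESOLUTION_ORDER = ["QCIF", "CIF", "SD"]  # assumed ordering, low to high


@dataclass
class Layer:
    resolution: str   # one of RESOLUTION_ORDER
    coding: str       # "avc" or "wavelet"
    payload: bytes    # coded data (wavelet payload assumed embedded)


def predecode(layers, target, wavelet_fraction):
    """Drop layers above `target` and keep only a prefix of its wavelet layer."""
    if not 0.0 <= wavelet_fraction <= 1.0:
        raise ValueError("wavelet_fraction must be between 0 and 1")
    limit = RESOLUTION_ORDER.index(target)
    kept = []
    for layer in layers:
        rank = RESOLUTION_ORDER.index(layer.resolution)
        if rank > limit:
            continue  # truncate every higher-resolution layer entirely
        if rank == limit and layer.coding == "wavelet":
            size = int(len(layer.payload) * wavelet_fraction)
            if size > 0:  # a fraction of 0 removes the wavelet layer
                kept.append(Layer(layer.resolution, "wavelet", layer.payload[:size]))
        else:
            kept.append(layer)
    return kept


stream = [
    Layer("QCIF", "avc", b"a" * 4),
    Layer("QCIF", "wavelet", b"w" * 8),
    Layer("CIF", "avc", b"A" * 16),
    Layer("CIF", "wavelet", b"W" * 32),
]
half_quality = predecode(stream, "QCIF", 0.5)
avc_only = predecode(stream, "QCIF", 0.0)
```

With `wavelet_fraction` at 0 only the AVC layer quality remains, and at 1 the full wavelet layer quality is kept; values in between give the intermediate qualities described above.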
- An example of a video encoder generating a multi-layer bitstream according to an exemplary embodiment of the present invention is shown in FIG. 5. For convenience of explanation, it is assumed that the video encoder has coding units for two resolution layers.
- The multi-layer video encoder includes a downsampler 550, an AVC coding unit 510 and a wavelet coding unit 520 encoding low-resolution layer video frames, an AVC coding unit 530 and a wavelet coding unit 540 encoding high-resolution layer video frames, and a bitstream generator 560 generating a bitstream.
- the downsampler 550 downsamples a video frame to produce a low-resolution video frame.
- The multi-layer video encoder has two coding units, i.e., an AVC coding unit and a wavelet coding unit, for each resolution layer. That is, the multi-layer video encoder includes the AVC coding unit 510 and the wavelet coding unit 520 for encoding low-resolution layer video frames and the AVC coding unit 530 and the wavelet coding unit 540 for encoding high-resolution layer video frames.
- The bitstream generator 560 generates a bitstream containing encoded low- and high-resolution layer frames.
- The downsampler 550 downsamples a video frame 500 to produce a low-resolution video frame with half the resolution of the video frame.
- the low-resolution video frame is sent to the AVC coding unit 510 and the wavelet coding unit 520 for the low resolution layer while the video frame 500 is sent to the AVC coding unit 530 and the wavelet coding unit 540 for the high-resolution layer.
- the AVC coding unit 510 for the low-resolution layer includes a temporal filter
- the AVC-coded low-resolution layer frame is provided to perform wavelet coding for the low-resolution layer.
- the wavelet coding unit 520 for the low-resolution layer includes a temporal filter
- the wavelet-coded low-resolution layer frame is provided to perform AVC coding for the high-resolution layer.
- the AVC coding unit 530 for the high-resolution layer includes a temporal filter
- the AVC-coded high-resolution layer frame is provided to perform wavelet coding for the high-resolution layer.
- the wavelet coding unit 540 for the high-resolution layer includes a temporal filter
- the wavelet-coded high-resolution layer frame is provided to perform wavelet coding for the high-resolution layer.
- the bitstream generator 560 generates a bitstream containing the AVC-coded and wavelet-coded low-resolution layer frames and the AVC-coded and wavelet-coded high-resolution layer frames.
- the bitstream contains information about the coded frames, header information including a sequence header, a group-of-pictures (GOP) header, and a frame header, and other information such as motion vectors obtained during temporal filtering.
- the bitstream is predecoded by a predecoder (not shown) and sent to a multi-layer video decoder.
- The predecoder may truncate a high-resolution layer of the bitstream to produce a bitstream containing only coded low-resolution layer frames for a device having a small display screen, such as a cellular phone or personal digital assistant (PDA).
- The predecoder may also truncate a part of the bitstream to produce a bitstream with a low bit-rate when network conditions are poor. Meanwhile, when the required frame rate is low, the predecoder truncates some frames of the bitstream to generate a bitstream with a low frame rate.
- FIG. 6 is a flowchart illustrating a multi-layer video encoding process.
- A video frame is input into a multi-layer video encoder and, in operation S620, the multi-layer video encoder downsamples the input video frame to a lower resolution.
- The multi-layer video encoder uses an MPEG downsampler to downsample the input video frame because the MPEG downsampler produces a smoother low-resolution image than currently available wavelet downsamplers.
- Any other filter that can produce a downsampled version of an image may be used for downsampling.
- For three resolution layers, the multi-layer video encoder downsamples the input video frame by factors of two and four to generate half and quarter resolution frames.
- For four resolution layers, the multi-layer video encoder downsamples the input video frame by factors of two, four, and eight to generate half, quarter, and eighth resolution frames.
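- Repeated downsampling by a factor of two can be sketched as below. The 2x2 averaging filter is a stand-in chosen for brevity, not the MPEG downsampler named in the text, and the function names `downsample_by_two` and `resolution_pyramid` are hypothetical. A frame is represented as a list of rows of sample values.

```python
def downsample_by_two(frame):
    """Halve width and height by averaging each 2x2 block of samples."""
    height, width = len(frame), len(frame[0])
    if height % 2 or width % 2:
        raise ValueError("frame dimensions must be even")
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def resolution_pyramid(frame, levels):
    """Return [full, half, quarter, ...] with `levels` downsampled versions."""
    pyramid = [frame]
    for _ in range(levels):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid


frame = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 10, 12, 14],
    [10, 12, 14, 16],
]
full, half, quarter = resolution_pyramid(frame, 2)
```

Calling `resolution_pyramid(frame, 2)` gives the full, half, and quarter resolution frames, and `levels=3` would add the eighth resolution frame for a frame whose dimensions are divisible by eight.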
- The multi-layer video encoder performs AVC coding on the low-resolution video frame.
- the encoder performs wavelet coding on the low-resolution video frame using the AVC-coded low-resolution video frame. For example, after performing AVC coding to produce an AVC-coded video frame having a QCIF resolution, a 15 Hz frame rate, and a 32 kbps bit-rate, the encoder performs wavelet coding to generate a wavelet-coded frame with the same resolution and frame rate as the AVC-coded video frame and a 64 kbps bit-rate using the AVC-coded frame as a reference.
- After encoding the low-resolution frame, the multi-layer video encoder encodes a high-resolution video frame using the encoded low-resolution frame.
- The encoder performs AVC coding on a high-resolution video frame.
- the encoder performs wavelet coding on the high-resolution video frame using the AVC-coded high-resolution video frame. For example, after performing AVC coding to produce an AVC-coded video frame having a CIF resolution, a 30 Hz frame rate, and a 256 kbps bit-rate, the encoder performs wavelet coding to generate a wavelet-coded frame with a CIF resolution, a 30 Hz frame rate and a 750 kbps bit-rate using the AVC-coded and wavelet-coded QCIF resolution video frames and the AVC-coded CIF frame as references.
- the multi-layer video encoder uses coded video frames to generate a bitstream.
- FIGS. 7 and 8 illustrate examples of detailed multi-layer video coding processes according to exemplary embodiments of the present invention. While FIGS. 7 and 8 show that video coding is performed on two resolution layers, video coding may be performed on three or more resolution layers in the same way.
- In the process of FIG. 7, a multi-layer video encoder downsamples a video frame 700 to generate a low-resolution video frame 710 and then performs AVC coding on the low-resolution video frame 710 to produce an AVC-coded low-resolution layer frame that will be contained in a bitstream.
- The multi-layer video encoder decodes the AVC-coded low-resolution layer frame to obtain a decoded frame 720 and compares the decoded frame 720 with the low-resolution video frame 710 to obtain a low-resolution residual frame 730.
- The encoder performs wavelet coding on the low-resolution residual frame 730 to generate a wavelet-coded low-resolution layer frame and then decodes the wavelet-coded low-resolution layer frame to obtain a decoded frame 740, which is added to the decoded frame 720 to obtain a decoded low-resolution layer video frame 750.
- The encoder upsamples the decoded low-resolution layer video frame 750 to the higher resolution and compares the upsampled frame 760 with the video frame 700 to obtain a high-resolution layer frame 770.
- AVC coding is performed on the high-resolution layer frame 770 to generate an AVC-coded high-resolution layer frame that will be contained in the bitstream.
- The AVC-coded high-resolution layer frame is decoded to obtain a decoded frame 780, and the decoded frame 780 is compared with the high-resolution layer frame 770 to obtain a high-resolution residual frame 790.
- Wavelet coding is then performed on the high-resolution residual frame 790 to obtain a wavelet-coded high-resolution layer frame that will be contained in the bitstream.
- Finally, the multi-layer video encoder generates the bitstream containing the AVC-coded and wavelet-coded low-resolution layer frames and the AVC-coded and wavelet-coded high-resolution layer frames.
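The two-layer process of FIG. 7 can be illustrated with a minimal sketch. It is not the encoder of the embodiment: frames are one-dimensional lists of samples, a coarse uniform quantizer stands in for AVC coding, a finer quantizer stands in for wavelet coding, and "comparing" two frames is assumed to mean sample-wise subtraction. All function names and step sizes are hypothetical; only the frame numerals follow the description.

```python
COARSE_STEP = 16  # hypothetical step; stands in for AVC coding
FINE_STEP = 2     # hypothetical step; stands in for wavelet coding


def quantize(samples, step):
    """Lossy stand-in for a codec: map each sample to an integer index."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Stand-in for the matching decoder."""
    return [i * step for i in indices]


def downsample(frame):
    """Halve the resolution by averaging neighbouring pairs (even length assumed)."""
    return [(frame[i] + frame[i + 1]) / 2 for i in range(0, len(frame), 2)]


def upsample(frame):
    """Double the resolution by repeating each sample."""
    return [s for s in frame for _ in range(2)]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def encode_two_layers(frame_700):
    """Follow the FIG. 7 order: low AVC, low wavelet, high AVC, high wavelet."""
    frame_710 = downsample(frame_700)
    avc_low = quantize(frame_710, COARSE_STEP)
    frame_720 = dequantize(avc_low, COARSE_STEP)
    residual_730 = subtract(frame_710, frame_720)
    wavelet_low = quantize(residual_730, FINE_STEP)
    frame_740 = dequantize(wavelet_low, FINE_STEP)
    frame_750 = add(frame_720, frame_740)        # decoded low-resolution layer
    frame_760 = upsample(frame_750)
    frame_770 = subtract(frame_700, frame_760)   # high-resolution layer frame
    avc_high = quantize(frame_770, COARSE_STEP)
    frame_780 = dequantize(avc_high, COARSE_STEP)
    residual_790 = subtract(frame_770, frame_780)
    wavelet_high = quantize(residual_790, FINE_STEP)
    return {
        "avc_low": avc_low,
        "wavelet_low": wavelet_low,
        "avc_high": avc_high,
        "wavelet_high": wavelet_high,
    }
```

Because each stage codes the residual left by the previous one, summing the four dequantized layers (after upsampling the low-resolution sum) recovers frame 700 to within half the fine step in this toy model.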
- In the process of FIG. 8, a multi-layer video encoder downsamples a high-resolution video frame to generate a low-resolution video frame and performs AVC coding on the low-resolution video frame to produce an AVC-coded low-resolution layer video frame, followed by wavelet coding on the low-resolution video frame using the AVC-coded low-resolution layer video frame.
- The (N-1)-th and (N+1)-th low-resolution video frames 811 and 813 are used to encode an N-th low-resolution video frame 812. While the low-resolution video frames 811 and 813 themselves are used as references for open-loop video coding, frames reconstructed by decoding the AVC-coded low-resolution video frames are used for closed-loop video coding.
- After completing AVC coding for the low-resolution layer, the multi-layer video encoder performs wavelet coding for the low-resolution layer.
- The multi-layer video encoder may encode an N-th low-resolution video frame 822 using the (N-1)-th and (N+1)-th low-resolution video frames 821 and 823 or frames reconstructed by decoding AVC-coded frames.
- After completing video encoding for the low-resolution layer, the encoder performs video coding on a high-resolution layer.
- AVC coding may be performed on an N-th high-resolution layer video frame 842 using the (N-1)-th and (N+1)-th high-resolution layer video frames 841 and 843 or a frame reconstructed by decoding the N-th low-resolution video frame 822.
- The reconstructed frame is upsampled to generate a video frame 832 before it can be used as a reference.
- The encoder performs wavelet coding on an N-th high-resolution layer video frame 852 using the (N-1)-th and (N+1)-th high-resolution layer video frames 851 and 853 or frames reconstructed by decoding the N-th high-resolution layer video frame 842.
- The multi-layer video coding process shown in FIG. 7 involves inter-layer referencing after temporal filtering, while the video coding process shown in FIG. 8 includes inter-layer referencing during temporal filtering.
- The coding process shown in FIG. 7 can provide better coding efficiency than the process shown in FIG. 8 when the spatial relationship between frames is closer than the temporal relationship between them.
- Conversely, the latter can exhibit higher coding efficiency than the former when the temporal relationship between frames is closer than the spatial relationship between them.
- FIG. 9 illustrates a process of allocating a bit-rate for each layer in a multi-layer video coding process according to an exemplary embodiment of the present invention.
- A multi-layer video encoder supports three resolution layers, i.e., QCIF, CIF, and SD layers.
- The scalability requirements for video coding are that a QCIF layer 930 have a 15 Hz frame rate and a 96 to 192 kbps bit-rate, a CIF layer 920 have a 7.5 to 30 Hz frame rate and a 192 to 768 kbps bit-rate, and an SD layer 910 have a 15 to 60 Hz frame rate and a 768 to 3072 kbps bit-rate.
- The multi-layer video encoder performs AVC coding on a QCIF frame to produce an AVC-coded QCIF layer frame having a 96 kbps bit-rate and a 15 Hz frame rate. Then, the encoder performs wavelet coding on the QCIF frame using the AVC-coded frame to generate a wavelet-coded QCIF layer frame having a 192 kbps bit-rate and a 15 Hz frame rate.
- The encoder performs AVC coding on a CIF frame to generate an AVC-coded CIF layer frame having the maximum frame rate of 30 Hz available for the CIF layer 920.
- The AVC-coded and wavelet-coded QCIF layer frames and a part of the AVC-coded CIF layer frame are needed.
- The encoder then performs wavelet coding on the CIF frame to generate a wavelet-coded CIF layer frame having the maximum frame rate of 30 Hz allowable for the CIF layer 920.
- The AVC-coded and wavelet-coded QCIF layer frames, the AVC-coded CIF layer frame, and a part of the wavelet-coded CIF layer frame are needed.
- The encoder performs AVC coding on an SD frame to generate an AVC-coded SD layer frame having the maximum frame rate of 60 Hz available for the SD layer 910.
- To reconstruct a video frame having a 768 kbps bit-rate and a 15 Hz frame rate, the AVC-coded and wavelet-coded QCIF layer frames, the AVC-coded and wavelet-coded CIF layer frames, and a part of the AVC-coded SD layer frame are needed.
- The encoder then performs wavelet coding on the SD frame to generate a wavelet-coded SD layer frame having the maximum frame rate of 60 Hz allowable for the SD layer 910.
- The AVC-coded and wavelet-coded QCIF layer frames, the AVC-coded and wavelet-coded CIF layer frames, the AVC-coded SD layer frame, and a part of the wavelet-coded SD layer frame are needed.
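The layer dependencies described for FIG. 9 can be summarized in a short sketch that lists which coded layers a reconstruction needs for a given resolution and bit-rate. The bit-rate ranges come from the scalability requirements above, and the 96 kbps point at which the AVC-coded QCIF layer is complete comes from the text. The corresponding points for the CIF and SD layers (384 and 1536 kbps) are not stated in this section and are purely hypothetical, as are the function and table names. Frame rate is ignored for simplicity.

```python
# Bit-rate range (kbps) of each resolution layer, from the requirements above.
RANGES = {"QCIF": (96, 192), "CIF": (192, 768), "SD": (768, 3072)}
ORDER = ["QCIF", "CIF", "SD"]

# Bit-rate at which the AVC-coded sub-layer of each resolution is complete.
# 96 kbps for QCIF is from the text; the CIF and SD values are assumptions.
AVC_COMPLETE = {"QCIF": 96, "CIF": 384, "SD": 1536}


def layers_needed(resolution, kbps):
    """Return (resolution, scheme, 'whole' or 'part') for each layer required."""
    low, high = RANGES[resolution]
    if not low <= kbps <= high:
        raise ValueError(f"{kbps} kbps is outside the {resolution} range")

    needed = []
    # Every lower-resolution layer is needed in full.
    for name in ORDER[:ORDER.index(resolution)]:
        needed.append((name, "AVC", "whole"))
        needed.append((name, "wavelet", "whole"))

    split = AVC_COMPLETE[resolution]
    if kbps < split:
        needed.append((resolution, "AVC", "part"))
    else:
        needed.append((resolution, "AVC", "whole"))
        if kbps >= high:
            needed.append((resolution, "wavelet", "whole"))
        elif kbps > split:
            needed.append((resolution, "wavelet", "part"))
    return needed
```

For example, `layers_needed("SD", 768)` returns both QCIF layers and both CIF layers as whole plus a part of the AVC-coded SD layer, which matches the 768 kbps case described above.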
- FIGS. 10 and 11 show the structures of multi-layer video bitstreams according to further exemplary embodiments of the present invention.
- The bitstream shown in FIG. 10 has an SD layer encoded using only wavelet coding, since a video frame having a lower bit-rate of 1.5 Mbps is easy to reconstruct from a wavelet-coded bitstream having a high resolution and a sufficient bit-rate, e.g., 3.0 Mbps.
- FIG. 12 is a block diagram of a multi-layer video decoder according to an exemplary embodiment of the present invention. For convenience of explanation, it is assumed that the video decoder reconstructs video frames from a bitstream having two resolution layers.
- The multi-layer video decoder includes a bitstream interpreter 1250, an AVC decoding unit 1210 and a wavelet decoding unit 1220 that decode encoded low-resolution layer video frames, and an AVC decoding unit 1230 and a wavelet decoding unit 1240 that decode encoded high-resolution layer video frames.
- The bitstream interpreter 1250 extracts encoded high- and low-resolution layer frames from an input bitstream.
- The encoded low-resolution layer frames consist of an AVC-coded low-resolution layer frame and a wavelet-coded low-resolution layer frame, while the encoded high-resolution layer frames consist of an AVC-coded high-resolution layer frame and a wavelet-coded high-resolution layer frame.
- The AVC decoding unit 1210 for the low-resolution layer includes an inverse quantizer 1211 inversely quantizing the AVC-coded low-resolution layer frame, an inverse DCT transformer 1212 performing an inverse DCT on the inversely quantized frame, and an inverse temporal filter 1213 performing inverse temporal filtering on the frame subjected to the inverse DCT.
- The wavelet decoding unit 1220 for the low-resolution layer includes an inverse quantizer 1221 inversely quantizing the wavelet-coded low-resolution layer frame using a video frame reconstructed by the AVC decoding unit 1210, an inverse wavelet transformer 1222 performing an inverse wavelet transform on the inversely quantized frame, and an inverse temporal filter 1223 performing inverse temporal filtering on the frame subjected to the inverse wavelet transform.
- The AVC decoding unit 1230 for the high-resolution layer includes an inverse quantizer 1231 inversely quantizing the AVC-coded high-resolution layer frame using a video frame reconstructed by the wavelet decoding unit 1220 for the low-resolution layer, an inverse DCT transformer 1232 performing an inverse DCT on the inversely quantized frame, and an inverse temporal filter 1233 performing inverse temporal filtering on the inversely DCT-transformed frame.
- The wavelet decoding unit 1240 for the high-resolution layer includes an inverse quantizer 1241 inversely quantizing the wavelet-coded high-resolution layer frame using a video frame reconstructed by the AVC decoding unit 1230, an inverse wavelet transformer 1242 performing an inverse wavelet transform on the inversely quantized frame, and an inverse temporal filter 1243 performing inverse temporal filtering on the inversely wavelet-transformed frame.
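The stages inside a wavelet decoding unit can be sketched for a single one-dimensional frame. This is only an illustration under stated assumptions: a single-level Haar transform stands in for the wavelet transform, inverse temporal filtering is omitted because only one frame is involved, and "using" the frame reconstructed by the AVC decoding unit is assumed to mean adding the decoded residual to it. The function names are hypothetical.

```python
def haar_forward(samples):
    """Single-level 1-D Haar transform: pairwise averages, then pairwise differences."""
    averages = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    details = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return averages + details


def haar_inverse(coefficients):
    """Undo haar_forward: rebuild each sample pair from its average and difference."""
    half = len(coefficients) // 2
    samples = []
    for average, detail in zip(coefficients[:half], coefficients[half:]):
        samples.extend([average + detail, average - detail])
    return samples


def inverse_quantize(indices, step):
    """Map quantizer indices back to coefficient values."""
    return [index * step for index in indices]


def wavelet_decode(indices, step, reference):
    """Inverse quantization, inverse transform, then addition to the reference frame."""
    coefficients = inverse_quantize(indices, step)
    residual = haar_inverse(coefficients)
    return [r + d for r, d in zip(reference, residual)]
```

With a step of 1, the residual `[4, 2, -6, 0]` has Haar coefficients `[3, -3, 1, -3]`, so decoding those indices against a flat reference of 100 yields `[104, 102, 94, 100]`.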
- The term 'unit' means, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC), which performs certain tasks.
- A unit may advantageously be configured to reside on an addressable storage medium and to execute on one or more processors.
- A unit may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
- The functionality provided for in the components and units may be combined into fewer components and units or further separated into additional components and units.
- The components and units may be implemented such that they are executed on one or more computers in a communication system.
- FIG. 13 is a flowchart illustrating a multi-layer video decoding process according to an exemplary embodiment of the present invention.
- The multi-layer video decoder interprets the bitstream and extracts coded high- and low-resolution frames from it.
- AVC decoding is performed on an AVC-coded low-resolution layer frame, among the coded frames, to decode a low-resolution AVC layer.
- A video frame reconstructed by decoding the low-resolution AVC layer is provided to decode a low-resolution wavelet layer.
- The decoder performs wavelet decoding on a wavelet-coded low-resolution layer frame, among the coded frames, using the video frame reconstructed by decoding the low-resolution AVC layer, in order to decode the low-resolution wavelet layer.
- A video frame reconstructed by decoding the low-resolution wavelet layer is provided to decode a high-resolution AVC layer.
- The decoder performs AVC decoding on an AVC-coded high-resolution layer frame, among the coded frames, using the video frame reconstructed by decoding the low-resolution wavelet layer, in order to decode the high-resolution AVC layer. A video frame reconstructed by decoding the high-resolution AVC layer is provided to decode a high-resolution wavelet layer.
- The decoder performs wavelet decoding on a wavelet-coded high-resolution layer frame, among the coded frames, using the video frame reconstructed by decoding the high-resolution AVC layer, in order to decode the high-resolution wavelet layer.
- The multi-layer video decoder uses the reconstructed video frames to generate a video signal that is then displayed through a display device.
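The decoding order of FIG. 13, in which each layer is decoded using the frame reconstructed from the layer before it, can be sketched as a simple chain. The per-layer decoder is passed in as a callback, and the toy callback shown here (sample-wise addition with upsampling by repetition) is a hypothetical stand-in rather than an AVC or wavelet decoder. The `stop_after` parameter is likewise an assumption, added to show how a decoder could stop at a lower layer.

```python
# Order in which the coded layers are decoded, following the description above.
DECODING_ORDER = [
    ("low", "AVC"),
    ("low", "wavelet"),
    ("high", "AVC"),
    ("high", "wavelet"),
]


def decode_bitstream(coded_layers, decode_layer, stop_after=None):
    """Decode layers in order, feeding each reconstruction to the next layer."""
    reference = None
    for key in DECODING_ORDER:
        if key not in coded_layers:
            break  # a truncated bitstream simply ends the chain early
        reference = decode_layer(key, coded_layers[key], reference)
        if key == stop_after:
            break
    return reference


def toy_decode_layer(key, payload, reference):
    """Hypothetical layer decoder: add the payload to the (upsampled) reference."""
    if reference is None:
        return list(payload)
    if len(reference) < len(payload):
        # Crossing from the low- to the high-resolution layer.
        reference = [s for s in reference for _ in range(2)]
    return [r + p for r, p in zip(reference, payload)]
```

Stopping after `("low", "wavelet")` returns the low-resolution reconstruction, while running the whole chain returns the high-resolution one.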
- The coding and decoding methods according to exemplary embodiments of the present invention allow a predetermined resolution layer to be encoded and decoded using a plurality of different video coding schemes, thereby providing excellent scalability and coding efficiency.
- While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.
- For example, a resolution layer may consist of two layers that use other coding algorithms.
- A resolution layer may also be encoded using three or more video coding schemes.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Discrete Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60734304P | 2004-09-07 | 2004-09-07 | |
KR1020040090991A KR100679018B1 (ko) | 2004-09-07 | 2004-11-09 | 다계층 비디오 코딩 및 디코딩 방법, 비디오 인코더 및디코더 |
PCT/KR2005/002654 WO2006028330A1 (fr) | 2004-09-07 | 2005-08-13 | Procedes de decodage et codage video multicouches, codeur et decodeur video multicouches |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1787473A1 (fr) | 2007-05-23 |
EP1787473A4 (fr) | 2009-01-21 |
Family
ID=36036591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05780534A Ceased EP1787473A4 (fr) | 2004-09-07 | 2005-08-13 | Procedes de decodage et codage video multicouches, codeur et decodeur video multicouches |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1787473A4 (fr) |
JP (1) | JP4660550B2 (fr) |
WO (1) | WO2006028330A1 (fr) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1501311A1 (fr) * | 2002-04-26 | 2005-01-26 | NEC Corporation | Systeme de transfert d'image animee, appareils de codage et de decodage d'image animee, et programme de transfert d'image animee |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2126467A1 (fr) * | 1993-07-13 | 1995-01-14 | Barin Geoffry Haskell | Codage et decodage variables pour systeme video haute definition progressif |
JP3787823B2 (ja) * | 1997-07-31 | 2006-06-21 | ソニー株式会社 | 画像処理装置および画像処理方法 |
JP3384299B2 (ja) * | 1997-10-15 | 2003-03-10 | 富士ゼロックス株式会社 | 画像処理装置および画像処理方法 |
KR100269206B1 (ko) * | 1998-02-21 | 2000-10-16 | 윤종용 | 임의 해상도 다계층 이진형상 부호화기 및 그 방법 |
US6292512B1 (en) * | 1998-07-06 | 2001-09-18 | U.S. Philips Corporation | Scalable video coding system |
-
2005
- 2005-08-13 WO PCT/KR2005/002654 patent/WO2006028330A1/fr active Application Filing
- 2005-08-13 EP EP05780534A patent/EP1787473A4/fr not_active Ceased
- 2005-08-13 JP JP2007529675A patent/JP4660550B2/ja not_active Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
ANDREOPOULOS Y ET AL: "Spatio-temporal-snr scalable wavelet coding with motion compensated dct base-laver architectures" PROCEEDINGS 2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING. ICIP-2003. BARCELONA, SPAIN, SEPT. 14 - 17, 2003; [INTERNATIONAL CONFERENCE ON IMAGE PROCESSING], NEW YORK, NY : IEEE, US, vol. 2, 14 September 2003 (2003-09-14), pages 795-798, XP010669923 ISBN: 978-0-7803-7750-9 * |
RADHA H ET AL: "Scalable Internet video using MPEG-4" SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 15, no. 1-2, 1 September 1999 (1999-09-01), pages 95-126, XP004180640 ISSN: 0923-5965 * |
SCHAAR VAN DER M. ET AL: 'EMBEDDED DCT AND WAVELET METHODS FOR FINE GRANULAR SCALABLE VIDEO: ANALYSIS AND COMPARISON' PROCEEDINGS OF THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING (SPIE), SPIE, USA LNKD- DOI:10.1117/12.382999 vol. 3974, 01 January 2000, pages 643 - 653, XP000981435 ISSN: 0277-786X * |
See also references of WO2006028330A1 * |
ULRICH BENZLER: "Spatial Scalable Video Coding Using a Combined Subband-DCT Approach" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 10, no. 7, 1 October 2000 (2000-10-01), XP011014107 ISSN: 1051-8215 * |
YIANNIS ANDREOPOULOS ET AL: "Response to Call for Proposals on Scalable Video Coding Technology (proposal 21 - tool)" VIDEO STANDARDS AND DRAFTS, XX, XX, no. M10589, 8 March 2004 (2004-03-08), XP030039420 * |
Also Published As
Publication number | Publication date |
---|---|
JP2008512035A (ja) | 2008-04-17 |
EP1787473A4 (fr) | 2009-01-21 |
WO2006028330A1 (fr) | 2006-03-16 |
JP4660550B2 (ja) | 2011-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7933456B2 (en) | Multi-layer video coding and decoding methods and multi-layer video encoder and decoder | |
US8331434B2 (en) | Method and apparatus for video coding, predecoding, and video decoding for video streaming service, and image filtering method | |
US8031776B2 (en) | Method and apparatus for predecoding and decoding bitstream including base layer | |
KR100596705B1 (ko) | 비디오 스트리밍 서비스를 위한 비디오 코딩 방법과 비디오 인코딩 시스템, 및 비디오 디코딩 방법과 비디오 디코딩 시스템 | |
US20060083300A1 (en) | Video coding and decoding methods using interlayer filtering and video encoder and decoder using the same | |
EP1766998A1 (fr) | Procédé de codage vidéo échelonnable et appareil utilisant une couche de base | |
US20060013311A1 (en) | Video decoding method using smoothing filter and video decoder therefor | |
EP1657932A1 (fr) | Procède et appareil de codage et décodage de vidéo utilisant une intermédiaire filtre | |
CA2557312C (fr) | Procedes de codage et decodage video et systemes pour service de video en debit continu | |
WO2006028330A1 (fr) | Procedes de decodage et codage video multicouches, codeur et decodeur video multicouches | |
WO2006043753A1 (fr) | Procede et appareil de precodage de trains de bits hybride |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20070220 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20081223 |
|
17Q | First examination report despatched |
Effective date: 20090302 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: SAMSUNG ELECTRONICS CO., LTD. |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R003 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
|
18R | Application refused |
Effective date: 20160402 |