WO2005086487A1 - Video encoding and decoding methods and systems for video streaming service - Google Patents



Publication number
WO2005086487A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
frames
scalable video
encoding
video
Application number
PCT/KR2005/000520
Other languages
English (en)
French (fr)
Inventor
Woo-Jin Han
Original Assignee
Samsung Electronics Co., Ltd.
Priority claimed from KR1020040028487A (KR100596705B1)
Application filed by Samsung Electronics Co., Ltd.
Priority to JP2007501706A (JP2007525924A)
Priority to EP05726745A (EP1721465A4)
Priority to CA2557312A (CA2557312C)
Publication of WO2005086487A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to a video encoding method and system for video streaming services and a video decoding method and system for reconstructing the original video.
  • VOD: Video On Demand
  • In VOD, video content such as movies or news is provided to an end user over a telephone line, cable, or the Internet upon the user's request.
  • Users are allowed to view a movie without having to leave their residence.
  • Users can also access various types of educational content via video lectures without having to physically attend a school or private educational institute.
  • FIGS. 1 - 3 respectively show conventional simulcast, multi-layer, and scalable video coding schemes for video streaming at different resolutions, frame rates, or image qualities.
  • In the simulcast coding scheme of FIG. 1, a separate bitstream is generated for each resolution, frame rate, or image quality. For example, three separate bitstreams are required to provide streaming services at three resolutions.
  • For example, a video with a 704x576 resolution (first resolution) and a 60 Hz frame rate, a video with a 352x288 resolution (second resolution) and a 30 Hz frame rate, and a video with a 176x144 resolution (third resolution) and a 15 Hz frame rate are independently encoded into three bitstreams.
  • the first through third resolution bitstreams are respectively used for streaming services over networks capable of providing bandwidths of 6 Mbps, 750 Kbps, and 64 Kbps.
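Because each simulcast bitstream is independent, serving a user reduces to picking one of them. In the minimal Python sketch below, the table reuses the resolutions, frame rates, and bandwidths given above, while `select_simulcast_stream` and its rule (take the highest bit rate that fits the available bandwidth) are hypothetical illustrations, not something the patent prescribes.

```python
from typing import Optional, Tuple

# (width, height, frame rate in Hz, bit rate in Kbps) of each independently
# encoded simulcast bitstream, taken from the example above.
Stream = Tuple[int, int, int, int]
SIMULCAST_STREAMS = [
    (704, 576, 60, 6000),
    (352, 288, 30, 750),
    (176, 144, 15, 64),
]


def select_simulcast_stream(available_kbps: float) -> Optional[Stream]:
    """Return the highest-bit-rate stream that fits the bandwidth, or None.

    Hypothetical selection rule; the patent does not prescribe one.
    """
    fitting = [s for s in SIMULCAST_STREAMS if s[3] <= available_kbps]
    if not fitting:
        return None
    return max(fitting, key=lambda s: s[3])
```

For instance, a 1 Mbps link receives the 352x288, 30 Hz stream, since the 6 Mbps stream does not fit.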
  • The multi-layer coding scheme shown in FIG. 2 is one approach that exploits the strong correlation between video sequences in different layers.
  • The multi-layer coding scheme adopted by MPEG-2 for scalable video coding encodes each higher-resolution enhancement layer video by referencing a lower-resolution layer, starting from the lowest-resolution base layer video. That is, referring to FIG. 2, a first enhancement layer video with a 352x288 resolution is encoded with reference to an encoded base layer video with a 176x144 resolution, and a second enhancement layer video with a 704x576 resolution is encoded with reference to the first enhancement layer video.
  • Upon receipt of a user's request for the 704x576 resolution video, a streaming service provider transmits the video encoded in the second enhancement layer, together with the videos encoded in the first enhancement layer and the base layer, to the user. The user first reconstructs the base layer video and then sequentially reconstructs the first enhancement layer video and the 704x576 resolution second enhancement layer video by referencing the reconstructed base layer video and the reconstructed first enhancement layer video, respectively.
  • Upon receipt of a user's request for the 352x288 resolution video, the streaming service provider transmits the videos encoded in the first enhancement layer and the base layer to the user. The user first reconstructs the base layer video and then reconstructs the 352x288 resolution first enhancement layer video by referencing the reconstructed base layer video.
  • Upon receipt of a user's request for the 176x144 resolution video, the streaming service provider transmits only the video encoded in the base layer, which the user then reconstructs.
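The transmission rule for this layered scheme amounts to sending the requested layer plus every layer beneath it, decoded bottom-up. A minimal sketch, assuming each layer references only the layer immediately below it as in FIG. 2; the layer names and the helper `layers_for_resolution` are hypothetical.

```python
from typing import List, Tuple

# Layers of the FIG. 2 example, lowest resolution first. Each layer is
# assumed to be predicted only from the layer immediately below it.
LAYERS: List[Tuple[str, Tuple[int, int]]] = [
    ("base", (176, 144)),
    ("enhancement1", (352, 288)),
    ("enhancement2", (704, 576)),
]


def layers_for_resolution(resolution: Tuple[int, int]) -> List[str]:
    """Layers to transmit for a requested resolution, in decoding order."""
    names = [name for name, _ in LAYERS]
    for index, (_, layer_resolution) in enumerate(LAYERS):
        if layer_resolution == resolution:
            # The requested layer needs every layer below it, because each
            # one is reconstructed with reference to the one beneath it.
            return names[: index + 1]
    raise ValueError(f"no layer with resolution {resolution}")
```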
  • MPEG-4 intends to standardize scalable video coding that involves creating videos at various resolutions, frame rates, and image qualities from a single encoded bitstream. As shown in FIG. 3, the scalable video coding scheme generates videos with various resolutions and frame rates from a single bitstream.
  • Spatial scalability, the ability to generate videos with different resolutions from a scalable bitstream, can be achieved with a wavelet transform.
  • Temporal scalability, the ability to generate videos at different frame rates from a scalable bitstream, can be provided by Motion Compensated Temporal Filtering (MCTF), Unconstrained MCTF (UMCTF), or Successive Temporal Approximation and Referencing (STAR).
  • MCTF: Motion Compensated Temporal Filtering
  • UMCTF: Unconstrained MCTF
  • SNR: signal-to-noise ratio
  • Using a scalable video coding algorithm allows a single bitstream, obtained from a single video sequence, to serve video streaming at various resolutions and frame rates.
  • However, conventional scalable video coding algorithms cannot provide high-quality bitstreams at all resolutions.
  • For example, the highest-resolution video can be reconstructed with high quality, but a low-resolution video cannot be reconstructed with satisfactory quality.
  • More bits can be allocated for video coding of the low-resolution video to improve its quality. However, this will degrade the coding efficiency.
  • the present invention provides a video encoding method and system capable of providing video streaming services with various image qualities and high coding efficiency.
  • the present invention also provides a video decoding method and system for decoding video encoded by the video encoding method and system to reconstruct an original video sequence.
  • a video encoding method comprising encoding first resolution frames using scalable video coding, upsampling the first resolution frames to a second resolution, and encoding second resolution frames using scalable video coding with reference to upsampled versions of the first resolution frames.
  • a video encoding method including encoding first resolution frames using non-scalable video coding, upsampling the first resolution frames to a second resolution, and encoding second resolution frames using scalable video coding with reference to upsampled versions of the first resolution frames.
  • a video encoding method including encoding first resolution frames using scalable video coding, upsampling the first resolution frames to a second resolution, upsampling the first resolution frames to a third resolution, encoding second resolution frames using scalable video coding with reference to frames upsampled to the second resolution, and encoding third resolution frames using scalable video coding with reference to frames upsampled to the third resolution.
  • a video encoding method including encoding first resolution frames using scalable video coding, upsampling the first resolution frames to a second resolution, encoding second resolution frames using scalable video coding with reference to frames upsampled to the second resolution, encoding frames with a third resolution higher than the second resolution using scalable video coding, upsampling the third resolution frames to a fourth resolution, and encoding fourth resolution frames using scalable video coding with reference to frames upsampled to the fourth resolution.
  • a video encoding method including encoding frames with a first resolution using scalable video coding, encoding frames with a second resolution higher than the first resolution using scalable video coding, independently of the first resolution frames, and encoding frames with a third resolution higher than the second resolution using scalable video coding, independently of the second resolution frames.
  • a video encoding method including encoding frames with a first resolution using non-scalable video coding, encoding frames with a second resolution higher than the first resolution using scalable video coding, independently of the first resolution frames, and encoding frames with a third resolution higher than the second resolution using scalable video coding, independently of the second resolution frames.
  • a video encoding method including encoding first resolution frames using scalable video coding, upsampling the first resolution frames to a second resolution, encoding frames with a third resolution higher than the second resolution using scalable video coding, downsampling the third resolution frames to the second resolution, and encoding second resolution frames using scalable video coding with reference to upsampled versions of the first resolution frames and downsampled versions of the third resolution frames.
  • a video encoding method including encoding second resolution frames using scalable video coding, downsampling the second resolution frames to a first resolution, and encoding first resolution frames using scalable video coding with reference to downsampled versions of the second resolution frames.
  • a video encoding method including encoding second resolution frames using scalable video coding, downsampling the second resolution frames to a first resolution, and encoding first resolution frames using non-scalable video coding with reference to downsampled versions of the second resolution frames.
  • a video encoding method including encoding third resolution frames using scalable video coding, downsampling the third resolution frames to a second resolution, encoding second resolution frames using scalable video coding with reference to frames downsampled to the second resolution, downsampling the third resolution frames to a first resolution lower than the second resolution, and encoding first resolution frames using scalable video coding with reference to frames downsampled to the first resolution.
  • a video encoder system including a first scalable video encoder encoding first resolution frames using scalable video coding, a second scalable video encoder converting the first resolution frames into a second resolution and encoding second resolution frames using scalable video coding with reference to the converted frames, and a bitstream generating module generating a bitstream consisting of the first resolution encoded frames and the second resolution encoded frames.
  • a video encoder system including a first scalable video encoder encoding frames with a first resolution using scalable video coding, a second scalable video encoder encoding frames with a second resolution lower than the first resolution using scalable video coding, and a bitstream generating module generating a bitstream consisting of the first resolution encoded frames and the second resolution encoded interframes.
  • a video encoder system including a scalable video encoder encoding frames with a first resolution using scalable video coding, a non-scalable video encoder encoding frames with a second resolution lower than the first resolution using non-scalable video coding, and a bitstream generating module generating a bitstream consisting of the first resolution encoded frames and the second resolution encoded interframes.
  • a video decoding method including decoding the first resolution frames encoded using scalable video coding to reconstruct original frames, upsampling the reconstructed first resolution frames to a second resolution, and decoding second resolution frames encoded using scalable video coding with reference to upsampled versions of the reconstructed first resolution frames in order to reconstruct original frames.
  • a video decoding method comprising decoding the first resolution frames encoded using non-scalable video coding to reconstruct original frames, upsampling the reconstructed first resolution frames to a second resolution, and decoding second resolution frames encoded using scalable video coding with reference to upsampled versions of the reconstructed first resolution frames in order to reconstruct original frames.
  • a video decoding method including decoding the first resolution frames encoded using scalable video coding to reconstruct original frames, downsampling some of the reconstructed first resolution frames to a second resolution and generating intraframes with the second resolution, and decoding second resolution interframes encoded using scalable video coding with reference to the generated intraframes.
  • a video decoding method including decoding the first resolution frames encoded using scalable video coding to reconstruct original frames, downsampling some of the reconstructed first resolution frames to a second resolution and generating intraframes with the second resolution, and decoding second resolution interframes encoded using non-scalable video coding with reference to the generated intraframes.
  • a video decoder system including a first scalable video decoder decoding first resolution frames encoded using scalable video coding in order to reconstruct original frames, and a second scalable video decoder converting the reconstructed first resolution frames to a second resolution and decoding second resolution frames encoded using scalable video coding with reference to the converted frames in order to reconstruct original frames.
  • a video decoder system including a non-scalable video decoder decoding first resolution frames encoded using non-scalable video coding in order to reconstruct original frames, and a scalable video decoder converting the reconstructed first resolution frames to a second resolution and decoding second resolution frames encoded using scalable video coding with reference to the converted frames in order to reconstruct original frames.
  • FIGS. 1 - 3 show conventional coding schemes for providing video streaming at different resolutions
  • FIG. 4 illustrates a referencing relationship in encoding frames in an enhancement layer using a multi-layer coding scheme
  • FIGS. 5 and 6 illustrate coding schemes for video streaming according to first and second exemplary embodiments of the present invention
  • FIGS. 7 - 10 illustrate coding schemes for video streaming according to third through sixth exemplary embodiments of the present invention
  • FIGS. 11 - 14 illustrate coding schemes for video streaming according to seventh through tenth exemplary embodiments of the present invention
  • FIG. 15 illustrates a referencing relationship in interframe coding according to an exemplary embodiment of the present invention
  • FIG. 16 illustrates a referencing relationship in interframe coding according to another exemplary embodiment of the present invention
  • FIG. 17 illustrates a referencing relationship in interframe coding according to another exemplary embodiment of the present invention
  • FIG. 18 illustrates a referencing relationship in interframe coding according to another exemplary embodiment of the present invention
  • FIG. 19 illustrates sharing of an intraframe according to an exemplary embodiment of the present invention
  • FIG. 20 illustrates sharing of an intraframe according to another exemplary embodiment of the present invention
  • FIG. 21 is a block diagram of a video encoder system according to an exemplary embodiment of the present invention
  • FIG. 22 is a block diagram of a video decoder system according to an exemplary embodiment of the present invention
  • FIG. 23 is a diagram for explaining a process of generating a smooth intraframe in a smooth enhancement layer in intraframe sharing and decoding a shared intraframe.
  • FIG. 4 illustrates a referencing relationship in encoding frames in an enhancement layer using a multi-layer coding scheme.
  • A current frame (frame N) in the enhancement layer can be inter-coded using the previous frame (frame N-1) as a reference (backward prediction) or using the next frame (frame N+1) as a reference (forward prediction).
  • frame N-1: the previous frame
  • frame N+1: the next frame
  • forward prediction: prediction from the next frame (frame N+1)
  • Inter-layer prediction uses the current frame in a base layer, i.e., the base-layer frame at the same temporal position, to encode the current frame in an enhancement layer.
  • a reference frame is created by upsampling or downsampling the current frame in the base layer to the resolution of the enhancement layer. For example, when the resolution of the base layer is lower than that of the enhancement layer as shown in FIG. 4, the current frame in the base layer is upsampled to the resolution of the enhancement layer and then the current frame in the enhancement layer is inter-coded with reference to an upsampled version of the frame in the base layer.
  • Conversely, when the resolution of the base layer is higher than that of the enhancement layer, the current frame in the base layer is downsampled to the resolution of the enhancement layer and the current frame in the enhancement layer is inter-coded with reference to a downsampled version of the frame in the base layer.
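Building the inter-layer reference can be sketched as follows, assuming frames are plain 2-D lists of luma samples and the two layers differ by exactly a factor of two. The patent does not specify the resampling filters: pixel replication for upsampling and 2x2 averaging for downsampling are stand-ins chosen for brevity (practical codecs generally use longer interpolation filters), and `interlayer_reference` is a hypothetical helper.

```python
from typing import List

Frame = List[List[float]]  # rows of luma samples


def upsample2x(frame: Frame) -> Frame:
    """Double width and height by pixel replication (nearest neighbour)."""
    out: Frame = []
    for row in frame:
        wide = [sample for sample in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # repeat the row for the doubled height
    return out


def downsample2x(frame: Frame) -> Frame:
    """Halve width and height by averaging each 2x2 block (even sizes assumed)."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, len(frame[0]), 2)
        ]
        for y in range(0, len(frame), 2)
    ]


def interlayer_reference(base_frame: Frame, enhancement_height: int) -> Frame:
    """Resample the current base-layer frame to the enhancement-layer size."""
    base_height = len(base_frame)
    if enhancement_height == base_height:
        return base_frame
    if enhancement_height == 2 * base_height:
        return upsample2x(base_frame)  # base layer has the lower resolution
    if 2 * enhancement_height == base_height:
        return downsample2x(base_frame)  # base layer has the higher resolution
    raise ValueError("only factor-of-two resolution ratios are handled here")
```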
  • Weighted bi-directional prediction and intrablock prediction can also be used as prediction modes.
  • The prediction mode can be selected based on a cost that accounts for the amount of coded data, the amount of motion vector data used for prediction, computational complexity, and other factors.
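The cost-based mode decision can be illustrated as a minimum over candidate modes. This is a sketch under stated assumptions: the text names the cost ingredients (coded data, motion vector data, computational complexity) but not how they are combined, so the linear cost, the `complexity_weight` parameter, and the `ModeCandidate` type below are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeCandidate:
    name: str            # e.g. "backward", "forward", "bidirectional", "inter-layer"
    residual_bits: int   # amount of coded data for the prediction residual
    motion_bits: int     # amount of motion vector data
    complexity: float    # relative computational cost, arbitrary units


def mode_cost(candidate: ModeCandidate, complexity_weight: float) -> float:
    # Hypothetical linear combination of the cost factors named in the text.
    return (candidate.residual_bits + candidate.motion_bits
            + complexity_weight * candidate.complexity)


def select_mode(candidates: Sequence[ModeCandidate],
                complexity_weight: float = 0.0) -> ModeCandidate:
    """Return the candidate prediction mode with the lowest cost."""
    return min(candidates, key=lambda c: mode_cost(c, complexity_weight))
```

With `complexity_weight=0` the decision depends only on bit counts; raising it favours cheaper modes, which is one way to trade compression efficiency against encoder speed.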
  • a frame in an enhancement layer may be encoded based on inter-layer prediction from another enhancement layer instead of a base layer.
  • a frame in a first enhancement layer may be encoded using a frame in a base layer as a reference
  • a frame in a second enhancement layer may be encoded using the frame in the first enhancement layer as a reference.
  • All or some of the frames in the first or second enhancement layer may be encoded based on inter-layer prediction using frames in another layer (the base layer or the first enhancement layer) as a reference.
  • some frames in the enhancement layer may be encoded based on prediction other than the inter-layer prediction.
  • Exemplary embodiments of the present invention use a simulcast or multi-layer coding scheme to provide video streaming services at various resolutions and frame rates.
  • The present invention also uses a scalable video coding scheme in all or some of the layers to allow video streaming services at a larger number of resolutions and frame rates.
  • FIGS. 5 - 14 illustrate coding schemes for video streaming according to first through tenth exemplary embodiments of the present invention. While the video is described as having three or four layers, it may consist of two layers or five or more layers. Lower and upper layers in the first through tenth exemplary embodiments respectively denote lower- and higher-resolution layers.
  • inter-layer referencing is indicated by a dotted arrow, and videos with different resolutions, frame rates, or transmission rates that can be obtained from an encoded video in a certain layer are indicated by solid arrows.
  • FIG. 5 shows an example of a multi-layer coding scheme for video streaming according to a first exemplary embodiment of the present invention where video data is encoded into three layers, i.e., a base layer and first and second enhancement layers.
  • videos in all the layers are encoded using scalable video coding. That is, a video in the base layer is encoded using scalable video coding. A video in the first enhancement layer is encoded with reference to frames in the encoded base layer video using scalable video coding, and a video in the second enhancement layer is encoded with reference to frames in the encoded first enhancement layer video using scalable video coding.
  • Upon receiving a user's request for a 704x576 resolution video, the streaming service provider transmits the video encoded in the second enhancement layer, together with the videos encoded in the first enhancement layer and the base layer, to the user.
  • If the requested frame rate is 60 Hz, all frames encoded in the base layer and the first and second enhancement layers are transmitted to the user.
  • Otherwise, the streaming service provider truncates the unnecessary parts of the coded frames before transmission.
  • The user first reconstructs the video in the base layer from the coded frames, and then sequentially reconstructs the video in the first enhancement layer and the 704x576 resolution video in the second enhancement layer by referencing the reconstructed video in the base layer and the reconstructed video in the first enhancement layer, respectively.
  • Upon receiving a user's request for a 352x288 resolution video, the streaming service provider transmits the videos encoded in the base layer and the first enhancement layer to the user.
  • If the requested frame rate is 30 Hz, all frames encoded in the base layer and the first enhancement layer are transmitted to the user.
  • Otherwise, the streaming service provider truncates the unnecessary parts of the coded frames before transmission.
  • the user that receives the coded frames reconstructs the video in the base layer and then the 352x288 resolution video in the first enhancement layer by referencing the reconstructed video in the base layer.
  • Upon receipt of a user's request for a 176x144 resolution video, the streaming service provider transmits the video encoded in the base layer to the user.
  • If the user selects bitstream transmission at a bit rate of 128 Kbps, all coded frames are transmitted to the user.
  • Otherwise, the streaming service provider truncates some bits of the coded frames before transmission. The user that receives the coded frames reconstructs the video in the base layer.
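How a streaming server might cut one GOP of a layer down to a requested frame rate and bit rate can be sketched as follows. The sketch assumes that frames are stored in the temporal coding order described for FIG. 15 (so dropping trailing frames lowers the frame rate) and that each frame's payload is an embedded bitstream whose prefixes remain decodable, as in wavelet-based coders. `extract_gop` and its proportional truncation rule are hypothetical.

```python
from typing import List, Tuple

CodedFrame = Tuple[int, bytes]  # (frame number, embedded payload)


def extract_gop(coded_frames: List[CodedFrame], full_rate_hz: int,
                requested_rate_hz: int, requested_kbps: int) -> List[CodedFrame]:
    """Trim one GOP to a requested frame rate and bit rate (hypothetical helper)."""
    n = len(coded_frames)
    if requested_rate_hz > full_rate_hz or (n * requested_rate_hz) % full_rate_hz:
        raise ValueError("requested frame rate must evenly divide the GOP")
    keep = n * requested_rate_hz // full_rate_hz
    # Frames are in temporal coding order (e.g. 1, 3, 2, 4), so the first
    # `keep` frames form a lower-frame-rate version of the GOP.
    kept = coded_frames[:keep]
    # Byte budget for this GOP: Kbps * 1000 / 8 bytes per second, times the
    # GOP duration of n / full_rate_hz seconds (integer arithmetic).
    budget = requested_kbps * 1000 * n // (8 * full_rate_hz)
    total = sum(len(payload) for _, payload in kept)
    if total <= budget:
        return kept  # everything fits: send the kept frames untouched
    # Otherwise cut every embedded payload by the same fraction.
    return [(num, payload[: len(payload) * budget // total]) for num, payload in kept]
```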
  • FIG. 6 shows an example of a multi-layer coding scheme for video streaming according to a second exemplary embodiment of the present invention, in which one layer is encoded using non-scalable video coding.
  • Although the H.264 and MPEG-4 video coding standards can support limited spatial scalability by using the coding schemes shown in FIGS. 1 - 3, or limited temporal scalability as disclosed in International Application No. PCT/US2000/09584, they do not offer sufficient spatial, temporal, and signal-to-noise ratio (SNR) scalability.
  • the present invention uses a wavelet-based scalable coding scheme as a basic algorithm. While offering good spatial, temporal, and SNR scalabilities, currently known scalable video coding algorithms provide lower coding efficiency than H.264 or MPEG-4. In order to improve coding efficiency, some layers can be encoded using a non-scalable H.264 or MPEG-4 scheme as shown in FIG. 6.
  • The lowest-resolution base layer is encoded using a non-scalable H.264 or MPEG-4 coding scheme since the lowest-resolution video does not need to be scalable. That is, the video with the lowest transmission rate, 64 Kbps, is encoded using the H.264 or MPEG-4 coding scheme, which offers high coding efficiency.
  • FIG. 7 shows an example of a multi-layer coding scheme for video streaming according to a third exemplary embodiment of the present invention, in which an enhancement layer is encoded with reference to a layer lower than the layer immediately below it.
  • a second enhancement layer is encoded with reference to a base layer instead of a first enhancement layer.
  • The coding scheme according to the third exemplary embodiment provides lower coding efficiency than the first exemplary embodiment because the second enhancement layer is encoded with reference to the base layer, whose resolution differs greatly from its own.
  • However, it offers higher image quality than the first exemplary embodiment since the video in the second enhancement layer is reconstructed by directly referencing the base layer instead of the first enhancement layer during decoding.
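The decoding-side difference between the first and third embodiments can be made concrete by following reference links from the requested layer down to an independently coded layer. A minimal sketch; the dictionaries are a hypothetical encoding of FIGS. 5 and 7 and assume that the first enhancement layer references the base layer in both.

```python
from typing import Dict, List, Optional

# Hypothetical encodings of FIGS. 5 and 7: each layer maps to the layer it is
# predicted from, or None if it is coded independently.
FIG5_REFERENCES: Dict[str, Optional[str]] = {"base": None, "enh1": "base", "enh2": "enh1"}
FIG7_REFERENCES: Dict[str, Optional[str]] = {"base": None, "enh1": "base", "enh2": "base"}


def layers_needed(target: str, references: Dict[str, Optional[str]]) -> List[str]:
    """Every layer needed to reconstruct `target`, in decoding order."""
    chain: List[str] = []
    layer: Optional[str] = target
    while layer is not None:
        if layer in chain:
            raise ValueError("cyclic layer references")
        chain.append(layer)
        layer = references[layer]
    return chain[::-1]  # reference layers must be decoded first
```

Under FIG. 7, reconstructing the second enhancement layer skips the first enhancement layer entirely, which is why its quality does not depend on that intermediate reconstruction.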
  • FIG. 8 shows an example of a multi-layer coding scheme for video streaming according to a fourth exemplary embodiment of the present invention, in which a video is encoded into a plurality of base layers and enhancement layers. Using many layers as in the first embodiment may degrade coding efficiency. Thus, in the fourth exemplary embodiment, a base layer that can be independently encoded without reference to any other layer is placed at a proper position determined according to the number of layers.
  • FIG. 9 shows an example of a simulcast video coding scheme according to a fifth exemplary embodiment of the present invention that uses only scalable coding in encoding each resolution.
  • a simulcast coding scheme is more efficient than a multi-layer coding scheme.
  • Scalable video coding is used in encoding all or some of the resolutions.
  • Alternatively, only the lowest-resolution video may be encoded using non-scalable H.264 or MPEG-4 coding, as in the sixth exemplary embodiment of FIG. 10.
  • FIG. 11 shows an example of a multi-layer coding scheme for video streaming according to a seventh exemplary embodiment of the present invention in which the lowest resolution layer is not a base layer.
  • Video data is encoded into a first enhancement layer with the lowest resolution and a second enhancement layer with the highest resolution, both by referencing an intermediate-resolution base layer.
  • An upsampled version of a frame in the base layer is used as a reference in encoding a video in the second enhancement layer while a downsampled version of the frame in the base layer is used in encoding a video in the first enhancement layer.
  • FIG. 12 shows an example of a multi-layer coding scheme for video streaming according to an eighth exemplary embodiment of the present invention, in which a base layer is encoded at the highest resolution.
  • a video in a first enhancement layer is encoded with reference to a video in the base layer
  • a video in a second enhancement layer is encoded with reference to the video in the first enhancement layer.
  • the reference frames used in encoding the first enhancement layer video are downsampled versions of frames in the base layer.
  • some of multiple layers can be encoded using a non-scalable video coding scheme as in a ninth exemplary embodiment shown in FIG. 13.
  • FIG. 14 shows an example of a multi-layer video coding scheme for video streaming according to a tenth exemplary embodiment of the present invention.
  • The multi-layer coding scheme in the tenth exemplary embodiment encodes a video in a lower-resolution layer with reference to a video in a higher-resolution layer.
  • FIG. 15 illustrates a referencing relationship in interframe coding according to an exemplary embodiment of the present invention. Referencing between each resolution layer is indicated by dotted arrows while referencing within the same resolution layer is indicated by solid arrows.
  • a low-resolution video 610 is encoded first.
  • a coding order in the low-resolution video 610 is determined to achieve temporal scalability. That is, when the size of a group of pictures (GOP) is 4, frame 1 in the GOP is encoded as an intraframe (I frame) and frame 3 is encoded as an interframe (H frame). Then, the frames 1 and 3 are used as a reference to encode frame 2, and the frame 3 is used to encode frame 4.
  • a decoding process is performed in the same order as the encoding process, i.e., according to the order of frames 1, 3, 2, and 4. After the frames 1, 3, 2, and 4 are sequentially decoded, the frames 1, 2, 3, and 4 are output in order.
  • a high-resolution video 620 is encoded with reference to the low-resolution video 610 in the same order as the low-resolution video 610, i.e., in the order of frames 1, 3, 2, and 4.
  • To decode the high-resolution video 620, both the encoded high- and low-resolution video frames are required.
  • the frame 1 in the low-resolution video 610 is decoded, and the decoded frame 1 is used to decode frame 1 in the high-resolution video 620.
  • the frame 3 in the low-resolution video 610 is decoded, and the decoded frame 3 is used to decode frame 3 in the high-resolution video 620.
  • the frame 2 in the low-resolution video 610 is decoded and used in decoding frame 2 in the high-resolution video 620.
  • the frame 4 in the low-resolution video 610 is decoded and used in decoding frame 4 in the high-resolution video 620, followed by decoding of frames in the next GOP.
  • By encoding and decoding frames in this way, temporal scalability can be achieved.
  • When the GOP size is 8, encoding and decoding are performed according to the order of frames 1, 5, 3, 7, 2, 4, 6, and 8. If only frames 1 and 5 are encoded or decoded, the frame rate is one-quarter of the full frame rate. If only frames 1, 5, 3, and 7 are encoded or decoded, the frame rate is half the full frame rate.
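The coding order and temporal scalability described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes a power-of-two GOP size with frame 1 as the I frame. The text only spells out the references for frames 2 and 4 in a GOP of 4, so the reference rule used here is a hypothetical one: each later frame references the nearest already-coded frame on each side within the GOP. All function names are illustrative.

```python
from fractions import Fraction


def coding_order(gop_size: int) -> list[int]:
    """Dyadic coding order for one GOP whose frames are numbered 1..gop_size.

    Frame 1 is the I frame; each pass halves the spacing between new frames,
    giving 1, 3, 2, 4 for a GOP of 4 and 1, 5, 3, 7, 2, 4, 6, 8 for a GOP of 8.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [1]
    step = gop_size // 2
    while step >= 1:
        # Frames 1 + k*step for odd k are the ones that are new at this level.
        order.extend(range(1 + step, gop_size + 1, 2 * step))
        step //= 2
    return order


def references(gop_size: int) -> dict[int, list[int]]:
    """Assumed reference rule: nearest already-coded frame on each side."""
    coded: list[int] = []
    refs: dict[int, list[int]] = {}
    for frame in coding_order(gop_size):
        before = [f for f in coded if f < frame]
        after = [f for f in coded if f > frame]
        refs[frame] = ([max(before)] if before else []) + ([min(after)] if after else [])
        coded.append(frame)
    return refs


def frame_rate_fraction(gop_size: int, frames_decoded: int) -> Fraction:
    """Fraction of the full frame rate when only a prefix of the order is decoded."""
    prefix = coding_order(gop_size)[:frames_decoded]
    return Fraction(len(prefix), gop_size)


print(coding_order(8))            # [1, 5, 3, 7, 2, 4, 6, 8]
print(references(4))              # {1: [], 3: [1], 2: [1, 3], 4: [3]}
print(frame_rate_fraction(8, 2))  # 1/4
```

Decoding a prefix of length 2 or 4 of `coding_order(8)` yields the quarter and half frame rates mentioned above. This works because each pass of the loop halves the spacing between decodable frames.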
  • FIG. 16 illustrates a referencing relationship in interframe coding according to an exemplary embodiment of the present invention.
  • the quality of the encoded low-resolution video is high since the frames 2 through 4 are encoded with reference to the I frame that can be encoded independently without reference to any other frame.
  • the quality of the encoded high-resolution video is lower than that obtained when a simulcast coding scheme is used because the frames 2 through 4 are encoded with reference to the H frame that is encoded with reference to another frame.
  • FIG. 16 shows an improved method for referencing between layers.
  • a high-resolution video 720 is encoded first.
  • a coding order in the high-resolution video 720 is determined to achieve temporal scalability. That is, when the GOP size is 4, frame 1 in the GOP is encoded as an intraframe (I frame) and frame 3 is encoded as an interframe (H frame). Then, the frames 1 and 3 are used as a reference to encode frame 2, and the frame 3 is used to encode frame 4.
  • a decoding process is performed in the same order as the encoding process, i.e., according to the order of frames 1, 3, 2, and 4. After the frames 1, 3, 2, and 4 are sequentially decoded, the frames 1, 2, 3, and 4 are output in order.
  • a low-resolution video 710 is encoded with reference to the high-resolution video 720 in the same order as the high-resolution video 720, i.e., in the order of frames 1, 3, 2, and 4.
  • To decode the low-resolution video 710, both the encoded high- and low-resolution video frames are required.
  • the frame 1 in the high-resolution video 720 is decoded, and the decoded frame 1 is used to decode frame 1 in the low-resolution video 710.
  • the frame 3 in the high-resolution video 720 is decoded, and the decoded frame 3 is used to decode frame 3 in the low-resolution video 710.
  • the frame 2 in the high-resolution video 720 is decoded and used in decoding frame 2 in the low-resolution video 710.
  • the frame 4 in the high-resolution video 720 is decoded and used in decoding frame 4 in the low-resolution video 710.
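The inter-layer decoding order of FIGS. 15 and 16 can be modeled as an interleaved schedule. In it, each frame of the reference layer is decoded just before the co-located frame of the dependent layer. In the sketch below, the layer labels and both function names are hypothetical. Passing `"low"` as the reference layer models FIG. 15, and passing `"high"` models FIG. 16.

```python
def interlayer_decode_schedule(order, reference_layer, dependent_layer):
    """Interleave two resolution layers frame by frame.

    Each frame of the reference layer is decoded immediately before the
    co-located frame of the dependent layer, which uses it as a reference.
    """
    schedule = []
    for frame in order:
        schedule.append((reference_layer, frame))
        schedule.append((dependent_layer, frame))
    return schedule


def dependencies_satisfied(schedule, reference_layer):
    """True if every dependent-layer frame comes after its co-located reference frame."""
    decoded = set()
    for layer, frame in schedule:
        if layer != reference_layer and (reference_layer, frame) not in decoded:
            return False
        decoded.add((layer, frame))
    return True


fig15 = interlayer_decode_schedule([1, 3, 2, 4], "low", "high")
fig16 = interlayer_decode_schedule([1, 3, 2, 4], "high", "low")
print(fig15[:4])  # [('low', 1), ('high', 1), ('low', 3), ('high', 3)]
print(sorted({frame for _, frame in fig15}))  # display order: [1, 2, 3, 4]
```

The schedule only captures ordering. The references inside each layer still follow that layer's own coding order.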
  • FIGS. 17 and 18 respectively illustrate referencing relationships in interframe coding according to other exemplary embodiments of the present invention when resolution layers have varying frame rates.
  • a low-resolution video 810 is encoded first.
  • a coding order in the low-resolution video 810 is determined to achieve temporal scalability. That is, when the GOP size is 4, frame 1 in the GOP is encoded as an intraframe (I frame) and frame 5 is encoded as an interframe (H frame). Since the low-resolution video 810 has half the frame rate, its four frames carry the full-rate indices 1, 3, 5, and 7. Then, the frames 1 and 5 are used to encode frame 3. In this way, frames 1, 5, 3, and 7 in the GOP are encoded in order.
  • a decoding process is performed in the same order as the encoding process.
  • a high-resolution video 820 is encoded with reference to the low-resolution video 810 in the same order as the low-resolution video 810, i.e., according to the order of frames 1, 5, 3, and 7. Then, frames 2, 4, 6, and 8, which are not contained in the low-resolution video 810, are encoded.
  • a high-resolution video 920 is encoded first.
  • a coding order in the high-resolution video 920 is determined to achieve temporal scalability. That is, when the GOP size is 8, all frames in the GOP are encoded in the order of frames 1, 5, 3, 7, 2, 4, 6, and 8.
  • a decoding process is performed in the same order as the encoding process.
  • a low-resolution video 910 is encoded with reference to the high-resolution video 920 in the same order as the high-resolution video 920, i.e., in the order of frames 1, 5, 3, and 7.
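When the resolution layers have different frame rates, as in FIGS. 17 and 18, the frames shared by both layers are those at the coarser temporal levels. The dyadic coding order already places these first. The sketch below assumes that the lower-rate layer keeps exactly the first `len(order) // rate_divisor` frames of the full-rate coding order. The helper name is hypothetical.

```python
def split_by_frame_rate(full_order, rate_divisor):
    """Split a full-rate coding order into frames shared with a lower-rate layer
    and frames that exist only in the full-rate layer.

    Assumes the lower-rate layer keeps the first len(full_order) // rate_divisor
    frames of the dyadic order, i.e. the coarser temporal levels.
    """
    if rate_divisor < 1 or len(full_order) % rate_divisor:
        raise ValueError("rate_divisor must evenly divide the GOP size")
    keep = len(full_order) // rate_divisor
    shared = full_order[:keep]          # co-located in both layers; usable for inter-layer reference
    full_rate_only = full_order[keep:]  # no co-located frame in the lower-rate layer
    return shared, full_rate_only


shared, extra = split_by_frame_rate([1, 5, 3, 7, 2, 4, 6, 8], 2)
print(shared)  # [1, 5, 3, 7]
print(extra)   # [2, 4, 6, 8]
```

The same split applies whichever layer is coded first. In FIG. 17 the shared frames of the high-resolution video reference the low-resolution video. In FIG. 18 the low-resolution frames reference the high-resolution video.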
  • FIGS. 15 - 18 illustrate referencing relationships between two resolution layers according to the exemplary embodiments of the present invention, but the illustrated embodiments can also apply to a multi-layer video coding scheme that encodes video data into three or more layers.
  • In a multi-layer video coding scheme in which a low-resolution frame is encoded with reference to a high-resolution frame, coding efficiency is reduced when a low-resolution bitstream is transmitted. This is because the low-resolution bitstream must contain high-resolution coded data as well as low-resolution coded video data.
  • Simulcast video coding is more efficient for transmission of a low-resolution bitstream than multi-layer video coding.
  • FIGS. 19 and 20 respectively illustrate sharing of an intraframe to improve coding efficiency in a simulcast video coding scheme according to exemplary embodiments of the present invention.
  • videos 1010 and 1020 with different resolutions are encoded independently using a simulcast coding scheme.
  • the high-resolution video 1020 is encoded according to the order of frames 1, 3, 2, and 4 in order to achieve temporal scalability.
  • the low-resolution video 1010 is also encoded according to an order that achieves temporal scalability.
  • the encoded high- and low-resolution videos respectively include one intraframe (I frame) and one or more interframes (H frames) per GOP. In general, an I frame is allocated more bits than an H frame.
  • Since the low-resolution video 1010 is quite similar to the high-resolution video 1020 except for resolution, the present exemplary embodiment encodes into a bitstream all frames in the low- and high-resolution videos 1010 and 1020 except the low-resolution I frames 1012 and 1014. That is, the finally generated bitstream consists of all high-resolution encoded frames and the low-resolution encoded interframes.
  • FIG. 20 illustrates sharing of an intraframe according to another exemplary embodiment of the present invention.
  • a high-resolution video 1120 shares an intraframe 1122 with a low-resolution video 1110. That is, for low-resolution video streaming, a low-resolution intraframe 1112 is created using the high-resolution intraframe 1122.
  • a high-resolution intraframe 1124 is not shared with the low-resolution video 1110, and a low-resolution frame 1114 is used as an interframe.
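The bitstream composition for intraframe sharing can be sketched as below. The `CodedFrame` record, the `share_intra` flags, and the function name are all hypothetical. Setting every flag to `True` models FIG. 19, where all low-resolution I frames are omitted. Mixing `True` and `False` models FIG. 20, where an unshared position is coded as a low-resolution interframe. Frames are listed in frame-index order rather than coding order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedFrame:
    layer: str  # "high" or "low"
    gop: int    # GOP index, starting at 0
    frame: int  # 1-based frame index inside the GOP
    kind: str   # "I" (intraframe) or "H" (interframe)


def simulcast_bitstream(num_gops, gop_size, share_intra):
    """Frames written to the bitstream under intraframe sharing.

    share_intra[g] is True when the low-resolution intraframe of GOP g is
    derived from the high-resolution intraframe and therefore not transmitted;
    False means that low-resolution position is coded as an interframe.
    """
    out = []
    for g in range(num_gops):
        for f in range(1, gop_size + 1):
            out.append(CodedFrame("high", g, f, "I" if f == 1 else "H"))
        for f in range(1, gop_size + 1):
            if f == 1 and share_intra[g]:
                continue  # reconstructed at the decoder from the high-resolution I frame
            out.append(CodedFrame("low", g, f, "H"))
    return out


fig19 = simulcast_bitstream(2, 4, [True, True])
fig20 = simulcast_bitstream(2, 4, [True, False])
print(sum(c.layer == "low" for c in fig19))  # 6
print(sum(c.layer == "low" for c in fig20))  # 7
```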
  • FIG. 21 is a block diagram of a video encoder system 1200 according to an exemplary embodiment of the present invention. While the video encoder system 1200 encodes video data into two layers with different resolutions, it may encode video data into n layers with different resolutions.
  • the video encoder system 1200 includes a first scalable video encoder 1210 encoding a base layer video, a second scalable video encoder 1220 encoding an enhancement layer video, and a bitstream generating module 1230 that combines the encoded base layer video and enhancement layer video into a bitstream.
  • the first scalable video encoder 1210 receives the base layer video and encodes the same using scalable video coding. To accomplish this, the first scalable video encoder 1210 includes a motion estimation module 1212, a transform module 1214, and a quantization module 1216.
  • the motion estimation module 1212 estimates motion present between a reference frame and a current frame and produces a residual frame. Algorithms such as UMCTF or STAR are used to remove temporal redundancies using motion estimation. Some of the techniques described with reference to FIGS. 5 - 20 are selected for motion estimation to achieve a better trade-off between coding efficiency and image quality.
  • the transform module 1214 performs wavelet transform on the residual frame to produce transform coefficients.
  • a residual frame is decomposed into four portions, and a quarter-sized image (L image) that is similar to the entire image is placed in the upper left portion of the frame while information (H image) needed to reconstruct the entire image from the L image is placed in the other three portions.
  • The L image may in turn be decomposed into a quarter-sized LL image and the information needed to reconstruct the L image.
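The one-level decomposition described above, and its recursive application to the L image, can be illustrated with a 2-D Haar wavelet. The patent does not name the wavelet filter, so Haar is an assumption here. The averaging normalization (dividing by 4 so that the L image is a block mean) is chosen for readability and is not taken from the source. Frames are plain lists of rows with even dimensions.

```python
def haar2d(frame):
    """One level of a 2-D Haar transform (averaging normalization).

    The quarter-sized low-pass (L) image goes to the upper-left quadrant and
    the three detail (H) bands fill the other three quadrants.
    """
    h, w = len(frame), len(frame[0])
    if h % 2 or w % 2:
        raise ValueError("frame dimensions must be even")
    hh, hw = h // 2, w // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(hh):
        for j in range(hw):
            a, b = frame[2 * i][2 * j], frame[2 * i][2 * j + 1]
            c, d = frame[2 * i + 1][2 * j], frame[2 * i + 1][2 * j + 1]
            out[i][j] = (a + b + c + d) / 4            # low-pass: mean of the 2x2 block
            out[i][j + hw] = (a - b + c - d) / 4       # left-minus-right difference
            out[i + hh][j] = (a + b - c - d) / 4       # top-minus-bottom difference
            out[i + hh][j + hw] = (a - b - c + d) / 4  # diagonal difference
    return out


def inverse_haar2d(coeffs):
    """Exact inverse of haar2d."""
    h, w = len(coeffs), len(coeffs[0])
    hh, hw = h // 2, w // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(hh):
        for j in range(hw):
            s, x = coeffs[i][j], coeffs[i][j + hw]
            y, z = coeffs[i + hh][j], coeffs[i + hh][j + hw]
            out[2 * i][2 * j] = s + x + y + z
            out[2 * i][2 * j + 1] = s - x + y - z
            out[2 * i + 1][2 * j] = s + x - y - z
            out[2 * i + 1][2 * j + 1] = s - x - y + z
    return out


def decompose(frame, levels):
    """Apply haar2d repeatedly to the upper-left (L, then LL, ...) image."""
    out = [list(row) for row in frame]
    h, w = len(frame), len(frame[0])
    for _ in range(levels):
        sub = haar2d([row[:w] for row in out[:h]])
        for i in range(h):
            out[i][:w] = sub[i]
        h, w = h // 2, w // 2
    return out


frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(haar2d(frame)[0][:2])       # L image, first row: [3.5, 5.5]
print(decompose(frame, 2)[0][0])  # LL image (mean of the frame): 8.5
```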
  • the quantization module 1216 applies quantization to the transform coefficients obtained by the wavelet transform.
  • Currently known embedded quantization algorithms include Embedded Zerotree Wavelet (EZW), Set Partitioning in Hierarchical Trees (SPIHT), Embedded ZeroBlock Coding (EZBC), Embedded Block Coding with Optimized Truncation (EBCOT), and so on.
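These algorithms are called "embedded" because coefficients are sent from the most significant bit-plane downward. The stream can therefore be cut at any point and still decode to a coarser approximation. The sketch below shows only that bit-plane ordering, on integer coefficients (for example, after rounding). It is not EZW, SPIHT, EZBC, or EBCOT, which add significance coding and entropy coding on top. The function names are hypothetical.

```python
def bitplane_encode(coeffs, num_planes):
    """Split integer coefficients into signs and magnitude bit-planes, MSB first."""
    signs = [c < 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    if any(m >= 1 << num_planes for m in mags):
        raise ValueError("num_planes too small for these coefficients")
    planes = [[(m >> p) & 1 for m in mags] for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def bitplane_decode(signs, planes, num_planes, planes_received):
    """Rebuild coefficients from only the first planes_received bit-planes."""
    mags = [0] * len(signs)
    for k in range(planes_received):
        p = num_planes - 1 - k  # bit position carried by the k-th plane
        mags = [m | (bit << p) for m, bit in zip(mags, planes[k])]
    return [-m if s else m for m, s in zip(mags, signs)]


signs, planes = bitplane_encode([13, -6, 2, 0], num_planes=4)
print(bitplane_decode(signs, planes, 4, 4))  # [13, -6, 2, 0]
print(bitplane_decode(signs, planes, 4, 2))  # [12, -4, 0, 0]
```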
  • the second scalable video encoder 1220 receives the enhancement layer video and encodes the same using scalable video coding.
  • the second scalable video encoder 1220 includes a motion estimation module 1222, a transform module 1224, and a quantization module 1226.
  • the motion estimation module 1222 estimates motion between a frame currently being encoded and reference frames in the enhancement layer video and the base layer video, and obtains a residual frame. Algorithms such as UMCTF or STAR are used to remove temporal redundancies using motion estimation.
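A plain full-search block matcher can illustrate how the best reference is chosen among enhancement-layer and base-layer candidates. This is a stand-in, not UMCTF or STAR. It assumes the base-layer reference has already been resampled to the current frame's resolution, and it uses the sum of absolute differences (SAD) as the matching cost. The reference labels and function names are hypothetical.

```python
def sad(cur, ref, by, bx, dy, dx, bs):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(bs)
        for j in range(bs)
    )


def best_match(cur, refs, by, bx, bs, search):
    """Full-search block matching over several candidate reference frames.

    refs maps a label (e.g. "enh" or "base") to a frame of the same size as cur.
    Returns (label, dy, dx, cost) for the lowest-SAD candidate.
    """
    h, w = len(cur), len(cur[0])
    best = None
    for label, ref in refs.items():
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                    continue  # displaced block would fall outside the reference frame
                cost = sad(cur, ref, by, bx, dy, dx, bs)
                if best is None or cost < best[3]:
                    best = (label, dy, dx, cost)
    return best


def residual_block(cur, ref, by, bx, dy, dx, bs):
    """Current block minus its motion-compensated prediction."""
    return [[cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j] for j in range(bs)]
            for i in range(bs)]


enh_ref = [[8 * r + c for c in range(8)] for r in range(8)]
base_ref = [[0] * 8 for _ in range(8)]  # stands in for an upsampled base-layer frame
cur = [[8 * (r + 1) + c for c in range(8)] for r in range(8)]  # content moved up one row
print(best_match(cur, {"enh": enh_ref, "base": base_ref}, 2, 2, 2, 2))  # ('enh', 1, 0, 0)
```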
  • the transform module 1224 performs wavelet transform on the residual frame to produce transform coefficients.
  • a residual frame is decomposed into four portions, and a quarter-sized image (L image) that is similar to the entire image is placed in the upper left portion of the frame while information (H image) needed to reconstruct the entire image from the L image is placed in the other three portions.
  • The L image may in turn be decomposed into a quarter-sized LL image and the information needed to reconstruct the L image.
  • the quantization module 1226 applies quantization to the transform coefficients obtained by the wavelet transform.
  • Currently known embedded quantization algorithms include EZW, SPIHT, EZBC, EBCOT, and so on.
  • the bitstream generating module 1230 generates a bitstream containing base layer frames and enhancement layer frames encoded by the first and second scalable video encoders 1210 and 1220 and corresponding header information.
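A toy container format can illustrate how the bitstream generating module might interleave encoded frames with header information. The patent does not specify a syntax. The header layout here is purely hypothetical: a 1-byte layer id, a 2-byte frame index, and a 4-byte payload length, all big-endian.

```python
import struct

# Hypothetical unit header: layer id (1 byte), frame index (2 bytes),
# payload length (4 bytes), big-endian with no padding.
UNIT_HEADER = struct.Struct(">BHI")


def build_bitstream(units):
    """Concatenate (layer_id, frame_index, payload) units, each behind a header."""
    out = bytearray()
    for layer_id, frame_index, payload in units:
        out += UNIT_HEADER.pack(layer_id, frame_index, len(payload))
        out += payload
    return bytes(out)


def parse_bitstream(data):
    """Inverse of build_bitstream."""
    units, pos = [], 0
    while pos < len(data):
        layer_id, frame_index, length = UNIT_HEADER.unpack_from(data, pos)
        pos += UNIT_HEADER.size
        units.append((layer_id, frame_index, data[pos:pos + length]))
        pos += length
    return units


units = [(0, 1, b"base-1"), (1, 1, b"enh-1"), (0, 3, b"base-3")]
stream = build_bitstream(units)
print(len(stream))                       # 3 * 7 + 6 + 5 + 6 = 38
print(parse_bitstream(stream) == units)  # True
```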
  • the video encoder system includes a plurality of video encoders encoding videos with different resolutions. Some of the plurality of video encoders use non-scalable video coding schemes such as H.264 or MPEG-4.
  • the generated bitstream is predecoded by a predecoder 1240 and then sent to a decoder (not shown).
  • the predecoder 1240 may be located at different positions depending on the type of video streaming services.
  • When the predecoder 1240 is located within the video encoder system 1200, the video encoder system 1200 transmits only a predecoded bitstream to the decoder, instead of the entire bitstream generated by the bitstream generating module 1230.
  • When the predecoder 1240 is located separately from the video encoder system 1200 but within a streaming service provider, the streaming service provider predecodes a bitstream encoded by a content provider and sends the predecoded bitstream to the decoder.
  • When the predecoder 1240 is located within the decoder, the predecoder 1240 truncates unnecessary bits of the bitstream so as to reconstruct a video with the desired resolution and frame rate.
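A predecoder that extracts a target resolution and frame rate can be sketched as a filter over frame units. This simplification makes two assumptions. First, the bitstream is already split into per-frame units labeled by layer and frame index. Second, the caller knows which layers the target resolution depends on. The sketch drops whole frames only; it does not model truncating bits inside a frame for finer quality scaling. The function name is hypothetical.

```python
def predecode(units, order, required_layers, rate_divisor):
    """Keep only the frame units needed for a target resolution and frame rate.

    units: (layer, frame, payload) tuples for one GOP.
    order: the GOP's coding order, e.g. [1, 5, 3, 7, 2, 4, 6, 8].
    required_layers: layers the target resolution depends on.
    rate_divisor: 1 for full rate, 2 for half rate, 4 for quarter rate, ...
    """
    keep_frames = set(order[: len(order) // rate_divisor])
    return [u for u in units if u[0] in required_layers and u[1] in keep_frames]


order = [1, 5, 3, 7, 2, 4, 6, 8]
units = [(layer, f, b"") for f in order for layer in ("low", "high")]
# FIG. 15-style dependency: high resolution needs both layers, low needs only "low".
low_half_rate = predecode(units, order, {"low"}, 2)
print([f for _, f, _ in low_half_rate])  # [1, 5, 3, 7]
```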
  • modules are functional modules and perform the same functions as described above.
  • the term 'module' means, but is not limited to, a software or hardware component, such as a Field Programmable Gate Array (FPGA) or Application Specific Integrated Circuit (ASIC), which performs certain tasks.
  • a module may advantageously be configured to reside on the addressable storage medium and configured to execute on one or more processors.
  • a module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
  • the functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
  • the components and modules may be implemented such that they execute on one or more computers in a communication system.
  • FIG. 22 is a block diagram of a video decoder system 1300 according to an exemplary embodiment of the present invention. While the video decoder system 1300 decodes video data encoded into two layers with different resolutions, it may decode video data encoded into n layers with different resolutions.
  • the video decoder system 1300 includes a first scalable video decoder 1310 decoding a base layer video and a second scalable video decoder 1320 decoding an enhancement layer video.
  • the first and second scalable video decoders 1310 and 1320 receive coded video data from the bitstream interpreting module 1330 for decoding.
  • the first scalable video decoder 1310 receives the encoded base layer video and decodes the same using scalable video decoding. To accomplish this, the first scalable video decoder 1310 includes an inverse quantization module 1312, an inverse transform module 1314, and a motion compensation module 1316.
  • the inverse quantization module 1312 applies inverse quantization to the received encoded video data and outputs transform coefficients.
  • inverse quantization algorithms include EZW, SPIHT, EZBC, EBCOT, and so on.
  • In the case of an intracoded frame, the inverse transform module 1314 performs inverse transform on the transform coefficients to reconstruct the original frame.
  • In the case of an intercoded frame, the inverse transform module 1314 performs inverse transform to produce a residual frame.
  • the motion compensation module 1316 compensates for motion of the residual frame using the previously reconstructed frame as a reference in order to reconstruct the original frame. Algorithms such as UMCTF or STAR may be used for the motion compensation.
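At the block level, the motion compensation step adds the decoded residual back to the reference block displaced by the motion vector. The sketch below is plain translational block motion compensation rather than UMCTF or STAR. It is the inverse of the residual computation performed by an encoder-side block matcher. The function names are hypothetical.

```python
def motion_compensate(residual, ref, by, bx, dy, dx):
    """Reconstruct a block: reference block displaced by (dy, dx) plus residual."""
    return [
        [ref[by + dy + i][bx + dx + j] + residual[i][j] for j in range(len(residual[0]))]
        for i in range(len(residual))
    ]


def paste_block(frame, block, by, bx):
    """Write a reconstructed block into the output frame in place."""
    for i, row in enumerate(block):
        frame[by + i][bx:bx + len(row)] = row


ref = [[4 * r + c for c in range(4)] for r in range(4)]  # previously reconstructed frame
out = [[0] * 4 for _ in range(4)]
block = motion_compensate([[1, 1], [1, 1]], ref, 0, 0, 1, 1)
paste_block(out, block, 0, 0)
print(block)  # [[6, 7], [10, 11]]
```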
  • the second scalable video decoder 1320 receives the encoded enhancement layer video data and decodes the same using scalable video decoding. To accomplish this, the second scalable video decoder 1320 includes an inverse quantization module 1322, an inverse transform module 1324, and a motion compensation module 1326.
  • the inverse quantization module 1322 applies inverse quantization to the received encoded video data and produces transform coefficients.
  • inverse quantization algorithms include EZW, SPIHT, EZBC, EBCOT, and so on.
  • the inverse transform module 1324 performs inverse transform on the transform coefficients. In the case of an intracoded frame, the inverse transform module 1324 performs inverse transform on the transform coefficients to reconstruct the original frame. In the case of an intercoded frame, the inverse transform module 1324 performs inverse transform to produce a residual frame.
  • the motion compensation module 1326 receives a residual frame and compensates for motion of the residual frame using the previously reconstructed base layer frame and the previously reconstructed enhancement layer frame as a reference in order to reconstruct the original frame. Algorithms such as UMCTF or STAR may be used for the motion compensation.
  • FIG. 23 is a diagram for explaining a process of generating a smooth intraframe in a smooth enhancement layer in intraframe sharing and decoding a shared intraframe.
  • D and U respectively denote downsampling and upsampling
  • subscripts W and M respectively denote wavelet- and MPEG-based schemes.
  • $F$, $F_s$, and $F_L$ respectively represent a high-resolution (base layer) frame, a low-resolution (enhancement layer) frame, and a low-pass subband in the high-resolution frame.
  • a video sequence is first downsampled to a lower resolution and then the downsampled version is upsampled to a higher resolution using a wavelet-based method, followed by MPEG-based downsampling.
  • a low-resolution video sequence obtained by performing the MPEG-based downsampling is then encoded using scalable video coding.
  • The low-resolution frame $F_s$ 1420 is not contained in the bitstream but is obtained from the high-resolution intraframe $F$ 1410 that is contained in the bitstream. That is, to obtain the smooth low-resolution intraframe $F_s$ 1420, the high-resolution intraframe $F$ 1410 is downsampled and then upsampled using a wavelet-based scheme to obtain an approximation of the original high-resolution intraframe $F$ 1410, followed by MPEG-based downsampling, i.e., $F_s = D_M(U_W(D_W(F)))$. The high-resolution intraframe $F$ 1410 is subjected to wavelet transform and quantization and then combined into the bitstream. Some bits of the bitstream are truncated by a predecoder before being transmitted to a decoder.
  • A low-pass subband $F_L$ 1430 in the high-resolution intraframe $F$ 1410 is obtained.
  • The low-pass subband $F_L$ 1430 is a downsampled version $D_W(F)$ of the high-resolution intraframe $F$ 1410.
  • The decoder that receives a low-pass subband $F_L$ 1440 upsamples it using the wavelet-based scheme and downsamples the upsampled version using the MPEG-based scheme, producing a smooth intraframe $F_s$ 1450, i.e., $F_s = D_M(U_W(F_L))$.
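The smooth-intraframe path can be sketched with the patent's own operator names $D_W$, $U_W$, and $D_M$. The concrete filters are assumptions, since the text does not give the MPEG filter taps:

- $D_W$ is taken as a Haar low-pass (2x2 block mean).
- $U_W$ is taken as the inverse Haar step with all detail bands set to zero.
- $D_M$ is taken as a hypothetical separable [1, 2, 1]/4 filter with edge clamping, followed by 2:1 decimation.

Because the decoder applies the same deterministic $D_M(U_W(\cdot))$ to the received subband $F_L = D_W(F)$, it can rebuild $F_s$ (up to quantization of $F_L$) without receiving the low-resolution intraframe itself.

```python
def downsample_wavelet(frame):
    """D_W: Haar low-pass subband (mean of each 2x2 block)."""
    return [
        [(frame[2 * i][2 * j] + frame[2 * i][2 * j + 1]
          + frame[2 * i + 1][2 * j] + frame[2 * i + 1][2 * j + 1]) / 4
         for j in range(len(frame[0]) // 2)]
        for i in range(len(frame) // 2)
    ]


def upsample_wavelet(low):
    """U_W: inverse Haar step with all detail bands set to zero (2x2 replication)."""
    out = []
    for row in low:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def filter_decimate(x, taps=(1, 2, 1)):
    """Hypothetical MPEG-style 1-D filter with edge clamping, then keep even samples."""
    n, half, norm = len(x), len(taps) // 2, sum(taps)
    return [
        sum(t * x[min(max(k + m - half, 0), n - 1)] for m, t in enumerate(taps)) / norm
        for k in range(0, n, 2)
    ]


def downsample_mpeg(frame):
    """D_M: separable filter-and-decimate along rows, then along columns."""
    rows = [filter_decimate(r) for r in frame]
    cols = [filter_decimate(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def smooth_intraframe(low_pass):
    """Decoder side: F_s = D_M(U_W(F_L))."""
    return downsample_mpeg(upsample_wavelet(low_pass))


F = [[0, 0, 8, 8], [0, 0, 8, 8], [0, 0, 8, 8], [0, 0, 8, 8]]  # high-resolution intraframe
F_L = downsample_wavelet(F)    # low-pass subband D_W(F), ignoring quantization
print(smooth_intraframe(F_L))  # [[0.0, 6.0], [0.0, 6.0]]
```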

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/KR2005/000520 2004-03-04 2005-02-25 Video encoding and decoding methods and systems for video streaming service WO2005086487A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2007501706A JP2007525924A (ja) 2004-03-04 2005-02-25 Video coding method and video encoding system, and video decoding method and video decoding system for video streaming service
EP05726745A EP1721465A4 (en) 2004-03-04 2005-02-25 VIDEO ENCODING AND DECODING METHODS AND SYSTEMS FOR CONTINUOUS-FLOW VIDEO SERVICE
CA2557312A CA2557312C (en) 2004-03-04 2005-02-25 Video encoding and decoding methods and systems for video streaming service

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US54954404P 2004-03-04 2004-03-04
US60/549,544 2004-03-04
KR10-2004-0028487 2004-04-24
KR1020040028487A KR100596705B1 (ko) 2004-03-04 2004-04-24 Video coding method and video encoding system, and video decoding method and video decoding system for video streaming service

Publications (1)

Publication Number Publication Date
WO2005086487A1 true WO2005086487A1 (en) 2005-09-15

Family

ID=34921824

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2005/000520 WO2005086487A1 (en) 2004-03-04 2005-02-25 Video encoding and decoding methods and systems for video streaming service

Country Status (4)

Country Link
EP (1) EP1721465A4 (ja)
JP (1) JP2007525924A (ja)
CA (1) CA2557312C (ja)
WO (1) WO2005086487A1 (ja)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
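WO2014058110A1 (ko) 2012-10-09 2014-04-17 광운대학교 산학협력단 Inter-layer prediction method and apparatus for multi-layer video
CN110611814A (zh) * 2013-01-07 2019-12-24 Vid拓展公司 Motion information signaling for scalable video coding
JP6120667B2 (ja) * 2013-05-02 2017-04-26 キヤノン株式会社 Image processing apparatus, imaging apparatus, image processing method, program, and recording medium
CN111406405A (zh) * 2017-12-01 2020-07-10 索尼公司 Transmission device, transmission method, and reception device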

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000022913A (ko) * 1998-09-17 2000-04-25 윤종용 Scalable encoding/decoding method and apparatus for still images using wavelet transform

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU4338800A (en) * 1999-12-22 2001-07-03 General Instrument Corporation Video compression for multicast environments using spatial scalability and simulcast coding
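JP2001238210A (ja) * 2000-02-25 2001-08-31 Matsushita Electric Ind Co Ltd Hierarchical encoding device, hierarchical decoding device, video signal transmission system, medium, and information aggregate
JP2001309378A (ja) * 2000-04-10 2001-11-02 Samsung Electronics Co Ltd Moving picture encoding/decoding method and apparatus having both a spatial hierarchical structure and an image-quality hierarchical structure
JP2002142227A (ja) * 2000-11-02 2002-05-17 Matsushita Electric Ind Co Ltd Hierarchical encoding device and hierarchical decoding device for video signals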
US6920175B2 (en) * 2001-01-03 2005-07-19 Nokia Corporation Video coding architecture and methods for using same

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FENG WU ET AL: "Efficient and universal scalable video coding.", PROCEEDINGS. 2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, vol. 2, September 2002 (2002-09-01), pages II-37 - II-40, XP010607902 *
See also references of EP1721465A4 *
SEUNG HWAN KIM ET AL: "Adaptive multiple reference frame based scalable video coding algorithm.", PROCEEDINGS OF 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING., vol. 2, 22 September 2002 (2002-09-22) - 25 September 2002 (2002-09-25), pages II-33 - II-36, XP008111349 *
VAN DER SCHAAR M. ET AL: "A novel MPEG-4 based hybrid temporal-SNR scalability for internet video.", INTERNATIONAL CONFERENCE ON IMAGE PROCESSING., vol. 3, 10 September 2000 (2000-09-10) - 13 September 2000 (2000-09-13), pages 548 - 551, XP010529525 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018050226A1 (en) * 2016-09-15 2018-03-22 Telefonaktiebolaget Lm Ericsson (Publ) Guided transcoding
US10735735B2 (en) 2016-09-15 2020-08-04 Telefonaktiebolaget Lm Ericsson (Publ) Guided transcoding

Also Published As

Publication number Publication date
EP1721465A4 (en) 2009-01-21
EP1721465A1 (en) 2006-11-15
CA2557312C (en) 2013-04-23
JP2007525924A (ja) 2007-09-06
CA2557312A1 (en) 2005-09-15

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2557312

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2005726745

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 200580006644.0

Country of ref document: CN

Ref document number: 2007501706

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 1069/MUMNP/2006

Country of ref document: IN

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Ref document number: DE

WWP Wipo information: published in national office

Ref document number: 2005726745

Country of ref document: EP