WO2010069427A1 - Method and encoder for providing a tune- in stream for an encoded video stream and method and decoder for tuning into an encoded video stream - Google Patents

Info

Publication number
WO2010069427A1
WO2010069427A1 PCT/EP2009/007649 EP2009007649W WO2010069427A1 WO 2010069427 A1 WO2010069427 A1 WO 2010069427A1 EP 2009007649 W EP2009007649 W EP 2009007649W WO 2010069427 A1 WO2010069427 A1 WO 2010069427A1
Authority
WO
WIPO (PCT)
Prior art keywords
tune
stream
coded
intra
picture
Prior art date
Application number
PCT/EP2009/007649
Other languages
French (fr)
Inventor
Harald Fuchs
Stefan DÖHLA
Ulf Jennehag
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of WO2010069427A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2383Channel coding or modulation of digital bit-stream, e.g. QPSK modulation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/107Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/164Feedback from the receiver or from the transmission channel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/438Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
    • H04N21/4383Accessing a communication channel
    • H04N21/4384Accessing a communication channel involving operations to reduce the access time, e.g. fast-tuning for reducing channel switching latency
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Definitions

  • Embodiments of the invention relate to the field of providing tune-in streams for allowing a fast tune-in into an encoded video stream, for example for allowing a fast channel change based on a tune-in stream. More specifically, embodiments of the invention relate to a method for providing a tune-in stream for an encoded video stream and an associated encoder as well as to a method for tuning into an encoded video stream and an associated decoder. More specifically, embodiments of the invention provide gradual tune-in pictures for a fast tune-in or channel change.
  • P-frames and B-frames are based on differential video coding.
  • coding efficiency is substantially increased through predictive coding (in so-called P-frames and B-frames) exploiting the temporal correlation between consecutive frames in a video sequence.
  • P-frames use only backward prediction from previous frames, whereas B-frames may additionally use forward prediction from subsequent frames.
  • I-frames only depend on data inside the frame and enable decoding at the start of a sequence.
  • Each frame consists of macroblocks; an I-frame contains intra-coded macroblocks only.
  • P- and B-frames consist mostly of inter-coded macroblocks, i.e. macroblocks that depend on data of other frames, and may also contain intra-coded macroblocks.
  • the frame types of differential video coding typically follow a fixed sequence, which is an I-frame followed by several P- and B-frames until the next I-frame.
  • the range from an I-frame up to the next I-frame is called a Group of Pictures (GOP), as illustrated in Fig. 1.
  • GOP: Group of Pictures
  • Fig. 1 illustrates a conventional GOP structure of an encoded video stream.
  • the encoded video stream comprises a plurality of I-pictures (intra-coded pictures) and a plurality of P-pictures (inter-coded pictures).
  • the encoded video stream comprises a plurality of consecutive frames which are labeled in Fig. 1 as frames 1 to 7 and the pictures are included within the frames.
  • the encoded video stream comprises a plurality of groups of pictures (GOPs) each of which includes at least one I-picture and one or more P- or B-pictures.
  • the GOP of the encoded stream comprises as a first frame in the GOP the I-frame, i.e. the frame including the intra-coded picture whereas the remaining frames 2 to 6 comprise the inter-coded pictures.
  • a new GOP starts.
  • P- and B-frames also allow intra-coded macroblocks, which may be used to improve the coding gain, i.e. not all macroblocks benefit from inter-frame prediction.
  • Another benefit of intra-coded macroblocks is their positive effect on error resilience, since errors due to missing or corrupt frames propagate only over inter-coded macroblocks.
  • Random access into a differentially coded video sequence may only be done at an I-frame, since this frame is guaranteed to contain intra-coded macroblocks only. All following frames potentially depend on the I-frame or other previous frames, and a best-effort decoding of these frames would result in visible image distortion.
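The random-access constraint described above can be illustrated with a small sketch. It assumes a fixed GOP length with the I-frame as the first frame of every GOP and frames numbered from 1, as in Fig. 1; the function name is hypothetical and not part of the source.

```python
def next_random_access_frame(tune_in_frame: int, gop_length: int) -> int:
    """Return the first frame >= tune_in_frame that is an I-frame.

    Assumption (not from the source): I-frames sit at frames
    1, 1 + gop_length, 1 + 2 * gop_length, ...
    """
    if tune_in_frame < 1 or gop_length < 1:
        raise ValueError("frame numbers and GOP length must be >= 1")
    # Position of the tune-in frame inside its GOP (0 means it is the I-frame).
    offset = (tune_in_frame - 1) % gop_length
    if offset == 0:
        return tune_in_frame
    return tune_in_frame + (gop_length - offset)


# Fig. 1 layout: frames 1 to 6 form one GOP, frame 7 starts the next one.
wait = next_random_access_frame(2, 6) - 2  # frames to wait when tuning in at frame 2
```

Under these assumptions a client tuning in just after an I-frame waits almost a full GOP, which is the delay the tune-in approaches discussed in this document aim to reduce.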
  • IPTV suffers from slow channel change behavior, much like digital TV broadcast systems.
  • a disadvantage of IPTV is the high number of additional delay factors, which are, however, of lesser importance than the influence of the video coding.
  • the advantage of IPTV is its flexibility through the use of multicast instead of broadcast, higher available bitrates and the possibility of easily reconfigurable systems in the distribution chain.
  • fast channel change in IPTV may be accomplished not only by reducing the size of a GOP but also by other methods that provide random-access points and filled decoder buffers in a short time.
  • RAPs: Random Access Points
  • This approach is a waste of network resources, especially when clients are in a steady state.
  • One solution to this is to provide a side stream that is a different video encoding of the channel containing more frequent random access points than the normal encoding of the video, the main stream.
  • the side stream is only forwarded to the client upon request, i.e. when the client tunes into a new channel. It is used until an RAP in the main stream is received. The client then switches to the main stream and drops the side stream.
  • its quality is reduced, e.g.
  • the main stream may utilize longer I-frame distances and thus have a higher compression gain.
  • Another variation of side streams contains I-frames only as RAPs to the main stream and splices one of these with the main stream. This is described in more detail below (see also U. Jennehag, T. Zhang, and S. Pettersson, "Improving Transmission Efficiency in H.264 based IPTV Systems", IEEE Transactions on Broadcasting, vol. 53, no. 1, pp. 69-78, March 2007, and U. Jennehag and S. Pettersson, "On Synchronization Frames for Channel Switching in a GOP-Based IPTV Environment", 5th IEEE Consumer Communications and Networking Conference 2008, pp. 638-642, Jan 2008).
  • TIP: Tune-In Pictures
  • a stream consisting of stream RAPs e.g. I-frames
  • the main stream consists of a normal GOP structure with a large I-frame distance.
  • Clients who want to decode a video stream, e.g. a TV channel, must receive both the main video stream and one frame from the TIP stream for an instant RAP.
  • the tune-in picture and the main stream are then spliced, i.e. a P-frame from the main stream is replaced by the corresponding TIP, and decoded.
  • the following example, illustrated in Fig. 2, shows a typical TIP channel switch situation.
  • Fig. 2 illustrates an example for a channel change using tune-in pictures.
  • Fig. 2 illustrates a first channel A comprising a main stream 100 comprising a plurality of frames including respective inter-coded pictures P_A and intra-coded pictures I_A.
  • a time axis indicating times t1 to t13 is shown, and during the time period t1 to t4 the main stream 100 of channel A provides the frames for a first group of pictures GOP_A1. A second group of pictures GOP_A2 is provided starting from time instance t5.
  • channel A comprises a tune-in stream 102 which comprises a plurality of intra-coded pictures for the main stream 100 which are spaced evenly with respect to each other, i.e. every three time instances an intra-coded picture is sent by the tune-in stream 102 of channel A.
  • Fig. 2 shows the main stream 104 for channel B and the channel B tune-in stream 106.
  • the main stream 104 of channel B comprises a plurality of intra-coded pictures I_B and a plurality of inter-coded pictures P_B.
  • the main stream 104 of channel B, in the situation shown in Fig. 2, comprises a first group of pictures GOP_B1 that starts with an I-frame at time instance t1 and that ends with an inter-coded frame at time instance t12. At time instance t13 a new group of pictures GOP_B2 starts with a new I-frame I_B.
  • the tune-in stream 106 comprises a plurality of intra-coded frames which are provided at time instances t4, t7, t10 and t13.
  • a spliced stream 108 which is the stream that is presented to a decoder for decoding the encoded main stream.
  • during time instances t1 to t4, frames from the main stream 100 of channel A are within the spliced stream and supplied to the decoder for decoding.
  • the client requests the channel switch at time instance t5 and the main multicast stream of channel A is immediately left at time instance t5. Then, first the TIP stream for channel B is requested at time instance t5. The client waits until traffic arrives and receives the RAP at time instance t7. The TIP stream is left when the complete RAP is received and the main stream of channel B is now requested. The RAP of the TIP stream and the main stream are spliced and decoded at time instance t7 to generate a valid coded video sequence. Residual artifacts from decoder mismatch, i.e. the mismatch of reference pictures and the resulting drift, are removed when the next I-frame is encountered in the main stream.
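The timing of the channel switch described above can be traced with a short sketch. It assumes the channel B tune-in pictures arrive at time instances 4, 7, 10 and 13 (evenly spaced, consistent with the RAP received at t7) and that the channel B main stream carries I-frames at time instances 1 and 13; the function name and the list representation are illustrative only.

```python
from bisect import bisect_left


def first_rap_at_or_after(request_time, rap_times):
    """Return the earliest random-access point not before request_time.

    rap_times must be sorted in ascending order; None is returned when no
    random-access point is left in the list.
    """
    index = bisect_left(rap_times, request_time)
    return rap_times[index] if index < len(rap_times) else None


# Assumed time instances for channel B in the Fig. 2 scenario.
tip_times_b = [4, 7, 10, 13]   # tune-in stream 106
main_iframes_b = [1, 13]       # I-frames of main stream 104

request = 5  # the client requests the switch at t5
with_tip = first_rap_at_or_after(request, tip_times_b)
without_tip = first_rap_at_or_after(request, main_iframes_b)
```

With these assumed values the client can start decoding at t7 using the tune-in stream, whereas waiting for the next I-frame of the main stream alone would delay decoding until t13.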
  • Fig. 3 illustrates, on the basis of the conventional GOP structure shown in Fig. 1, a gradually refreshing structure.
  • the frames 1 to 7 are shown.
  • each frame 1 to 7 comprises a plurality of partitions or slices a, b and c.
  • frames 1 to 6 form one group of pictures and frame 7 indicates the start of a subsequent group of pictures.
  • the encoded video stream comprises for each I-picture a plurality of intra-coded macroblocks and for each P-frame or B-frame a plurality of inter-coded macroblocks.
  • each frame comprises three macroblocks.
  • the three macroblocks of the I-picture, unlike in Fig. 1, are not provided in a single frame such as frame 1 of Fig. 1; rather, the intra-coded macroblocks (I-macroblocks) of the I-picture are distributed or spread among a plurality of frames.
  • a first I-macroblock or I-slice of the I-picture for the group of pictures comprising frames 1 to 6 is provided in slice a of frame 1.
  • the second I-macroblock is provided in slice b of frame 2
  • the third I-macroblock is provided in slice c of frame 3.
  • the bitrate smoothing is accomplished by using a method which is known as Independent Segment Decoding (ISD) with shifted intra-coded slices in H.263 (see ITU-T Rec. H.263, "Infrastructure of audiovisual services - Coding of moving video: Video coding for low bit rate communication", International Telecommunication Union, Jan 2005) or gradual decoder refresh (GDR) in H.264.
  • ISD: Independent Segment Decoding
  • a GDR sequence starts typically with one I-slice (intra-coded slice) and inter-coded slices which are only using forward prediction up to the next I-slice for the same image partition in the following two frames.
  • the bits which would be necessary for the first frame if coded as an I-frame are distributed over frames one to three.
  • GDR reduces coding efficiency for smaller frame sizes resulting in small slices, because the prediction choices are reduced significantly at slice boundaries.
  • GDR is especially advantageous for low-delay video coding, where it is important that even over a bitrate limited channel the arrival time is predictable and also the required buffer sizes are to be small.
  • a first method for tune-in into a GDR stream requires starting decoding at the 1st or 7th frame in the example of Fig. 3 for an initially correct picture.
  • An additional parameter has to be signaled, the pre-roll period, that describes how many frames after a random-access point have to be decoded for assured decoding, i.e. until perfect reconstruction is achieved (see G. Sullivan, "On Random Access and Bitstream Format for JVT Video", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-B063, Jun 2002).
  • a pre-roll period of three frames has to be used.
  • the second method for tune-in into a GDR stream is known as best-effort decoding (see G. Sullivan, "On Random Access and Bitstream Format for JVT Video", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-B063, Jun 2002) and is illustrated in Fig. 4.
  • Fig. 4 describes a GDR structure for a best effort decoding.
  • each frame comprises a plurality of slices a, b, and c.
  • the I-macroblocks for an I-picture of a GOP are distributed in frames which are separated from each other by at least one frame. More specifically, the first I-macroblock is provided in slice a of frame 1, the second I-macroblock is provided in slice b of frame 3 and the third I-macroblock is provided in slice c of frame 5.
  • the I-slices or I-macroblocks are evenly distributed among the frames of a GOP.
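The two GDR layouts of Fig. 3 and Fig. 4 can be compared with a small sketch that reports the first frame at which every slice has received an intra-coded refresh since decoding started. It is an idealization: it assumes each inter-coded slice depends only on the same partition of earlier frames, and the schedule dictionaries, constant names and function name are illustrative assumptions rather than anything defined by the source.

```python
def first_fully_refreshed_frame(start_frame, intra_slice_at_offset, gop_length, slices):
    """Return the first frame at which all slices have been intra-refreshed.

    intra_slice_at_offset maps a 0-based frame offset inside the GOP to the
    slice that is intra-coded in that frame (hypothetical representation).
    """
    pending = set(slices)
    # Two GOP lengths are enough to see every scheduled intra slice once.
    for frame in range(start_frame, start_frame + 2 * gop_length):
        refreshed = intra_slice_at_offset.get((frame - 1) % gop_length)
        pending.discard(refreshed)  # discarding None is harmless
        if not pending:
            return frame
    raise ValueError("schedule never refreshes every slice")


# Fig. 3: I-slices in frames 1, 2, 3; Fig. 4: I-slices in frames 1, 3, 5 (GOP of 6 frames).
FIG3_SCHEDULE = {0: "a", 1: "b", 2: "c"}
FIG4_SCHEDULE = {0: "a", 2: "b", 4: "c"}

fig3_from_gop_start = first_fully_refreshed_frame(1, FIG3_SCHEDULE, 6, "abc")
fig4_from_gop_start = first_fully_refreshed_frame(1, FIG4_SCHEDULE, 6, "abc")
fig4_from_frame_2 = first_fully_refreshed_frame(2, FIG4_SCHEDULE, 6, "abc")
```

Starting at frame 1 of the Fig. 3 layout, the picture is fully refreshed at frame 3, which matches the three-frame pre-roll period mentioned above; the Fig. 4 layout spreads the refresh further, so a best-effort decoder shows a partially incorrect picture for longer.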
  • Embodiments of the invention provide a method for providing a tune-in stream for an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, the method comprising:
  • a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture
  • tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture.
  • embodiments of the invention provide an encoder for providing a tune-in stream for an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, wherein the encoder is configured to provide a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture,
  • tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture.
  • embodiments of the invention provide a method for tuning into an encoded video stream, the method comprising:
  • a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture;
  • embodiments of the invention provide a decoder for receiving encoded data and providing decoded output data, the decoder comprising:
  • a decoding portion coupled to the input and configured to tune into the encoded video stream and the tune-in stream upon receiving a tune-in request, upon receiving a tune-in picture in the tune-in stream, to splice the tune-in picture and the encoded video stream, and to decode the spliced stream.
  • Embodiments of the invention provide a computer readable medium for storing instructions for executing the methods in accordance with embodiments of the invention.
  • the invention provides a novel approach for tuning into a main stream, e.g. for a fast channel change, based on tune-in streams.
  • Embodiments of the invention are based on a gradual decoder refresh (GDR) in accordance with H.264 for the main stream (the differentially encoded video stream) and for the tune-in stream.
  • GDR: gradual decoder refresh
  • a receiver would have to wait for several received NAL units of multiple frames before it may render a complete frame without visible GDR artifacts.
  • An additional tune-in stream is provided that fills the missing regions of a partially complete frame.
  • the simulation results discussed later show that the solution is at least as bitrate-efficient as previous tune-in solutions but offers the advantages of a less variably encoded bitrate and a consistent gradual improvement of picture quality.
  • Embodiments of the invention are advantageous as the random-access point (RAP) acquisition time is greatly improved using a side-stream. In addition a reduction in decoder pre-buffer size requirements is obtained.
  • Embodiments of the invention concern the combination of the tune-in picture technology and best-effort gradual decoder refresh.
  • a GDR stream as described above with regard to Fig. 4 is used for the main stream and, in addition, a tune-in stream is provided that comprises the missing slices needed for tune-in with a complete I-picture.
  • This overcomes the "dirty random access" drawback of best-effort GDR streams and conserves the advantageous properties of GDR streams.
  • This approach is hereafter referred to also as Gradual Tune-In Pictures (G-TIP).
  • G-TIP: Gradual Tune-In Pictures
  • the best-effort GDR is considered as a viable alternative encoding variant for fast tune-in due to its property of equally distributed RAPs though the decoder output is a partially incorrect image initially.
  • this structure brings the advantage of a relatively equally distributed bitrate which facilitates the signaling of a level with a smaller buffer requirement and hence a lower initial pre-buffering delay.
  • This aspect is also important as the buffer requirements are also a key factor of tune-in delay.
  • the encoded video stream and the tune-in stream are provided to a receiver directly, for example from a service provider, or via a server from which the respective streams can be obtained upon user request.
  • Embodiments of the invention teach a method in accordance with which each frame of the encoded video stream comprises a plurality of partitions, wherein frames comprising an intra-coded macroblock comprise only a single intra-coded macroblock in one of their partitions, and the remaining intra-coded macroblocks for the intra-coded picture are comprised within respective following frames.
  • the intra-coded macroblocks are provided in consecutive frames or in frames having therebetween one or more frames without intra-coded macroblocks.
  • the encoded video stream comprises a plurality of groups of pictures (GOPs), each of which comprises at least one intra-coded picture and one or more inter-coded pictures, wherein each GOP comprises a plurality of frames, and wherein the intra-coded macroblocks are spread among the plurality of frames of the GOP.
  • the intra-coded macroblocks are spread evenly among the plurality of frames of the GOP.
  • Fig. 1 shows a conventional GOP structure of a differentially encoded video stream
  • Fig. 2 shows an example of a channel change on the basis of tune-in pictures
  • Fig. 3 shows a GDR-structure of a differentially encoded video stream
  • Fig. 4 shows a GDR-structure for best effort decoding
  • Fig. 5 illustrates an approach for tuning into a main stream using gradual tune-in pictures in accordance with an embodiment of the invention
  • Fig. 6 shows an encoder/decoder setup for illustrating a system using the inventive approach for fast tune-in
  • Figs. 7-10 show the tune-in PSNR traces for different video sequences.
  • a novel approach for fast tuning into a main stream for example, for a fast channel change based on a tune-in stream.
  • the novel approach is based on the combination of the above described tune-in picture technology and the gradual decoder refresh.
  • a GDR stream is used as the main stream and in addition a tune-in stream is provided comprising a plurality of tune-in pictures, wherein the tune-in pictures comprise the missing slices needed for tune-in with a complete picture.
  • Fig. 5 shows an embodiment of the invention using a combination of a tune-in picture technology and the best effort decoding GDR stream.
  • Fig. 5 shows at 200 the main stream or video encoded stream.
  • This video encoded stream comprises a plurality of frames 1 to 7, of which frames 1 to 6 form a first group of pictures GOP_1, and with frame 7 a new group of pictures GOP_2 starts.
  • Each of the frames comprises three slices a, b and c, and the intra-coded macroblocks for an I-picture of the group of pictures GOP_1 are evenly distributed across the frames of group GOP_1, i.e. are evenly distributed among the plurality of frames 1 to 6.
  • the intra-coded macroblocks for the I-picture of the group GOP_1 are provided in slice a of frame 1, in slice b of frame 3 and in slice c of frame 5.
  • Fig. 5 also shows the tune-in stream 202 in accordance with the invention.
  • the tune-in stream comprises a plurality of tune-in pictures 202_1 to 202_4 which are provided at a predefined interval.
  • the tune-in pictures 202_1 to 202_4 are provided for every second frame of the main stream 200, i.e. at frame positions 1, 3, 5, 7.
  • the tune-in pictures 202_1 to 202_4 have a similar structure to the frames 1 to 7 of the main stream 200, i.e. each tune-in picture comprises three slices a, b, and c.
  • the tune-in pictures 202_1 to 202_4 are provided for those frames in the main stream which include an I-macroblock, i.e. in the embodiment of Fig. 5 for frames 1, 3, 5 and 7.
  • the tune-in pictures further comprise those I-macroblocks which are missing from the associated frame in the main stream, i.e. these tune-in pictures comprise the "remaining" I-macroblocks.
  • the tune-in picture 202_1 associated with frame 1 of the main stream 200 comprises the I-macroblocks that are present in the main stream in frame 3 at slice b and in frame 5 at slice c.
  • tune-in picture 202_2 comprises those I-macroblocks missing from main stream frame 3, namely the I-macroblock from slice a of frame 1 of the main stream and the I-macroblock of slice c of frame 5 of the main stream 200. The same is true for tune-in pictures 202_3 and 202_4.
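The rule for which slices each tune-in picture carries can be written down compactly. The sketch below uses the Fig. 5 layout (I-macroblocks in slice a of frame 1, slice b of frame 3 and slice c of frame 5); the dictionary representation and the function name are assumptions made for illustration, not part of the described encoder.

```python
def build_tune_in_layout(intra_slice_by_frame, slices):
    """Map each main-stream frame carrying an I-slice to the slices its
    tune-in picture must supply, i.e. all slices except the one that is
    already intra-coded in the main-stream frame."""
    return {
        frame: [s for s in slices if s != intra_slice]
        for frame, intra_slice in intra_slice_by_frame.items()
    }


# Fig. 5 main stream, first GOP: frame number -> intra-coded slice.
main_stream_intra = {1: "a", 3: "b", 5: "c"}
layout = build_tune_in_layout(main_stream_intra, ["a", "b", "c"])
```

For the Fig. 5 layout this yields slices b and c for the tune-in picture of frame 1, a and c for frame 3, and a and b for frame 5, in line with the description of the tune-in pictures above.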
  • Fig. 5 shows the spliced stream 204.
  • the tune-in stream 202 is obtained and the main stream 200 is obtained; however, at frame position 2 no decoding or splicing can occur, as no intra-coded information is available here.
  • the spliced stream comprises a complete I-frame that allows starting decoding the main stream.
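The splicing step can be sketched as follows. Each frame is modelled as a dictionary from slice name to coding type ("I" for intra-coded, "P" for inter-coded); this representation, like the function names, is a simplifying assumption and not the bitstream-level operation a real decoder would perform.

```python
def splice_frame(main_frame, tune_in_picture):
    """Replace the inter-coded slices of a main-stream frame with the
    intra-coded slices supplied by the associated tune-in picture."""
    spliced = dict(main_frame)       # keep the main-stream frame untouched
    spliced.update(tune_in_picture)  # tune-in slices take precedence
    return spliced


def is_complete_intra_frame(frame):
    """True if every slice of the frame is intra-coded."""
    return all(kind == "I" for kind in frame.values())


# Fig. 5, frame position 3: slice b is intra-coded in the main stream,
# slices a and c come from the tune-in picture.
main_frame_3 = {"a": "P", "b": "I", "c": "P"}
tune_in_3 = {"a": "I", "c": "I"}
spliced_3 = splice_frame(main_frame_3, tune_in_3)

# Frame position 2 has no tune-in picture, so nothing can be spliced in.
main_frame_2 = {"a": "P", "b": "P", "c": "P"}
spliced_2 = splice_frame(main_frame_2, {})
```

In this model the spliced frame at position 3 consists of intra-coded slices only and can therefore serve as a decoding start point, while position 2 cannot.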
  • Fig. 5 shows an embodiment in accordance with which the I-macroblocks are evenly distributed among the plurality of frames of the GOP_1.
  • the inventive approach is also applicable to main streams having a different distribution of the I-macroblocks among the frames within a GOP.
  • the inventive approach may also be applied to a GDR structure as shown in Fig. 3.
  • the tune-in stream would comprise three consecutive tune-in pictures having a structure as pictures 202₁ to 202₃.
  • Fig. 5 shows an embodiment in accordance with which the I-macroblocks are provided such that in a first frame of a GOP the macroblock for a first slice, then a macroblock for second slice and then an I-macroblock for a third slice is provided.
  • the invention is not limited to such an approach, rather, the order in which the I-macroblocks are provided with regard to the slice position is arbitrary as long as the associated tune-in pictures provide for the remaining (missing) I-macroblocks of the associated main stream frame.
  • frame 1 may comprise the I-macroblock also in slice b or in slice c and in the remaining frames 3 and 5 the macroblocks of the other slices would be provided.
  • While Fig. 5 shows an embodiment of frames having three slices, the invention is not limited to such an approach; rather, a plurality of slices should be provided, i.e. two or more slices, for example five slices, as discussed below with regard to experimental results obtained by applying the inventive tune-in approach to test scenes.
  • Fig. 6 is a schematical representation of a system comprising an encoder 300 and a decoder 400.
  • the encoder 300 comprises an input 302 receiving video information to be encoded, for example video information in the form of YUV-data. This data is provided to a main encoder 304 and, in parallel, to a tune-in encoder 306.
  • the main encoder 304 provides the encoded video stream or main stream in a manner as described above with regard to Figs. 3, 4 and 5, and the tune-in encoder provides the tune-in stream in accordance with embodiments of the invention, for example in a manner as described above with regard to Fig. 5 by providing tune-in pictures having those I-macroblocks missing from an associated main frame block.
  • the encoder 300 comprises an output for providing both main stream and the tune-in stream together, for example to a communication network or the like, as it is schematically illustrated by the arrows leaving the blocks labeled main encoder and tune-in encoder.
  • the decoder 400 comprises a splice portion 402 and a decoder portion 404.
  • An input of the decoder 400 receives the combined main stream and tune-in stream and inputs same to the splice portion 402, which operates upon receiving a tune-in request in a manner as described above with regard to Fig. 5.
  • the spliced stream 204 shown in Fig. 5 is applied from the splice portion 402 to the decoder portion 404 so that the decoder 400 provides at its output 406 the decoded video stream.
  • a set of sequences are encoded into a main stream and tune-in stream.
  • the resulting frames are spliced and decoded and the objective quality for the tune-in period is measured with luminance PSNR.
  • the tune-in period of the spliced stream is defined as the period starting at the first decoded frame until the frame where the quality has stabilized, i.e. a complete refresh from the main stream has occurred for the G-TIP and the next complete I-frame for the TIP.
  • the above described approach is used for both the TIP and G-TIP scenario.
  • the first sequence is a slow moving scene of leaves on a tree blowing in the wind (Aspen).
  • Sequence two (Red Kayak) is a part of a kayak in whitewater.
  • the third sequence is a static clip of a snow covered mountain side surrounded by slow moving clouds (SnowMnt).
  • the final clip is an American football kickoff (TouchdownPass) which includes moving players and some accelerating panning. All sequences are 100 frames long and have a resolution of 1280 × 720 pixels at 25 frames per second.
  • the main stream for TIP and G-TIP both use a fixed quantizer parameter of 24 and the number of slices per frame is fixed to five.
  • the I-frame distance in the TIP encodings is set to 25 frames equaling one complete IDR frame per second.
  • the encoding of the G-TIP streams is chosen such that the number of I-slices per second is identical to the TIP streams.
  • Five intra-coded slices are distributed in a way similar to Fig. 4, i.e. each 5th frame contains one I-slice and four P-slices.
  • the tune-in pictures used for G-TIP and TIP are encoded with a low quality, using the fixed quantizer parameter 45.
  • the encodings of all G-TIP and TIP sequences are done with the JM 14.2 encoder, which was slightly modified for G-TIP encoding where applicable.
  • the JM H.264 reference software (see Karsten Sühring, "IP Homepage - H.264/AVC JM Reference Software", Dec 2008, http://iphome.hhi.de/suehring/tml/) was modified for GDR and several encodings of publicly available test sequences in HD resolution were made as discussed below. As will be seen in the following discussion, the bitrate overhead for similar quality was below 0.5% in the investigated sequences.
  • the scenarios consist of the four sequences with both G-TIP and TIP tune-in.
  • the tune-in position is set to frame 30 with respect to the main stream. This tune-in frame number is used for all the investigated scenarios.
  • frame 30 to 49 represent the transition period. From frame 50 onwards only the main stream is decoded.
  • the tune-in picture and the main stream were spliced offline and fed to several decoders that all were able to decode the spliced stream.
  • the longest possible transition period was chosen, as this is the worst-case scenario for TIP, which is for the investigated scenario a tune-in 20 frames before the next I-frame of the main stream.
  • Figs. 7-10 show the luminance PSNR plots for G-TIP and TIP approaches for the four investigated sequences.
  • the main difference between the results for the different sequences is caused by the selection of the actual scenes. This is most noticeable when comparing the results of the SnowMnt (Fig. 9) and Red Kayak (Fig. 8) sequences.
  • SnowMnt is a very static scene with almost no movement with a high percentage of predicted macroblocks.
  • the Red Kayak sequence on the other hand, includes a scene with a lot of movement and fast moving details which requires a high percentage of the macroblocks to be intra-coded which means that the initial prediction error caused by the tune-in stream is quickly corrected.
  • Note the "staircase" effect in the SnowMnt and TouchdownPass sequences. This is the result of the gradual introduction of the intra-coded slices in the main stream, which clearly shows that the next level in quality is reached with the next GDR I-slice.
  • G-TIP provides a gain compared to TIP ranging from 0.47 to 4.1 dB average frame PSNR difference for the investigated scenarios. Table I shows the mean frame PSNR difference values for the tune-in period.
  • TIP performance is coupled to the transition period length, i.e. the number of frames to the next I-frame in the main stream. Further, an informal subjective test also indicated that G-TIP is superior to TIP.
  • Embodiments of the invention provide tune-in streams and gradual decoder refresh to enable fast tune-in for IPTV.
  • Tune-in pictures are an easily applicable technique for a fast tune-in solution.
  • Even for TIPs received close to an I-frame there will be an easily visible steep quality jump with the single exception of an I-frame of the main stream, that is received right from the beginning.
  • Gradual decoder refresh with best-effort decoding on the other hand is advantageous due to its less variably encoded bitrate property in combination with a high number of possible random-access points.
  • it suffers from visible artifacts due to missing reference pictures.
  • embodiments of the invention provide the combination of tune-in streams and gradual decoder refresh.
  • the quality jump is reduced by a gradually refreshing main stream and a corresponding tune-in stream that provides intra-coded slices where normally mid-level gray would be assumed.
  • the above discussed results show that the bitrate overhead of gradual decoder refresh in general is negligible for high resolutions.
  • the additional bitrate needed for the tune-in stream is comparable to the bitrate of a tune-in picture based stream and offers higher quality during the tune-in period. It also has the advantage that the transition from the tune-in stream to the main stream is predictable in terms of quality improvement for any channel change event.
  • Embodiments of the invention concern approaches for tuning into a stream which comprises as a main stream an encoded video stream and a tune-in stream, wherein the stream may be a single stream which is provided to a user, e.g. over a network, like the Internet.
  • the stream containing e.g. a video content may be provided by a service provider such that a user may tune into the stream at any time.
  • the stream is obtained by a user on the user's demand, e.g. from a service provider.
  • the stream (e.g. video on demand) is received by the user and when tuning into the stream decoding of the stream starts after obtaining the tune-in picture from the tune-in stream and splicing the main stream and the tune-in picture.
  • the encoded video stream and the tune-in stream are associated with a channel of a multi-channel transmission system, and a tune-in request indicates a change from a current channel of the multi-channel transmission system to a new channel of the multi-channel transmission system.
  • the self-contained blocks and the non-self-contained blocks of the streams were named as I-pictures and P- or B-pictures, respectively.
  • aspects of the invention were described in the context of an apparatus, it is noted that these aspects also represent a description of the corresponding method, i.e., a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • the combined signal comprising the encoded video stream and the tune-in stream may be stored on a digital storage medium or may be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
  • embodiments of the invention may be implemented in hardware or in software.
  • the implementation may be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
  • Other embodiments of the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
  • Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • a further embodiment of the invention is a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
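The luminance PSNR used above to rate the tune-in period can be sketched as follows. This is a minimal sketch, assuming 8-bit luma samples held in flat Python lists; the function names `luma_psnr` and `mean_psnr` are illustrative and are not part of the JM software.

```python
import math


def luma_psnr(ref, dec, peak=255):
    """PSNR in dB between two equally sized lists of 8-bit luma samples."""
    if len(ref) != len(dec) or not ref:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(ref, dec)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def mean_psnr(ref_frames, dec_frames):
    """Mean frame PSNR over a period, e.g. the tune-in period."""
    values = [luma_psnr(r, d) for r, d in zip(ref_frames, dec_frames)]
    return sum(values) / len(values)
```

A per-scenario gain would then be the difference between the `mean_psnr` values of the two tune-in approaches, computed over the same frames.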

Abstract

For an encoded video stream (200) a tune-in stream (202) is provided. The encoded video stream comprises a plurality of intra-coded pictures and a plurality of inter-coded pictures, wherein each picture comprises a plurality of macroblocks. The encoded video stream (200) comprises a plurality of frames, and the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames. The tune-in stream is provided and comprises a plurality of tune-in pictures (202₁-202₄), wherein a tune-in picture (202₁-202₄) is provided for a frame of the encoded video stream (200) that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture (202₁-202₄) comprises the remaining intra-coded macroblocks of the intra-coded picture. Also, an encoder for providing such a tune-in stream as well as a method and a decoder for tuning into an encoded video stream are described.

Description

Method and Encoder for Providing a Tune-In Stream for an Encoded Video Stream and Method and Decoder for Tuning into an Encoded Video Stream
Embodiments of the invention relate to the field of providing tune-in streams for allowing a fast tune-in into an encoded video stream, for example for allowing a fast channel change based on a tune-in stream. More specifically, embodiments of the invention relate to a method for providing a tune-in stream for an encoded video stream and an associated encoder as well as to a method for tuning into an encoded video stream and an associated decoder. More specifically, embodiments of the invention provide gradual tune-in pictures for a fast tune-in or channel change.
In the old days of analogue TV, channel change was instantaneous and not an issue. With the advent of the current digital TV broadcast systems and especially newer IPTV multicast systems, much higher tune-in times in the range of seconds were observed, which led to a degraded user experience (see e.g. H. Fuchs and N. Farber, "Optimizing channel change time in IPTV applications", 2008 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, April 2008). Several solutions, mainly in industry-driven standardization bodies and to a lesser extent in the science community, were proposed. The solutions range from video coding improvements to feedback-based and server-based solutions. A good overview is provided by H. Fuchs and N. Farber, "Optimizing channel change time in IPTV applications", 2008 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, April 2008, who also briefly describe the delay contributors that dictate the tune-in time into a new channel. The most significant and ubiquitous source of delay is inherently caused by today's digital video coding schemes, which do not allow decoding to start at any point in time.
Current digital video coding schemes like MPEG-2 video and H.264 are based on differential video coding. In these schemes coding efficiency is substantially increased through predictive coding (in so-called P-frames and B-frames) exploiting the temporal correlation between consecutive frames in a video sequence. P-frames use only backward-prediction from previous frames whereas B-frames may additionally use forward-prediction from subsequent frames. I-frames only depend on data inside the frame and enable decoding at the start of a sequence. Each frame consists of macroblocks, whereas an I-frame contains intra-coded macroblocks only. P- and B-frames consist mostly of inter-coded macroblocks, i.e. macroblocks that depend on data of other frames, and may also contain intra-coded macroblocks. The frame types of differential video coding typically follow a fixed sequence, which is an I-frame followed by several P- and B-frames until the next I-frame. The range from an I-frame up to the next I-frame is called a Group of Pictures (GOP), as illustrated in Fig. 1.
Fig. 1 illustrates a conventional GOP structure of an encoded video stream. As can be seen from Fig. 1, the encoded video stream comprises a plurality of I-pictures (intra-coded pictures) and a plurality of P-pictures (inter-coded pictures). The encoded video stream comprises a plurality of consecutive frames which are labeled in Fig. 1 as frames 1 to 7 and the pictures are included within the frames. The encoded video stream comprises a plurality of groups of pictures (GOPs) each of which includes at least one I-picture and one or more P- or B-pictures. In the example shown in Fig. 1, the GOP of the encoded stream comprises as a first frame in the GOP the I-frame, i.e. the frame including the intra-coded picture, whereas the remaining frames 2 to 6 comprise the inter-coded pictures. At frame 7 a new GOP starts.
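As an illustration only, the frame layout described for Fig. 1 can be reproduced with a few lines of Python. This is a minimal sketch assuming a fixed GOP length and P-frames only; the function name `gop_frame_types` is hypothetical and is not taken from any codec software.

```python
def gop_frame_types(num_frames, gop_length):
    """Map 1-based frame numbers to 'I' or 'P'; every GOP starts with an I-frame."""
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    return {
        n: ("I" if (n - 1) % gop_length == 0 else "P")
        for n in range(1, num_frames + 1)
    }


# Frames 1 to 7 with a GOP of six frames, as described for Fig. 1.
layout = gop_frame_types(7, 6)
print(layout)
```

With a GOP length of six, frames 1 and 7 come out as I-frames and frames 2 to 6 as P-frames, which matches the description of the figure.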
P- and B-frames also allow intra-coded macroblocks, which may be used to improve the coding gain, i.e. not all macroblocks benefit from inter-frame prediction. Another benefit of intra-coded macroblocks is their positive effect on error resilience, since errors due to missing or corrupt frames only propagate over inter-coded macroblocks. During tune-in into a stream that is transmitted over a bitrate-constrained channel, delay is caused by the random-access point acquisition time and the time that is required to fill the decoder input buffer for constant decoding without interrupts.
Random-access into a differentially coded video sequence may only be done at an I-frame since this frame is guaranteed to contain intra-coded macroblocks only. All following frames potentially depend on the I-frame or other previous frames, and a best-effort decoding of these frames would result in visible image distortion.
Thus, to reduce the delay caused by waiting for the next I-frame to arrive, the distance between random-access points for tune-in could be reduced, resulting in smaller GOPs. Hence more I-frames are beneficial but on the other hand reduce the coding efficiency, which is mainly accomplished by inter-frame prediction for a majority of the available video data.
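The trade-off can be made concrete with a small calculation. The sketch below assumes that a client tunes in at a uniformly random frame and must wait for the next I-frame; the function names are illustrative, and the GOP lengths and the 25 frames per second in the usage lines are example values only.

```python
def frames_until_next_i(frame, gop_length):
    """Frames to wait from 1-based `frame` until an I-frame arrives (0 if it is one)."""
    return (-(frame - 1)) % gop_length


def mean_wait_seconds(gop_length, fps):
    """Average wait when the tune-in instant is uniform over one GOP."""
    waits = [frames_until_next_i(f, gop_length) for f in range(1, gop_length + 1)]
    return sum(waits) / len(waits) / fps


# Compare a long GOP with a short one at the same frame rate.
for gop_length in (25, 5):
    print(gop_length, mean_wait_seconds(gop_length, fps=25))
```

Shortening the GOP lowers the average wait in proportion, at the price of sending I-frames more often.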
Besides the random-access point, a certain amount of data is also required in the decoder input buffer. This results from the fact that I-frames may easily require up to ten times more coded bits than P- and B-frames due to the potential coding gain of prediction. This behavior results in a highly variable encoded bitrate that may exceed the available bitrate on the channel during I-frame transmission. Therefore all modern video codecs define a strict buffer model that enables transmission of variable encoded streams on bandwidth-constrained channels, allowing both encoder and decoder to predict the required amount of data before decoding may be started. However, in order to reduce the buffering delay a more Constant Bitrate (CBR) behavior is desired.
IPTV suffers from slow channel change behavior in a similar way to digital TV broadcast systems. A disadvantage of IPTV is the high number of additional delay factors, which are, however, of lesser importance than the influence of the video coding. In contrast, the advantage of IPTV is its flexibility owing to the use of multicast instead of broadcast, higher available bitrates and the possibility of easily reconfigurable systems in the distribution chain. Hence, fast channel change in IPTV may be accomplished not only by reducing the size of a GOP but also by other methods that provide random-access points and filled decoder buffers in a short time.
Providing RAPs (Random Access Points) inside an IPTV video stream at a higher frequency is an easy and intuitive way to partly solve the problem of slow channel changes. However, this approach is a waste of network resources, especially when clients are in a steady state. One solution to this is to provide a side stream that is a different video encoding of the channel containing more frequent random access points than the normal encoding of the video, the main stream. The side stream is only forwarded to the client upon request, i.e. when the client tunes into a new channel. It is used until an RAP in the main stream is received. The client then switches to the main stream and drops the side stream. To minimize the additional bitrate necessary for the side stream, its quality is reduced, e.g. by reducing the resolution or framerate (see e.g. J. M. Boyce and A. M. Tourapis. "Fast efficient channel change [set-top box applications]" in Consumer Electronics. 2005, ICCE 2005 Digest of Technical Papers, International Conference on, pp. 1-2, Jan. 2005). In addition, the main stream may utilize longer I-frame distances and thus have a higher compression gain.
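The effect of bitrate peaks on buffering delay can be illustrated with a toy model. The sketch below is not the buffer model of any codec standard; it assumes a channel that delivers a constant number of bits per frame interval and a decoder that removes one complete frame per interval. The name `min_startup_delay` and the frame sizes are hypothetical.

```python
def min_startup_delay(frame_bits, channel_bits_per_interval):
    """Smallest start-up delay, in frame intervals, such that every frame has
    fully arrived by the time it is due for decoding (simplified model)."""
    delay = 0.0
    bits_needed = 0.0
    for k, bits in enumerate(frame_bits):
        bits_needed += bits
        # Frame k is decoded at time delay + k, so the channel must have
        # delivered all bits up to and including frame k by then.
        delay = max(delay, bits_needed / channel_bits_per_interval - k)
    return delay


peaky = [10, 1, 1, 1, 1, 1]       # one I-frame ten times the size of a P-frame
smooth = [2.5] * 6                # same total number of bits, evenly spread
rate = sum(peaky) / len(peaky)    # channel rate equal to the mean frame size

print(min_startup_delay(peaky, rate), min_startup_delay(smooth, rate))
```

Under these assumptions the peaky sequence needs four frame intervals of pre-buffering and the evenly spread one needs a single interval, which is why a more constant bitrate shortens the buffering delay.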
Another variation of side streams contains I-frames only as RAPs to the main stream and splices one of these with the main stream. This is described in more detail below (see also U. Jennehag, T. Zhang, and S. Pettersson. "Improving Transmission Efficiency in H.264 based IPTV Systems", IEEE Transactions on Broadcasting, vol. 53, no. 1, pp. 69-78, March 2007, and U. Jennehag and S. Pettersson. "On Synchronization Frames for Channel Switching in a GOP-Based IPTV Environment", 5th IEEE Consumer Communications and Networking Conference 2008, pp. 638-642, Jan 2008). Tune-In Pictures (TIP) are an IPTV tune-in stream technology based on the idea that it is inefficient to send intra-coded frames at a fixed frequency to provide stream resynchronization points. With TIP, a stream consisting of stream RAPs, e.g. I-frames, is separately generated in addition to the main video stream. The main stream consists of a normal GOP structure with a large I-frame distance. Clients who want to decode a video stream, e.g. a TV-channel, must receive both the main video stream and one frame from the TIP stream for an instant RAP. The tune-in picture and the main stream are then spliced, i.e. a P-frame from the main stream is replaced by the corresponding TIP, and decoded. The following example, illustrated in Fig. 2, shows a typical TIP channel switch situation.
Fig. 2 illustrates an example for a channel change using tune-in pictures. Fig. 2 illustrates a first channel A comprising a main stream 100 comprising a plurality of frames including respective inter-coded pictures PA and intra-coded pictures IA. In Fig. 2 a time axis indicating times t1 to t13 is shown and during the time period t1 to t4 the main stream 100 of channel A provides the frames for a first group of pictures GOPA1. A second group of pictures GOPA2 is provided starting from time instance t5. Further, channel A comprises a tune-in stream 102 which comprises a plurality of intra-coded pictures for the main stream 100 which are spaced evenly with respect to each other, i.e. every three time instances an intra-coded picture is sent by the tune-in stream 102 of channel A.
Further, Fig. 2 shows the main stream 104 for channel B and the channel B tune-in stream 106. The main stream 104 of channel B comprises a plurality of intra-coded pictures IB and a plurality of inter-coded pictures PB. The main stream 104 of B, in the situation shown in Fig. 2, comprises a first group of pictures GOPB1 that starts with an I-frame at time instance t1 and that ends with an inter-coded frame at time instance t12. At time instance t13 a new group of pictures GOPB2 starts with a new I-frame IB. The tune-in stream 106 comprises a plurality of intra-coded frames which are provided at time instances t4, t7, t10 and t13. Fig. 2 also shows a spliced stream 108 which is the stream that is presented to a decoder for decoding the encoded main stream. During time instances t1 to t4, frames from the main stream 100 of channel A are within the spliced stream and supplied to the decoder for decoding.
The client requests the channel switch at time instance t5 and the main multicast stream of channel A is immediately left at time instance t5. Then, first the TIP stream for channel B is requested at time instance t5. The client waits until traffic arrives and receives the RAP at time instance t7. The TIP stream is left when the complete RAP is received and the main stream of channel B is now requested. The RAP of the TIP stream and the main stream are spliced and decoded at time instance t7 to generate a valid coded video sequence. Residual artifacts from decoder mismatch, i.e. the mismatch of reference pictures and the resulting drift, are removed when the next I-frame is encountered in the main stream.
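The timing of this channel switch can be traced with a short sketch. It assumes integer time instances and takes the positions of the tune-in pictures and of the main-stream I-frames as plain lists; the function name `tip_channel_switch` is hypothetical, and network and buffering delays are ignored.

```python
def tip_channel_switch(request_t, tip_times, main_i_times):
    """Return (splice_t, clean_t): the instant at which the tune-in picture is
    spliced with the main stream, and the instant at which the next
    main-stream I-frame removes the residual drift."""
    # First tune-in picture at or after the request (ValueError if none exists).
    splice_t = min(t for t in tip_times if t >= request_t)
    # First main-stream I-frame at or after the splice.
    clean_t = min(t for t in main_i_times if t >= splice_t)
    return splice_t, clean_t


# Channel B as described for Fig. 2: tune-in pictures at t4, t7, t10 and t13,
# main-stream I-frames at t1 and t13, channel switch requested at t5.
splice_t, clean_t = tip_channel_switch(5, [4, 7, 10, 13], [1, 13])
print(splice_t, clean_t)
```

For the values of Fig. 2 the splice takes place at t7 and the drift persists until the I-frame at t13.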
One inherent problem with the TIP FCC (FCC = Fast Channel Change) approach is the mismatch between the tune-in picture and the corresponding picture in the main stream, which produces a prediction error in the decoded stream. In addition, an easily visible jump in quality may also be observed when the quality stabilizes (see U. Jennehag and S. Pettersson. "On Synchronization Frames for Channel Switching in a GOP-Based IPTV Environment", 5th IEEE Consumer Communications and Networking Conference 2008, pp. 638-642, Jan 2008). This jump in quality may be reduced by using a tune-in picture with high quality, which generates less prediction error. However, such an approach requires a higher bitrate and reduces the overall performance of the system. The tune-in pictures do not impose any changes in the main stream, which will typically still be coded with unwanted peaks in the bitrate.
Another tune-in approach is based on a gradual decoder refresh. The common principles of differential video coding are well known and reference is now made to a rarely used method, where intra-coded macroblocks are spread over differentially coded frames. Typical applications for this technique are intra-coded macroblocks for improved error-resilience, i.e. errors in motion prediction do not propagate (see E. Steinbach, N. Farber, and B. Girod, "Standard Compatible Extension of H263 for Robust Video Transmission", IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 6, pp. 872-881, Dec. 1997), and the smoothing of I-frame bitrate peaks, which means that a frame is divided into several pieces and these pieces are updated in an interleaved manner as illustrated in Fig. 3. These pieces represent a partition of the frame and are called slices in H.264.
Fig. 3 illustrates, on the basis of the conventional GOP structure shown in Fig. 1, a gradually refreshing structure. In a similar manner as in Fig. 1, for the encoded video stream the frames 1 to 7 are shown. In accordance with the gradually refreshing structure each frame 1 to 7 comprises a plurality of partitions or slices a, b and c. In Fig. 3, frames 1 to 6 form one group of pictures and frame 7 indicates the start of a subsequent group of pictures. The encoded video stream comprises for each I-picture a plurality of intra-coded macroblocks and for each P-frame or B-frame a plurality of inter-coded macroblocks. In the example shown in Fig. 3, it is assumed that each frame comprises three macroblocks. The three macroblocks of the I-picture, other than in Fig. 1, are not provided in a single frame, like frame 1 of Fig. 1, rather the intra-coded or I-macroblocks of the I-picture are distributed or spread among a plurality of frames. In the example of Fig. 3, a first I-macroblock or I-slice of the I-picture for the group of pictures comprising frames 1 to 6 is provided in slice a of frame 1. The second I-macroblock is provided in slice b of frame 2, and the third I-macroblock is provided in slice c of frame 3.
The bitrate smoothing is accomplished by using a method which is known as Independent Segment Decoding (ISD) with shifted intra-coded slices in H.263 (see ITU-T Rec. H.263, "Infrastructure of audiovisual services - Coding of moving video: Video coding for low bit rate communication", International Telecommunication Union, Jan 2005) or gradual decoder refresh (GDR) in H.264.
In the conventional GDR sequence of Fig. 3 the frames are divided into three slices. A GDR sequence starts typically with one I-slice (intra-coded slice) and inter-coded slices which are only using forward prediction up to the next I-slice for the same image partition in the following two frames. In this example, the bits which would be necessary for the first frame if coded as an I-frame are distributed over frames one to three. However, GDR reduces coding efficiency for smaller frame sizes resulting in small slices, because the prediction choices are reduced significantly at slice boundaries. GDR is especially advantageous for low-delay video coding, where it is important that even over a bitrate limited channel the arrival time is predictable and also the required buffer sizes are to be small.
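The slice arrangement of such a GDR sequence can be written down as a small table generator. This is a minimal sketch, assuming one I-slice per refreshing frame and a fixed period; the function `gdr_layout` and its parameter names (`stride`, `gop_length`) are illustrative and do not correspond to encoder options.

```python
def gdr_layout(num_frames, num_slices, stride, gop_length):
    """Map 1-based frame numbers to a list of slice types ('I' or 'P').

    The k-th I-slice of a period (k counted from 0) is placed in slice k of
    the frame that lies k * stride frames after the start of the period."""
    layout = {}
    for n in range(1, num_frames + 1):
        offset = (n - 1) % gop_length
        types = ["P"] * num_slices
        if offset % stride == 0 and offset // stride < num_slices:
            types[offset // stride] = "I"
        layout[n] = types
    return layout


# Three slices a, b, c refreshed in frames 1, 2 and 3; frame 7 starts anew.
fig3 = gdr_layout(num_frames=7, num_slices=3, stride=1, gop_length=6)
for frame, types in fig3.items():
    print(frame, types)
```

Choosing a larger `stride` spaces the I-slices further apart in time, which spreads the intra-coded bits over more frames.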
A first method for tune-in into a GDR requires starting decoding at the 1st or 7th frame in the example of Fig. 3 for an initially correct picture. An additional parameter has to be signaled, the pre-roll period, that describes how many frames after a random-access point have to be decoded for assured decoding, i.e. perfect reconstruction is achieved (see G. Sullivan. "On Random Access and Bitstream Format for JVT Video", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-B063, Jun 2002). In the case of the example of Fig. 3 a pre-roll period of three frames has to be used.
The second method for tune-in into a GDR stream is known as best-effort decoding (see G. Sullivan. "On Random Access and Bitstream Format for JVT Video", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-B063, Jun 2002) and illustrated in Fig. 4.
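The pre-roll period of the example can be checked by counting frames until every slice position has been refreshed. The sketch below assumes that refreshed slices are predicted only from already refreshed data, so that a complete refresh implies a correct picture; the helper `pre_roll_frames` and the dictionary describing Fig. 3 are hypothetical.

```python
def pre_roll_frames(i_slice_by_frame, num_slices, start_frame):
    """Number of frames decoded, starting at start_frame, until every slice
    position has received an I-slice; None if that never happens."""
    refreshed = set()
    for frame in range(start_frame, max(i_slice_by_frame) + 1):
        slice_index = i_slice_by_frame.get(frame)
        if slice_index is not None:
            refreshed.add(slice_index)
        if len(refreshed) == num_slices:
            return frame - start_frame + 1
    return None


# Fig. 3: I-slices in slice a (index 0) of frame 1, slice b of frame 2,
# slice c of frame 3, and again slice a of frame 7.
fig3 = {1: 0, 2: 1, 3: 2, 7: 0}
print(pre_roll_frames(fig3, num_slices=3, start_frame=1))
```

For the arrangement of Fig. 3 this count is three frames, in line with the pre-roll period stated above.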
Fig. 4 describes a GDR structure for a best effort decoding. In a similar manner as Fig. 3, each frame comprises a plurality of slices a, b, and c. Other than in Fig. 3, in the structure for the best effort decoding the I-macroblocks for an I-picture of a GOP are distributed in frames which are separated from each other by at least one frame. More specifically, the first I-macroblock is provided in slice a of frame 1, the second I-macroblock is provided in slice b of frame 3 and the third I-macroblock is provided in slice c of frame 5. Preferably, as is shown in Fig. 1, the I-slices or I-macroblocks are evenly distributed among the frames of a GOP.
Decoding may start at any frame containing an I-slice and missing referenced macroblocks are initialized with either a certain color (e.g. mid-level gray Y=Cb=Cr=128) or any other defined value. It is assumed that both intra-coded macroblocks follow soon and enough redundancy is still available in following inter-coded macroblocks so that the image quality converges after some frames. However, the fact that all but one slice are initialized with mid-level gray leads to "dirty random access", which is an essential shortcoming prohibiting the introduction of GDR in high-quality IPTV.
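The course of such a best-effort tune-in can be followed slice by slice. The sketch below is a simplification that ignores prediction across slice boundaries and only tracks whether a slice still derives from the gray initialisation; the function `best_effort_states`, the state labels and the dictionary describing Fig. 4 are illustrative.

```python
MID_GRAY = (128, 128, 128)  # Y, Cb, Cr used for missing reference data


def best_effort_states(i_slice_by_frame, num_slices, start_frame, last_frame):
    """Per-frame slice states after tuning in at start_frame.

    'gray'  : the slice still derives from the mid-level gray initialisation
    'clean' : the slice has been refreshed by an I-slice since tune-in"""
    if start_frame not in i_slice_by_frame:
        raise ValueError("decoding must start at a frame containing an I-slice")
    state = ["gray"] * num_slices
    history = {}
    for frame in range(start_frame, last_frame + 1):
        slice_index = i_slice_by_frame.get(frame)
        if slice_index is not None:
            state[slice_index] = "clean"
        history[frame] = list(state)
    return history


# Fig. 4: I-slices in slice a of frame 1, slice b of frame 3, slice c of
# frame 5, and slice a again in frame 7. Tune-in happens at frame 3.
fig4 = {1: 0, 3: 1, 5: 2, 7: 0}
for frame, state in best_effort_states(fig4, 3, start_frame=3, last_frame=7).items():
    print(frame, state)
```

Under these assumptions two of the three slices are gray-derived at the first decoded frame, and the picture is only fully refreshed at frame 7, which illustrates the "dirty random access".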
It is an object of the invention to provide an improved approach allowing fast tuning into a differentially encoded video stream.
This object is solved by a method of claim 1, by an encoder of claim 6, by a method of claim 7, and by a decoder of claim 11.
Embodiments of the invention provide a method for providing a tune-in stream for an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, the method comprising:
providing a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture,
wherein the tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture.
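The relationship between a main-stream frame and its tune-in picture can be sketched as follows. The example assumes one I-slice per refreshing frame and identifies slices by index (0 for slice a, 1 for slice b, 2 for slice c); the function `build_tune_in_pictures` is a hypothetical illustration, not the encoder of the embodiments.

```python
def build_tune_in_pictures(i_slice_by_frame, num_slices):
    """For each main-stream frame carrying an I-slice, list the slice
    positions that the associated tune-in picture has to provide as
    intra-coded data, i.e. all remaining positions."""
    return {
        frame: [s for s in range(num_slices) if s != own_slice]
        for frame, own_slice in i_slice_by_frame.items()
    }


# Main stream with I-slices in slice a of frame 1, slice b of frame 3,
# slice c of frame 5 and slice a of frame 7.
main_stream = {1: 0, 3: 1, 5: 2, 7: 0}
tune_in = build_tune_in_pictures(main_stream, num_slices=3)
print(tune_in)
```

Taken together, the I-slice of a main-stream frame and the slices of its tune-in picture cover every slice position, so a complete intra-coded picture is available at that frame.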
Further, embodiments of the invention provide an encoder for providing a tune-in stream for an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, wherein the encoder is configured to provide a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture,
wherein the tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture.
Further, embodiments of the invention provide a method for tuning into an encoded video stream, the method comprising:
providing an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames;
providing a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture;
upon receiving a tune-in request, tuning into the encoded video stream and the tune-in stream;
upon receiving a tune-in picture in the tune-in stream, splicing the tune-in picture and the encoded video stream.
Further, embodiments of the invention provide a decoder for receiving encoded data and providing decoded output data, the decoder comprising:
an input for receiving an encoded video stream of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, and a tune-in stream comprising a plurality of tune-in pictures, wherein a tune-in picture is provided for a frame of the encoded video stream that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture comprises the remaining intra-coded macroblocks of the intra-coded picture; and
a decoding portion coupled to the input and configured to tune into the encoded video stream and the tune-in stream upon receiving a tune-in request, upon receiving a tune-in picture in the tune-in stream, to splice the tune-in picture and the encoded video stream, and to decode the spliced stream.
Embodiments of the invention provide a computer readable medium for storing instructions for executing the methods in accordance with embodiments of the invention.
The invention provides a novel approach for tuning into a main stream, e.g. for a fast channel change, based on tune-in streams. Embodiments of the invention are based on a gradual decoder refresh (GDR) in accordance with H.264 for the main stream (the differentially encoded video stream) and for the tune-in stream. In normal GDR mode a receiver would have to wait for several received NAL units of multiple frames before it may render a complete frame without visible GDR artifacts. An additional tune-in stream is provided that fills the missing regions of a partially complete frame. The simulation results discussed later show that the solution is at least as bitrate-efficient as previous tune-in solutions but offers the advantages of a less variable encoded bitrate and a consistent gradual improvement of picture quality.
Embodiments of the invention are advantageous as the random-access point (RAP) acquisition time is greatly improved using a side-stream. In addition a reduction in decoder pre-buffer size requirements is obtained.
Embodiments of the invention concern the combination of the tune-in picture technology and best-effort gradual decoder refresh. A GDR stream as described above with regard to Fig. 4 is used for the main stream, and in addition a tune-in stream is provided that comprises the missing slices needed for tune-in with a complete I-picture. This overcomes the "dirty random access" drawback of best-effort GDR streams and conserves the advantageous properties of GDR streams. This approach is hereafter also referred to as Gradual Tune-In Pictures (G-TIP). For this embodiment, the best-effort GDR is considered a viable alternative encoding variant for fast tune-in due to its property of equally distributed RAPs, though the decoder output is initially a partially incorrect image. In addition, this structure brings the advantage of a relatively equally distributed bitrate, which facilitates the signaling of a level with a smaller buffer requirement and hence a lower initial pre-buffering delay. This aspect is important as the buffer requirements are also a key factor of tune-in delay.
In accordance with embodiments of the invention the encoded video stream and the tune-in stream are provided to a receiver directly, for example from a service provider, or via a server from which the respective streams can be obtained upon user request. Embodiments of the invention teach a method in accordance with which each frame of the encoded video stream comprises a plurality of partitions, wherein frames comprising an intra-coded macroblock only comprise a single intra-coded macroblock in one of their partitions, and the remaining intra-coded macroblocks for the intra-coded picture are comprised within respective following frames. In accordance with embodiments the intra-coded macroblocks are provided in consecutive frames or in frames having therebetween one or more frames without intra-coded macroblocks.
In accordance with embodiments of the invention, the encoded video stream comprises a plurality of groups of pictures (GOPs), each of which comprises at least one intra-coded picture and one or more inter-coded pictures, wherein each GOP comprises a plurality of frames, and wherein the intra-coded macroblocks are spread among the plurality of frames of the GOP. In accordance with an embodiment, the intra-coded macroblocks are spread evenly among the plurality of frames of the GOP.
In the following, embodiments of the invention will be described in further detail on the basis of the accompanying drawings, in which:
Fig. 1 shows a conventional GOP structure of a differentially encoded video stream;
Fig. 2 shows an example of a channel change on the basis of tune-in pictures;
Fig. 3 shows a GDR-structure of a differentially encoded video stream;
Fig. 4 shows a GDR-structure for best effort decoding;
Fig. 5 illustrates an approach for tuning into a main stream using gradual tune-in pictures in accordance with an embodiment of the invention;
Fig. 6 shows an encoder/decoder set up for illustrating a system using the inventive approach for fast tune-in; and
Figs. 7-10 show the tune-in PSNR traces for different video sequences.
In accordance with embodiments of the invention, a novel approach for fast tuning into a main stream, for example for a fast channel change based on a tune-in stream, is provided. The novel approach is based on the combination of the above described tune-in picture technology and the gradual decoder refresh. A GDR stream is used as the main stream, and in addition a tune-in stream is provided comprising a plurality of tune-in pictures, wherein the tune-in pictures comprise the missing slices needed for tune-in with a complete picture. Fig. 5 shows an embodiment of the invention using a combination of the tune-in picture technology and the best-effort decoding GDR stream. Fig. 5 shows at 200 the main stream or encoded video stream. This encoded video stream comprises a plurality of frames 1 to 7, of which frames 1 to 6 form a first group of pictures GOP₁, and with frame 7 a new group of pictures GOP₂ starts. Each of the frames comprises three slices a, b and c, and the intra-coded macroblocks for an I-picture of the group of pictures GOP₁ are evenly distributed among the plurality of frames 1 to 6 of that group. In the embodiment shown in Fig. 5 the intra-coded macroblocks for the I-picture of the group GOP₁ are provided in slice a of frame 1, in slice b of frame 3 and in slice c of frame 5.
Fig. 5 also shows the tune-in stream 202 in accordance with the invention. The tune-in stream comprises a plurality of tune-in pictures 202₁ to 202₄ which are provided at a predefined interval. In the example shown in Fig. 5 the tune-in pictures 202₁ to 202₄ are provided for every second frame of the main stream 200, i.e. at frame positions 1, 3, 5 and 7. The tune-in pictures 202₁ to 202₄ have a structure similar to that of the frames 1 to 7 of the main stream 200, i.e. each tune-in picture comprises three slices a, b and c. In accordance with embodiments of the invention, the tune-in pictures 202₁ to 202₄ are provided for those frames in the main stream which include an I-macroblock, i.e. in the embodiment of Fig. 5 for frames 1, 3, 5 and 7. The tune-in pictures further comprise those I-macroblocks which are missing from the associated frame in the main stream, i.e. these tune-in pictures comprise the "remaining" I-macroblocks. To be more specific, the tune-in picture 202₁ associated with frame 1 of the main stream 200 comprises the I-macroblocks that are present in the main stream in frame 3 at slice b and in frame 5 at slice c. In a similar manner, tune-in picture 202₂ comprises those I-macroblocks missing from main stream frame 3, namely the I-macroblock from slice a of frame 1 of the main stream and the I-macroblock of slice c of frame 5 of the main stream 200. The same is true for tune-in pictures 202₃ and 202₄.
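The composition of the tune-in pictures described for Fig. 5 may be sketched as follows. This is a minimal illustration only and not the tune-in encoder of the embodiments: the names `SLICES`, `GDR_SCHEDULE` and `tune_in_slices` are hypothetical, and the sketch merely lists, for each main stream frame of the first group of pictures, which slice positions the associated tune-in picture has to carry.

```python
from typing import Dict, List, Optional

# Slice positions of a frame in Fig. 5.
SLICES: List[str] = ["a", "b", "c"]

# Main stream schedule of the first GOP in Fig. 5:
# frame number -> slice position holding the I-macroblocks.
GDR_SCHEDULE: Dict[int, str] = {1: "a", 3: "b", 5: "c"}


def tune_in_slices(frame: int,
                   schedule: Dict[int, str] = GDR_SCHEDULE,
                   slices: List[str] = SLICES) -> Optional[List[str]]:
    """Return the slice positions the tune-in picture for `frame` must carry.

    Returns None if the main stream frame holds no I-macroblocks, in which
    case no tune-in picture is provided for that frame.
    """
    if frame not in schedule:
        return None
    own_slice = schedule[frame]
    # The tune-in picture supplies every slice the main stream frame lacks.
    return [s for s in slices if s != own_slice]


if __name__ == "__main__":
    for frame in range(1, 7):
        print(frame, tune_in_slices(frame))
```

For frame 1 the sketch yields slices b and c, and for frame 3 slices a and c, which corresponds to the tune-in pictures 202₁ and 202₂ described above.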
Further, Fig. 5 shows the spliced stream 204. In a manner similar to that described above with regard to Fig. 2, upon requesting the tune-in into the main stream, for example at the position where frame 2 is presented, both the tune-in stream 202 and the main stream 200 are obtained. At frame position 2, however, no decoding or splicing can occur as no intra-coded information is available here. At frame position 3, by splicing the tune-in picture 202₂ and frame 3 from the main stream 200, the spliced stream comprises a complete I-frame that allows decoding of the main stream to start.
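The splicing step for the example of Fig. 5 may be sketched as follows. The sketch is illustrative only and rests on assumptions: the names `first_splice_frame` and `spliced_frame` are hypothetical, only the first group of pictures is modeled, and each slice is represented simply by the name of the stream that supplies it rather than by coded data.

```python
from typing import Dict, List, Optional

SLICES: List[str] = ["a", "b", "c"]

# Frame number -> slice position holding I-macroblocks in the main stream
# (first GOP of Fig. 5). A tune-in picture exists for exactly these frames.
GDR_SCHEDULE: Dict[int, str] = {1: "a", 3: "b", 5: "c"}


def first_splice_frame(request_frame: int,
                       schedule: Dict[int, str] = GDR_SCHEDULE) -> Optional[int]:
    """Return the first frame at or after the tune-in request that has a tune-in picture."""
    later = [f for f in sorted(schedule) if f >= request_frame]
    return later[0] if later else None


def spliced_frame(frame: int,
                  schedule: Dict[int, str] = GDR_SCHEDULE,
                  slices: List[str] = SLICES) -> Dict[str, str]:
    """Map each slice position of the spliced frame to the stream supplying it."""
    own_slice = schedule[frame]
    return {s: ("main" if s == own_slice else "tune-in") for s in slices}


if __name__ == "__main__":
    start = first_splice_frame(2)
    print(start, spliced_frame(start))
```

A tune-in request at frame position 2 thus leads to splicing at frame position 3, where slice b is taken from the main stream and slices a and c are taken from the tune-in picture.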
While Fig. 5 shows an embodiment in accordance with which the I-macroblocks are evenly distributed among the plurality of frames of the GOP₁, it is noted that the inventive approach is also applicable to main streams having a different distribution of the I-macroblocks among the frames within a GOP, for example the inventive approach may also be applied to a GDR structure as shown in Fig. 3. In such a situation, the tune-in stream would comprise three consecutive tune-in pictures having a structure as pictures 202₁ to 202₃. Also, it is possible to distribute the I-macroblocks among the plurality of frames with greater distances therebetween or with different distances therebetween, i.e. the number of frames between two successive I-macroblocks may vary.
Further, Fig. 5 shows an embodiment in accordance with which the I-macroblocks are provided such that, starting with a first frame of a GOP, first the I-macroblock for a first slice, then the I-macroblock for a second slice and then the I-macroblock for a third slice is provided. The invention is not limited to such an approach; rather, the order in which the I-macroblocks are provided with regard to the slice position is arbitrary as long as the associated tune-in pictures provide the remaining (missing) I-macroblocks of the associated main stream frame. For example, in Fig. 5 frame 1 may also comprise the I-macroblock in slice b or in slice c, and in the remaining frames 3 and 5 the macroblocks of the other slices would be provided.
Further, while Fig. 5 shows an embodiment of frames having three slices it is noted that the invention is not limited to such an approach, rather a plurality of slices should be provided, i.e. two or more slices, for example five slices as we will discuss below with regard to experimental results obtained by applying the inventive tune-in approach to test scenes.
Fig. 6 is a schematic representation of a system comprising an encoder 300 and a decoder 400. The encoder 300 comprises an input 302 receiving video information to be encoded, for example video information in the form of YUV data. This data is provided to a main encoder 304 and, in parallel, to a tune-in encoder 306. The main encoder 304 provides the encoded video stream or main stream in a manner as described above with regard to Figs. 3, 4 and 5, and the tune-in encoder 306 provides the tune-in stream in accordance with embodiments of the invention, for example in a manner as described above with regard to Fig. 5 by providing tune-in pictures having those I-macroblocks missing from an associated main stream frame. The encoder 300 comprises an output for providing both the main stream and the tune-in stream together, for example to a communication network or the like, as is schematically illustrated by the arrows leaving the blocks labeled main encoder and tune-in encoder.
The decoder 400 comprises a splice portion 402 and a decoder portion 404. An input of the decoder 400 receives the combined main stream and tune-in stream and inputs same to the splice portion 402, which operates upon receiving a tune-in request in a manner as described above with regard to Fig. 5. The spliced stream 204 shown in Fig. 5 is applied from the splice portion 402 to the decoder portion 404 so that the decoder 400 provides at its output 406 the decoded video stream.
To demonstrate the advantages of the invention over conventional approaches, an experimental set up similar to the one in Fig. 6 was made. In this experimental set up the encoder 300 and the decoder 400 are directly connected with each other, and the output of the decoder portion 404 is fed to a PSNR measurement device 500 (PSNR = peak signal-to-noise ratio). Further, the original input signal 302 is also applied to block 500 via line 502 to evaluate the coding/decoding efficiency. In accordance with the experimental set up, a main stream and a tune-in stream in accordance with embodiments of the invention were generated and compared to a main stream and a tune-in stream provided in accordance with conventional approaches (see Fig. 2).
Using the experimental set up described in Fig. 6, a set of sequences is encoded into a main stream and a tune-in stream. The resulting frames are spliced and decoded, and the objective quality for the tune-in period is measured with luminance PSNR. The tune-in period of the spliced stream is defined as the period starting at the first decoded frame until the frame where the quality has stabilized, i.e. until a complete refresh from the main stream has occurred for the G-TIP and until the next complete I-frame for the TIP. The above described approach is used for both the TIP and the G-TIP scenario.
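For reference, the luminance PSNR of one decoded frame may be computed as sketched below. This is a generic illustration of the standard PSNR definition and not a description of the measurement device 500: the function name `luma_psnr` is hypothetical, 8-bit luminance samples (peak value 255) are assumed, and a frame is represented as a flat sequence of luminance samples.

```python
import math
from typing import Sequence


def luma_psnr(reference: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Return the PSNR in dB between two equally sized luminance planes.

    Assumes 8-bit samples by default (peak = 255). Identical planes give
    an infinite PSNR because the mean squared error is zero.
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("planes must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Two tiny example planes that differ by 10 in every sample.
    print(round(luma_psnr([100, 100, 100, 100], [110, 110, 110, 110]), 2))
```

Evaluating such a function for every frame of the tune-in period gives one PSNR trace per scenario.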
Four sequences from the NTIA/ITS selection were retrieved from the Video Quality Experts Group FTP server to be used. The first sequence is a slow moving scene of leaves on a tree blowing in the wind (Aspen). Sequence two (Red Kayak) shows a kayak in whitewater. The third sequence is a static clip of a snow covered mountain side surrounded by slow moving clouds (SnowMnt). The final clip is an American football kickoff (TouchdownPass) which includes moving players and some accelerating panning. All sequences are 100 frames long and have a resolution of 1280 × 720 pixels at 25 frames per second.
The main stream for TIP and G-TIP both use a fixed quantizer parameter of 24 and the number of slices per frame is fixed to five. In addition the I-frame distance in the TIP encodings is set to 25 frames equaling one complete IDR frame per second. The encoding of the G-TIP streams is chosen such that the number of I-slices per second is identical to the TIP streams. Five intra-coded slices are distributed in a way similar to Fig. 4, i.e. each 5th frame contains one I-slice and four P-slices.
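The distribution of the I-slices in the G-TIP main stream may be illustrated with the following sketch. It is an assumption-laden example rather than the configuration of the modified encoder: the function name `gdr_i_slice_schedule` is hypothetical, frames and slices are numbered from zero, and the slices are assumed to be refreshed in ascending order.

```python
from typing import Dict


def gdr_i_slice_schedule(i_frame_distance: int = 25,
                         slices_per_frame: int = 5) -> Dict[int, int]:
    """Return a mapping frame index -> index of the slice coded as I-slice.

    One refresh cycle spans `i_frame_distance` frames, so the number of
    I-slices per cycle equals that of one complete I-frame in the TIP case.
    """
    if i_frame_distance % slices_per_frame != 0:
        raise ValueError("refresh cycle must be a multiple of the slice count")
    spacing = i_frame_distance // slices_per_frame
    return {k * spacing: k for k in range(slices_per_frame)}


if __name__ == "__main__":
    print(gdr_i_slice_schedule())
```

With an I-frame distance of 25 frames and five slices per frame, an I-slice falls on every fifth frame, which gives five I-slices per second at 25 frames per second in both the TIP and the G-TIP encodings.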
The tune-in pictures used for G-TIP and TIP are encoded with a low quality, using the fixed quantizer parameter 45. The encodings of all G-TIP and TIP sequences are done with the JM 14.2 encoder, which was slightly modified for G-TIP encoding where applicable. The JM H.264 reference software (see Karsten Sühring, "IP Homepage - H.264/AVC JM Reference Software", Dec 2008, http://iphome.hhi.de/suehring/tml/) was modified for GDR, and several encodings of publicly available test sequences in HD resolution were made as discussed below. As will be seen in the following discussion, the bitrate overhead for similar quality was below 0.5% in the investigated sequences.
Eight different scenarios are investigated for which the tune-in quality is studied. The scenarios consist of the four sequences with both G-TIP and TIP tune-in. The tune-in position is set to frame 30 with respect to the main stream. This tune-in frame number is used for all the investigated scenarios. Hence, frame 30 to 49 represent the transition period. From frame 50 onwards only the main stream is decoded.
For all encodings the tune-in picture and the main stream were spliced offline and fed to several decoders, all of which were able to decode the spliced stream.
For better comparison of the TIP and G-TIP tune-in quality, the longest possible transition period was chosen, as this is the worst-case scenario for TIP. For the investigated scenario this is a tune-in 20 frames before the next I-frame of the main stream.
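The frame numbers of the transition period in the TIP case can be reproduced with a short calculation. The sketch below is illustrative only; it assumes that frames are numbered from zero and that the I-frames of the TIP main stream lie at multiples of the I-frame distance, which is consistent with the stated transition period of frames 30 to 49. The function name `transition_period` is hypothetical.

```python
def transition_period(tune_in_frame: int, i_frame_distance: int) -> range:
    """Return the frames decoded before the next main stream I-frame is reached.

    Assumes I-frames at frame indices 0, i_frame_distance, 2 * i_frame_distance, ...
    A tune-in exactly on an I-frame gives an empty transition period.
    """
    # Ceiling division: index of the first I-frame at or after the tune-in frame.
    next_i_frame = -(-tune_in_frame // i_frame_distance) * i_frame_distance
    return range(tune_in_frame, next_i_frame)


if __name__ == "__main__":
    period = transition_period(30, 25)
    print(period.start, period.stop - 1, len(period))
```

For a tune-in at frame 30 and an I-frame distance of 25 frames, the transition period covers frames 30 to 49, i.e. 20 frames, and from frame 50 onwards only the main stream is decoded.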
The results from the experiment described above are now discussed. Figs. 7-10 show the luminance PSNR plots for the G-TIP and TIP approaches for the four investigated sequences. The main difference between the results for the different sequences is caused by the selection of the actual scenes. This is most noticeable when comparing the results of the SnowMnt (Fig. 9) and Red Kayak (Fig. 8) sequences. SnowMnt is a very static scene with almost no movement and with a high percentage of predicted macroblocks. The Red Kayak sequence, on the other hand, includes a scene with a lot of movement and fast moving details, which requires a high percentage of the macroblocks to be intra-coded, which means that the initial prediction error caused by the tune-in stream is quickly corrected. Note the "staircase" effect in the SnowMnt and TouchdownPass sequences. This is the result of the gradual introduction of the intra-coded slices in the main stream, which clearly shows that the next level in quality is reached with the next GDR I-slice.
G-TIP provides a gain compared to TIP ranging from 0.47 to 4.1 dB average frame PSNR difference for the investigated scenarios. Table 1 shows the mean frame PSNR difference values for the tune-in period.
Table 1: Mean frame PSNR differences (table image not reproduced)
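A mean frame PSNR difference of the kind reported in Table 1 may be obtained from two PSNR traces as sketched below. The sketch only illustrates the averaging; the function name `mean_psnr_difference` is hypothetical, and the sample values in the usage example are invented for illustration and are not measurement results.

```python
from typing import Sequence


def mean_psnr_difference(gtip_psnr: Sequence[float], tip_psnr: Sequence[float]) -> float:
    """Return the mean per-frame PSNR gain (in dB) of G-TIP over TIP.

    Both traces must cover the same frames of the tune-in period.
    """
    if len(gtip_psnr) != len(tip_psnr) or len(gtip_psnr) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    return sum(g - t for g, t in zip(gtip_psnr, tip_psnr)) / len(gtip_psnr)


if __name__ == "__main__":
    # Invented example traces covering two frames of a tune-in period.
    print(mean_psnr_difference([30.0, 32.0], [29.0, 30.0]))
```

A positive value indicates that the G-TIP trace lies, on average, above the TIP trace during the tune-in period.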
TIP performance is coupled to the transition period length, i.e. the number of frames to the next I-frame in the main stream. Further, an informal subjective test also indicated that G-TIP is superior to TIP.
Embodiments of the invention provide tune-in streams and gradual decoder refresh to enable fast tune-in for IPTV. Tune-in pictures are an easily applicable technique for a fast tune-in solution. However, there is a steep quality jump from the low-quality tune-in stream to the high-quality main stream. Even for TIPs received close to an I-frame, there will be an easily visible steep quality jump, with the single exception of an I-frame of the main stream that is received right from the beginning. Gradual decoder refresh with best-effort decoding, on the other hand, is advantageous due to its less variable encoded bitrate in combination with a high number of possible random-access points. However, it suffers from visible artifacts due to missing reference pictures. To overcome this disadvantage, embodiments of the invention provide the combination of tune-in streams and gradual decoder refresh. The quality jump is reduced by a gradually refreshing main stream and a corresponding tune-in stream that provides intra-coded slices where normally mid-level gray would be assumed. The above discussed results show that the bitrate overhead of gradual decoder refresh in general is negligible for high resolutions. The additional bitrate needed for the tune-in stream is comparable to the bitrate of a tune-in picture based stream, and the approach offers higher quality during the tune-in period. It also has the advantage that the transition from the tune-in stream to the main stream is predictable in terms of quality improvement for any channel change event.
Embodiments of the invention concern approaches for tuning into a stream which comprises as a main stream an encoded video stream and a tune-in stream, wherein the stream may be a single stream which is provided to a user, e.g. over a network like the Internet. The stream, containing e.g. video content, may be provided by a service provider such that a user may tune into the stream at any time. In such a situation, after receiving the tune-in request the stream including both the main stream and the tune-in stream is received by the user. In another embodiment of the invention, the stream is obtained by a user on the user's demand, e.g. from a service provider. The stream (e.g. video on demand) is received by the user, and when tuning into the stream, decoding of the stream starts after obtaining the tune-in picture from the tune-in stream and splicing the main stream and the tune-in picture.
In accordance with other embodiments of the invention the encoded video stream and the tune-in stream are associated with a channel of a multi-channel transmission system, and a tune-in request indicates a change from a current channel of the multi-channel transmission system to a new channel of the multi-channel transmission system.
In the description of the embodiments of the invention, the self-contained blocks and the non-self-contained blocks of the streams were named I-pictures and P- or B-pictures, respectively. It is noted that the term "picture", in general, denotes encoded content that includes data or information necessary to decode the content of the block. In the case of I-pictures, all data or information necessary to decode the complete content of the block is included, whereas in the case of P- or B-pictures not all information necessary to decode a complete picture is included; rather, additional information from preceding or following pictures is required.
Although some aspects of the invention were described in the context of an apparatus, it is noted that these aspects also represent a description of the corresponding method, i.e., a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The combined signal comprising the encoded video stream and the tune-in stream may be stored on a digital storage medium or may be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention may be implemented in hardware or in software. The implementation may be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Other embodiments of the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Further, embodiments of the invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier. Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
A further embodiment of the invention is a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
Yet a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
The above described embodiments are merely illustrative for the principles of the invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims

1. A method for providing a tune-in stream (202) for an encoded video stream (200) of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream (200) comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, the method comprising:
providing a tune-in stream (202) comprising a plurality of tune-in pictures (202₁-202₄), wherein a tune-in picture (202₁-202₄) is provided for a frame of the encoded video stream (200) that comprises an intra-coded macroblock of an intra-coded picture,
wherein the tune-in picture (202₁-202₄) comprises the remaining intra-coded macroblocks of the intra-coded picture.
2. The method of claim 1, wherein each frame of the encoded video stream (200) comprises a plurality of partitions (a, b, c), wherein frames (1, 3, 5, 7) comprising an intra-coded macroblock of an intra-coded picture comprise a single intra-coded macroblock in one of its partitions (a, b, c), and wherein the remaining intra-coded macroblocks for the intra-coded picture are comprised within respective following frames of the encoded video stream (200).
3. The method of claim 2, wherein the intra-coded macroblocks of the intra-coded picture are provided in consecutive frames of the encoded video stream (200), or are provided in frames (1, 3, 5, 7) of the encoded video stream (200) having therebetween one or more frames (2, 4, 6) without intra-coded macroblocks.
4. The method of one of claims 1 to 3, wherein the encoded video stream (200) comprises a plurality of groups of pictures (GOPs), each GOP comprising at least one intra-coded picture and one or more inter-coded pictures, wherein each GOP comprises a plurality of frames (1, 2, 3, 4, 5, 6), and wherein the intra-coded macroblocks of the at least one intra-coded picture are spread among the plurality of frames (1, 2, 3, 4, 5, 6) of the GOP.
5. The method of claim 4, wherein the intra-coded macroblocks are spread evenly among the plurality of frames (1, 2, 3, 4, 5, 6) of the GOP.
6. An encoder for providing a tune-in stream (202) for an encoded video stream (200) of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream (200) comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames,
wherein the encoder (300) is configured to provide a tune-in stream (202) comprising a plurality of tune-in pictures (202₁-202₄), wherein a tune-in picture (202₁-202₄) is provided for a frame of the encoded video stream (200) that comprises an intra-coded macroblock of an intra-coded picture,
wherein the tune-in picture (202₁-202₄) comprises the remaining intra-coded macroblocks of the intra-coded picture.
7. A method for tuning into an encoded video stream (200), the method comprising:
providing an encoded video stream (200) of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream (200) comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames;
providing a tune-in stream (202) comprising a plurality of tune-in pictures (202₁-202₄), wherein a tune-in picture (202₁-202₄) is provided for a frame of the encoded video stream (200) that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture (202₁-202₄) comprises the remaining intra-coded macroblocks of the intra-coded picture;
upon receiving a tune-in request, tuning into the encoded video stream (200) and the tune-in stream (202);
upon receiving a tune-in picture (202₂) in the tune-in stream (202), splicing the tune-in picture (202₂) and the encoded video stream (200).
8. The method of claim 7, further comprising decoding the spliced stream.
9. The method of claim 7 or 8, wherein the encoded video stream (200) and the tune-in stream (202) are associated with one of a channel of a multi-channel transmission system, wherein the tune-in request indicates a change from a current channel of the multi-channel transmission system to a new channel of the multi-channel transmission system, or
with a stream, wherein the tune-in request initiates an initial tuning into the stream, or
a stream, which is obtained on demand of a user, wherein the tune-in request initiates an initial tuning-in to the stream.
10. The method of one of claims 7 to 9, wherein the encoded video stream (200) and the tune-in stream (202) are provided to a receiver directly or via a server.
11. A decoder for receiving encoded data and providing decoded output data, the decoder comprising:
an input for receiving an encoded video stream (200) of a plurality of intra-coded pictures and a plurality of inter-coded pictures, each picture comprising a plurality of macroblocks, the encoded video stream (200) comprising a plurality of frames, wherein the plurality of macroblocks of an intra-coded picture are spread among a plurality of the frames, and a tune-in stream (202) comprising a plurality of tune-in pictures (202₁-202₄), wherein a tune-in picture (202₁-202₄) is provided for a frame of the encoded video stream (200) that comprises an intra-coded macroblock of an intra-coded picture, wherein the tune-in picture (202₁-202₄) comprises the remaining intra-coded macroblocks of the intra-coded picture; and
a decoding portion (404) coupled to the input and configured to tune into the encoded video stream (200) and the tune-in stream (202) upon receiving a tune-in request, upon receiving a tune-in picture (202₂) in the tune-in stream (202), to splice the tune-in picture (202₂) and the encoded video stream (200), and to decode the spliced stream.
12. A computer readable medium for storing instructions which, when being executed by a computer, carry out a method of one of the claims 1 to 5 and/or a method of one of claims 7 to 10.
PCT/EP2009/007649 2008-12-19 2009-10-26 Method and encoder for providing a tune- in stream for an encoded video stream and method and decoder for tuning into an encoded video stream WO2010069427A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13910808P 2008-12-19 2008-12-19
US61/139,108 2008-12-19

Publications (1)

Publication Number Publication Date
WO2010069427A1 true WO2010069427A1 (en) 2010-06-24

Family

ID=41506466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/007649 WO2010069427A1 (en) 2008-12-19 2009-10-26 Method and encoder for providing a tune- in stream for an encoded video stream and method and decoder for tuning into an encoded video stream

Country Status (1)

Country Link
WO (1) WO2010069427A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030123546A1 (en) * 2001-12-28 2003-07-03 Emblaze Systems Scalable multi-level video coding
US20040218673A1 (en) * 2002-01-03 2004-11-04 Ru-Shang Wang Transmission of video information
WO2004114668A1 (en) * 2003-06-16 2004-12-29 Thomson Licensing S.A. Decoding method and apparatus enabling fast channel change of compressed video
WO2008136831A1 (en) * 2007-05-04 2008-11-13 Qualcomm Incorporated Digital multimedia channel switching

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HANNUSKELA M: "Sync Pictures", JOINT VIDEO TEAM (JVT) H264, no. JVT-D101, 26 July 2002 (2002-07-26), pages 1 - 8, XP040418367 *
KARCZEWICZ M ET AL: "The SP- and SI-frames design for H.264/AVC", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 13, no. 7, 1 July 2003 (2003-07-01), pages 637 - 644, XP011099256, ISSN: 1051-8215 *
KUMAR S ET AL: "Error resiliency schemes in H.264/AVC standard", JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, ACADEMIC PRESS, INC, US, vol. 17, no. 2, 1 April 2006 (2006-04-01), pages 425 - 450, XP024905100, ISSN: 1047-3203, [retrieved on 20060401] *
KURCEREN R ET AL: "Synchronization-predictive coding for video compression: the sp frames design for jvt/h.26l", PROCEEDINGS / 2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING : 22 - 25 SEPTEMBER 2002, ROCHESTER, NEW YORK, USA, IEEE OPERATIONS CENTER, PISCATAWAY, NJ, vol. 2, 22 September 2002 (2002-09-22), pages 497 - 500, XP010608017, ISBN: 978-0-7803-7622-9 *
ULF JENNEHAG ET AL: "On Synchronization Frames for Channel Switching in a GOP-Based IPTV Environment", CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE, 2008. CCNC 2008. 5TH IEEE, IEEE CCP, PISCATAWAY, NJ, USA, 1 January 2008 (2008-01-01), pages 638 - 642, XP031211956, ISBN: 978-1-4244-1456-7 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105451742A (en) * 2013-01-07 2016-03-30 布莱阿姆青年大学 Methods for reducing cellular proliferation and treating certain diseases
WO2014163867A3 (en) * 2013-03-13 2015-01-08 Apple Inc. Codec techniques for fast switching
CN105144727A (en) * 2013-03-13 2015-12-09 苹果公司 Codec techniques for fast switching
US9900629B2 (en) 2013-03-13 2018-02-20 Apple Inc. Codec techniques for fast switching with intermediate sequence
CN105144727B (en) * 2013-03-13 2019-11-19 苹果公司 Video stream transmission method, equipment, client device and computer-readable medium
US10638169B2 (en) 2013-03-13 2020-04-28 Apple Inc. Codec techniquest for fast switching without a synchronization frame
WO2018046705A3 (en) * 2016-09-08 2018-05-24 Koninklijke Kpn N.V. Partial video decoding method, device and system
US11153580B2 (en) 2016-09-08 2021-10-19 Koninklijke Kpn N.V. Partial video decoding method, device and system
EP3783905A1 (en) * 2016-09-08 2021-02-24 Koninklijke KPN N.V. Partial video decoding method , device and system
EP3637764A4 (en) * 2017-05-24 2020-04-29 NTT Electronics Corporation Video coding device and video coding method
CN110430434A (en) * 2018-05-01 2019-11-08 达音网络科技(上海)有限公司 The Video coding of anti-dropout performance is realized with reference to method using gradual I slice
CN110430434B (en) * 2018-05-01 2022-04-26 达音网络科技(上海)有限公司 Video coding for realizing anti-packet loss performance by adopting progressive I slice reference method
EP3565246A1 (en) * 2018-05-01 2019-11-06 Agora Lab, Inc. Progressive i-slice reference for packet loss resilient video coding
WO2020101547A1 (en) * 2018-11-14 2020-05-22 Saab Ab Video data burst control for remote towers
US11153561B2 (en) 2019-10-16 2021-10-19 Axis Ab Video encoding method and video encoder configured to perform such method
JP7309676B2 (en) 2019-10-16 2023-07-18 アクシス アーベー Video encoding methods and video encoders configured to perform such methods
JP2021078109A (en) * 2019-10-16 2021-05-20 アクシス アーベー Video encoding method and video encoder configured to perform such method
EP3809700A1 (en) * 2019-10-16 2021-04-21 Axis AB Periodic intra refresh pattern for video encoding
CN112672148A (en) * 2019-10-16 2021-04-16 安讯士有限公司 Video encoding method and video encoder configured to perform the method
CN112672148B (en) * 2019-10-16 2022-07-15 安讯士有限公司 Video encoding method and video encoder configured to perform the method
JP7406229B2 (en) 2019-10-28 2023-12-27 株式会社ミラティブ DELIVERY SYSTEM, PROGRAMS AND COMPUTER-READABLE STORAGE MEDIA
US11843726B2 (en) 2019-12-26 2023-12-12 Bytedance Inc. Signaling of decoded picture buffer parameters in layered video
US11700390B2 (en) 2019-12-26 2023-07-11 Bytedance Inc. Profile, tier and layer indication in video coding
US11743505B2 (en) 2019-12-26 2023-08-29 Bytedance Inc. Constraints on signaling of hypothetical reference decoder parameters in video bitstreams
US11831894B2 (en) 2019-12-26 2023-11-28 Bytedance Inc. Constraints on signaling of video layers in coded bitstreams
US11876995B2 (en) 2019-12-26 2024-01-16 Bytedance Inc. Signaling of slice type and video layers
US11812062B2 (en) 2019-12-27 2023-11-07 Bytedance Inc. Syntax for signaling video subpictures
US11765394B2 (en) 2020-01-09 2023-09-19 Bytedance Inc. Decoding order of different SEI messages
WO2021142365A3 (en) * 2020-01-09 2021-09-02 Bytedance Inc. Restrictions on gradual decoding refresh (gdr) unit
US11936917B2 (en) 2020-01-09 2024-03-19 Bytedance Inc. Processing of filler data units in video streams
US11956476B2 (en) 2020-01-09 2024-04-09 Bytedance Inc. Constraints on value ranges in video bitstreams
US11968405B2 (en) 2020-01-09 2024-04-23 Bytedance Inc. Signalling of high level syntax indication
EP4013053A1 (en) * 2020-12-14 2022-06-15 Intel Corporation Adaptive quality boosting for low latency video coding
US11985357B2 (en) 2022-07-08 2024-05-14 Bytedance Inc. Signalling of the presence of inter-layer reference pictures

Similar Documents

Publication Publication Date Title
WO2010069427A1 (en) Method and encoder for providing a tune- in stream for an encoded video stream and method and decoder for tuning into an encoded video stream
EP2359569B1 (en) Encoder and method for generating a stream of data
US6611624B1 (en) System and method for frame accurate splicing of compressed bitstreams
US7852919B2 (en) Field start code for entry point frames with predicted first field
US7609762B2 (en) Signaling for entry point frames with predicted first field
US7924921B2 (en) Signaling coding and display options in entry point headers
JP4109113B2 (en) Switching between bitstreams in video transmission
US10425661B2 (en) Method for protecting a video frame sequence against packet loss
US8213779B2 (en) Trick mode elementary stream and receiver system
EP2695390B1 (en) Fast channel change for hybrid device
US7839930B2 (en) Signaling valid entry points in a video stream
US9197889B2 (en) Method and system for multi-layer rate control for a multi-codec system
JP2017522767A (en) Random access in video bitstream
JP2006025401A (en) Reverse presentation of digital media stream
EP3262840B1 (en) Mitigating loss in inter-operability scenarios for digital video
WO1999005864A1 (en) Editing device, editing method, splicing device, splicing method, encoding device, and encoding method
EP2664157B1 (en) Fast channel switching
WO2015162226A2 (en) Digital media splicing system and method
US10757473B2 (en) Digital media splicing system and method
Jennehag et al. Gradual tune-in pictures for fast channel change
US9219930B1 (en) Method and system for timing media stream modifications

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 09744963; Country of ref document: EP; Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101)
NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 09744963; Country of ref document: EP; Kind code of ref document: A1