WO2004045216A1 - Video streaming device and method of control for switchable video streams

Info

Publication number: WO2004045216A1
Application number: PCT/EP2003/010929
Authority: WO
Inventors: Jonathan Soon Yew Teh; Anthony Richard May
Original assignee: Motorola Inc
Priority date: 2002-11-13
Filing date: 2003-09-29
Publication date: 2004-05-27
Also published as: HK1063402A1; GB2395387A8; GB2395387A; AU2003267427A1; GB0226452D0; GB2395387B

Abstract

A method of controlling a video streaming device, comprising a switching means for switching between alternative data rate versions of one of a plurality of video streams. The switching means employs a plurality of switching methods. Switching method selection is controlled by data encoded within selected frames of each of the alternative data rate versions of the video stream. The encoded data indicates which, if any, of the plurality of switching methods to select at and/or after said selected frames for the desired video stream switch. This enables the video streaming device to employ a different switching method for each frame, reducing both prediction errors after switching and data storage requirements.

Description

Video streaming device and method of control for switchable video streams

Technical Field

This invention relates to the area of streaming video transmission.

Background

Streaming video involves transmitting video data from a server across a communications channel to a client, where it is played back in real-time as the video data is being received. This communications channel may include a wireless channel, or a network such as a LAN, or the Internet. This channel may not always deliver sufficient data rate for the video data. As such, video streams are usually encoded at different data rates in order to accommodate varying channel conditions.

Several existing video streaming systems that work over the Internet either measure the available data rate before the video data is transmitted, or alternatively allow the client device to indicate the available data rate. These systems then transmit the appropriate stream, that being the stream which has a data rate less than or equal to the available data rate.

However, such systems are unable to cope when the available data rate changes during transmission of video data. When the channel data rate drops, the video is paused and the client has to wait for its buffer to fill up with video data before playback is resumed. There is a need for a video-streaming server to be able to switch between alternative data rate video streams responsively as channel conditions change. However, implementing such a switch is not a matter of simply changing to the alternative stream midway through transmission, as the different frame types used in hybrid video coding have to be taken into account to ensure that the resulting output at the client is of sufficient quality.

As a further consideration, the server has limited resources in terms of storage space and processing power. Any solution will preferably try to optimise both such resources and quality.

A number of video stream switching methods are known in the art, each corresponding to the different frame types present in a hybrid video codec such as MPEG-4 or H.26L. P- and I-Frames are defined as follows:

a) A P-frame is an intercoded frame that may predict macroblocks (segments of the image) from previous frame(s), using what is termed forward prediction. P-frames are thus sensitive to stream continuity.

b) An I-frame is an intracoded frame that only uses prediction within the same frame, and can essentially be thought of as a single still image.

Consequently, there are P-frame and I-frame switching methods as outlined below.

P-frame switching is illustrated in figure 1. In figure 1, video stream two 120 can be assumed to have a lower data rate and lower frame rate than stream one 110, and the figure illustrates switching from stream one to stream two. In this case when a stream-switching request 130 is received, the server simply selects the next nearest frame 124 from the new stream, resulting in the transmitted frame sequence 111, 112, 113, 124, 125, ...

P-frame switching as illustrated in figure 1 will result in display errors. This is because the next nearest frame 124 in the stream two is encoded using predictions based on the previous frame in stream two, 123, but due to the switch 140 must now use predictions from frame 113 of stream one, which does not contain identical information.
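The frame selection behind P-frame switching can be sketched as follows. This is a minimal illustration only: the Frame type, the frame labels and the millisecond timestamps are hypothetical and are not taken from figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    label: str   # hypothetical frame name, not a reference numeral from the figures
    ts_ms: int   # presentation timestamp in milliseconds (assumed values)


def p_frame_switch(old_stream, new_stream, request_ms):
    """Send old-stream frames up to the request, then the next nearest new-stream frames."""
    sent = [f for f in old_stream if f.ts_ms <= request_ms]
    last_ms = sent[-1].ts_ms if sent else -1
    # The first new-stream frame was predicted from its own predecessor,
    # which the decoder never received: this is the source of drift.
    sent += [f for f in new_stream if f.ts_ms > last_ms]
    return sent


# Assumed example: stream one at 100 ms spacing, stream two at 150 ms spacing.
STREAM_ONE = [Frame(f"A{i}", i * 100) for i in range(5)]
STREAM_TWO = [Frame(f"B{i}", i * 150) for i in range(4)]

switched = p_frame_switch(STREAM_ONE, STREAM_TWO, request_ms=220)
print([f.label for f in switched])
```

In this assumed example the decoder receives B2 immediately after A2, although B2 was encoded as a prediction from B1, which is the mismatch described above.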

In most cases, the decoder would consequently suffer from 'drift', which occurs when subsequent P-frames are received at the decoder and the reconstructed image drifts further away from the originally encoded image. The stream does eventually recover, because intra-coded macroblocks in the video stream cause the influence of the prediction errors incurred at the switch to recede over time, but the user perceives a period of low-integrity imaging.

I-frame switching, by contrast, ensures that no errors occur when a stream switch takes place. I-frame switching requires that I-frames be inserted within each video stream, often at the start of a new scene. Because I-frames do not employ inter-frame prediction, it is possible to switch between streams upon reaching the next nearest I-frame in the new stream without incurring a prediction error. I-frame switching is illustrated in figure 2. In figure 2, stream two 220 can be assumed to have a lower data rate and lower frame rate than stream one 210, and the figure illustrates switching from stream one to stream two. In this case, when a stream-switching request 230 is received, the server must wait until an I-frame is due to occur in stream two 220. At this point the streams switch, 240, resulting in the transmitted frame sequence 211, 212, 213, 214, 215, 225, 226, ...
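The waiting behaviour of I-frame switching can be sketched as below. The Frame type, the "kind" field, the labels and the timestamps are hypothetical, and staying on the old stream when no I-frame is found is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    label: str   # hypothetical frame name
    ts_ms: int   # presentation timestamp in milliseconds (assumed values)
    kind: str    # "I" or "P"


def i_frame_switch(old_stream, new_stream, request_ms):
    """Keep sending the old stream until the new stream reaches an I-frame."""
    switch_point = next(
        (f for f in new_stream if f.kind == "I" and f.ts_ms >= request_ms), None
    )
    if switch_point is None:
        return list(old_stream)  # assumption: no I-frame available, so no switch
    before = [f for f in old_stream if f.ts_ms < switch_point.ts_ms]
    after = [f for f in new_stream if f.ts_ms >= switch_point.ts_ms]
    return before + after


OLD = [Frame(f"A{i}", i * 100, "P") for i in range(6)]
NEW = [
    Frame("B0", 0, "I"),
    Frame("B1", 150, "P"),
    Frame("B2", 300, "P"),
    Frame("B3", 450, "I"),
    Frame("B4", 600, "P"),
]

sent = i_frame_switch(OLD, NEW, request_ms=220)
print([f.label for f in sent])
```

With these assumed values the request at 220 ms is not served until the I-frame at 450 ms, which illustrates the responsiveness cost of relying on I-frames.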

I-frame switching has a number of disadvantages. I-frames use proportionately more data and so necessarily reduce the overall quality of the encoded stream for a given data rate. Moreover, I-frames are typically only included every 5 or 10 seconds, and so the responsiveness of the system to stream switch requests is slow, increasing the likelihood of stream pauses at the client. A final disadvantage is that at low bitrates, H.263 and MPEG-4 encoders generally do not insert I-frames at all, and so an assumed use of I-frame switching for multiple streams would require modifications to the encoders.

A third method exists that attempts to overcome the problems of I- and P-frame switching.

S-frame switching utilises an additional supplementary data stream that bridges the two video streams during the switching process. The S video stream is generated to contain data for the nearest decompressed frame in a second stream, based on the previous input frame in a first stream. The result is that the switch from stream one to stream two does not incur prediction errors of the type seen with direct P-frame switching. S-frame switching is illustrated in figure 3. In figure 3, stream two 320 can be assumed to have a lower data rate and lower frame rate than stream one 310, and the figure illustrates switching from stream one to stream two. In this case when a stream-switching request 330 is received the server uses the S-frame corresponding to the frame in stream one as the basis for prediction of the next frame in stream two. This results in the transmitted frame sequence 311, 312, 313, 314, 304, 325, 326, ...

Whilst S-frame switching produces the best quality reconstructed frames at the decoder for minimal delay, it necessitates the generation of a separate S-frame stream for each possible switching scenario, increasing the storage requirements at the server.

If one assumes that a stream switch is only permitted between adjacent data rate streams, then for n possible data rates the server requires 2(n-1) S-frame streams. If one tries to mitigate the additional storage cost by only providing a corresponding S-frame every 10 P-frames, for example, this again reduces the responsiveness of the server to data rate changes.
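The 2(n-1) count can be checked by enumerating the permitted switches. The sketch below is illustrative only; the function name is hypothetical, and the three data rates are those used in the evaluation later in this description.

```python
def adjacent_switch_scenarios(rates_kbps):
    """List every (from, to) switch permitted between adjacent data rates."""
    ordered = sorted(rates_kbps, reverse=True)
    scenarios = []
    for higher, lower in zip(ordered, ordered[1:]):
        scenarios.append((higher, lower))  # switch down
        scenarios.append((lower, higher))  # switch up
    return scenarios


scenarios = adjacent_switch_scenarios([36, 22, 15])
print(scenarios)       # one S-frame stream is needed per scenario
print(len(scenarios))  # equals 2 * (n - 1) for n = 3 data rates
```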

The three switching methods above constitute three possible cost/quality trade-offs:

i. P-switching incurs no cost, but impacts quality when there is frame misalignment between streams due to the different data rates.

ii. I-switching preserves switching quality, but requires additional data that impacts on overall quality for a given data rate.

iii. S-switching preserves switching quality, but incurs cost in the server by requiring significant increases in storage, and cost during the encoding process by increasing the amount of computation required to produce the streams.

No one prior art method provides the ideal solution. Thus, there is a need to optimise this cost/quality trade-off, to improve the efficiency of video stream switching.

Summary of the Invention

In accordance with the present invention, there is provided a method of controlling a video streaming device, as claimed in claim 1, and a video streaming device, as claimed in claim 8.

Brief description of the drawings

FIG. 1 shows the switching strategy for the P-frame switching method.

FIG. 2 shows the switching strategy for the I-frame switching method.

FIG. 3 shows the switching strategy for the S-frame switching method.

FIG. 4 shows the switching strategy for the P-frame switching method when P-Frames between video streams are aligned.

Detailed description of the preferred embodiment

In a preferred embodiment of the present invention, selected frames of each video stream are augmented with switching strategy recommendation data. Thus for each selected frame, data indicating the use of P-frame, I-frame or S-frame switching, or another form of switching, as the best option for that frame may be encoded, for each desired switching scenario. This would require 2 or fewer bits of information per scenario, and thus would impose little cost when compared to the typical 2,800 bits of information per P-frame in a 36kbps video stream.
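One way to picture the 2-bit recommendation is sketched below. The numeric code assigned to each method and the packing order are assumptions made for illustration; the description does not define a bit layout. The overhead figure assumes two switching scenarios per frame (one up, one down).

```python
from enum import IntEnum


class SwitchMethod(IntEnum):
    # Hypothetical 2-bit codes; the actual assignment is not specified.
    P_FRAME = 0
    I_FRAME = 1
    S_FRAME = 2
    OTHER = 3


BITS_PER_SCENARIO = 2


def pack_recommendations(methods):
    """Pack one 2-bit code per switching scenario into a single integer."""
    value = 0
    for method in methods:
        value = (value << BITS_PER_SCENARIO) | int(method)
    return value


def unpack_recommendations(value, count):
    """Recover `count` 2-bit codes in the order they were packed."""
    methods = []
    for shift in range(BITS_PER_SCENARIO * (count - 1), -1, -BITS_PER_SCENARIO):
        methods.append(SwitchMethod((value >> shift) & 0b11))
    return methods


# Example: recommend S-frame switching upward and P-frame switching downward.
packed = pack_recommendations([SwitchMethod.S_FRAME, SwitchMethod.P_FRAME])
overhead = (2 * BITS_PER_SCENARIO) / 2800  # fraction of a typical 2,800-bit P-frame
print(packed, unpack_recommendations(packed, 2), f"{overhead:.2%}")
```

Under these assumptions the recommendation data adds 4 bits to a 2,800-bit P-frame, well under one percent.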

In a preferred embodiment of the present invention, the generation of switching strategy recommendation data would use the following rules:

i. Use P-frame switching when frames are aligned;

ii. Use I-frame switching when the next available I-frame is an acceptably short distance from the current frame;

iii. Use any alternative switching method supported by the streaming protocol as appropriate;

iv. Use S-frame switching for remaining frames.

The provision of switching guidance need not be limited to the three methods of P-frame, I-frame or S-frame switching. Therefore rule iii above acknowledges that additional switching methods may benefit from control by the present invention.
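The four rules can be read as a simple priority chain, sketched below. Treating the rules as strictly ordered, the function and parameter names, and the 500 ms default for an "acceptably short distance" are all assumptions made for illustration.

```python
def recommend_switch_method(
    aligned,
    ms_to_next_i_frame,
    max_i_frame_wait_ms=500,
    alternative_method=None,
):
    """Return a recommendation for one frame and one switching scenario.

    aligned: True if this frame is aligned with a frame in the target stream.
    ms_to_next_i_frame: distance to the next I-frame in the target stream,
        or None if no I-frame is known.
    alternative_method: name of another protocol-supported method, if appropriate.
    """
    if aligned:                                   # rule i
        return "P"
    if ms_to_next_i_frame is not None and ms_to_next_i_frame <= max_i_frame_wait_ms:
        return "I"                                # rule ii
    if alternative_method is not None:            # rule iii
        return alternative_method
    return "S"                                    # rule iv


print(recommend_switch_method(aligned=True, ms_to_next_i_frame=None))
print(recommend_switch_method(aligned=False, ms_to_next_i_frame=200))
print(recommend_switch_method(aligned=False, ms_to_next_i_frame=4000))
```

Only frames that fall through to the last branch need a corresponding S-frame to be generated and stored.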

A pair of P-frames in two separate video streams are aligned when they have identical time-stamps. Figure 4 illustrates this occurrence, in which switching from stream one 410 to stream two 420 minimises prediction error as frames 413 and 422 are substantially identical for the purposes of forward prediction by frame 423.
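Because alignment is defined by identical time-stamps, finding aligned frames reduces to intersecting the two streams' timestamp sets. The sketch below uses hypothetical frame spacings of 100 ms and 150 ms; these values are assumptions, not figures from the evaluation.

```python
def aligned_timestamps(timestamps_a_ms, timestamps_b_ms):
    """Return the timestamps (in ms) at which both streams have a frame."""
    return sorted(set(timestamps_a_ms) & set(timestamps_b_ms))


# Assumed example: one second of video at two different frame spacings.
stream_one_ts = list(range(0, 1001, 100))  # 11 frames
stream_two_ts = list(range(0, 1001, 150))  # 7 frames

aligned = aligned_timestamps(stream_one_ts, stream_two_ts)
unaligned_in_stream_one = len(stream_one_ts) - len(aligned)
print(aligned, unaligned_in_stream_one)
```

Integer millisecond timestamps are used here so that the equality test is exact; comparing floating-point timestamps would need a tolerance.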

Evaluation

Evaluations in support of the present invention suggest that a significant number of P-frames are aligned. The table below is based on a 90 second test video sequence containing a number of different scenes:

Thus rules i. and iv. above may reduce the required number of S-frames by approximately a third for these four switching scenarios which are 36 to 22kbps, 22 to 36kbps, 22 to 15kbps and 15 to 22kbps. Moreover, the proportion of aligned P-Frames increases with data rate, and consequently the use of the present invention reduces the number of required S-frames as S-frames get larger.

Inclusion of I-frames in the video stream, for example at the start of each new scene, would further reduce the required number of S-frames by use of rule ii.

Alternative Embodiment

In an alternative embodiment, it may be decided that it is not necessary to be able to switch between streams at every possible frame. For example, if it is known that the client buffer is capable of storing 2 seconds of video, then a switch delay of 0.5 seconds may be acceptable. In the alternative embodiment, S-frames are therefore only required where there is more than (for example) a 0.5 second gap until the next pair of aligned P-frames or until the next I-frame.
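One plausible reading of this placement rule is sketched below: an S-frame is recommended at the last frame that still keeps the wait since the previous switching position within the acceptable delay. The function name, the treatment of the first frame as a switching position, and the example timestamps are assumptions.

```python
def s_frame_positions(frame_ts_ms, natural_points_ms, max_delay_ms=500):
    """Pick frames needing an S-frame so no wait for a switch exceeds max_delay_ms.

    natural_points_ms: timestamps of aligned P-frames or I-frames, where a
        low-error switch is possible without an S-frame.
    """
    natural = set(natural_points_ms)
    s_frames = []
    # Assumption: the start of the stream counts as a switching position.
    last_switch = frame_ts_ms[0] if frame_ts_ms else 0
    for i, ts in enumerate(frame_ts_ms):
        if ts in natural:
            last_switch = ts
            continue
        next_ts = frame_ts_ms[i + 1] if i + 1 < len(frame_ts_ms) else None
        # Recommend an S-frame here if waiting for the next frame would
        # exceed the acceptable delay since the last switching position.
        if next_ts is not None and next_ts - last_switch > max_delay_ms:
            s_frames.append(ts)
            last_switch = ts
    return s_frames


frames = list(range(0, 2001, 100))  # assumed 100 ms frame spacing
natural = [0, 1800, 2000]           # assumed aligned P-frames or I-frames
positions = s_frame_positions(frames, natural)
print(positions)
```

In this assumed example, 18 frames lack a natural switching position, yet only three of them need an S-frame to keep the switch delay within 0.5 seconds.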

Evaluation

Evaluations in support of the present invention suggest that this would additionally reduce the need for S-frames. The table below is based on a 90 second test video sequence containing a number of different scenes:

The final right-hand column suggests the likely proportion of required S-frames relative to the number of P-frames. For the higher data rate example, fewer than 1 in 8 P-frames will require corresponding S-frames, as P-frame alignment occurs on average every half second. Where there are longer gaps, one frame in every 0.5s of the gap will have its switching strategy recommendation data encoded to provide an S-frame switch at that point, and a corresponding S-frame will be generated. For the lower data rate example, roughly 1 in 5 P-frames will require corresponding S-frames as P-frame alignment occurs on average every second, so requiring typically one S-frame between such alignments.

Again, the addition of I-frames will further reduce the S-frame requirement.

In summary, a preferred embodiment of the present invention incorporates switching strategy recommendation data within video frame data, for alternative data rate video streams. The switching strategy recommendation data indicates which of a plurality of alternative switching strategies to use for a given frame, based on rules concerning the location of frames that permit low error switching to occur, such as I-frames and aligned P-frames. The benefit of such an arrangement is to maintain quality whilst minimising cost, particularly in terms of S-frame generation and storage. An alternative embodiment proposes further reductions in S-frame numbers, by only recommending S-frames if the time until the next aligned P-frame or I-frame exceeds a stipulated duration.

Claims

1. A method of controlling a video streaming device, the video streaming device comprising a switching means for switching between alternative data rate versions of one of a plurality of video streams (110, 120, 210, 220, 310, 320, 410, 420), the switching means employing a plurality of switching methods, the method of controlling comprising selecting the switching method in dependence on: the encoding of data within one or more selected frames of each of the plurality of alternative data rate versions of the video stream, the encoded data indicating which, if any, of the plurality of switching methods to select at and/or after said selected frame, in order to make the desired video stream switch.

2. A method of controlling a video streaming device according to claim 1, wherein: the switching means employs any of I-frame, P-frame or S-frame switching; and the selection of I-frame, P-frame or S-frame switching is determined by the encoding of data within the selected frames indicating which, if any, of I-frame, P-frame or S-frame switching to select at and/or after said selected frames.

3. A method of controlling a video streaming device according to claim 1 or claim 2, wherein every video frame is selected for the encoding of data.

4. A method of controlling a video streaming device according to any previous claim, wherein switching method selection data is encoded to include method selection information only for the next higher alternative data rate video stream corresponding to the current video stream, where available, and next lower alternative data rate video stream corresponding to the current video stream, where available.

5. A method of controlling a video streaming device according to any previous claim, wherein S-frames that have been generated for the purpose of S-frame switching between a pair of alternative data rate versions of the video stream are only stored if the encoded data on either of the aforesaid pair of alternative data rate versions of the video stream indicates that the S-frame is required.

6. A method of controlling a video streaming device according to any previous claim, wherein the encoded data indicating which, if any, of the plurality of switching methods to select is based on the following rules:

i. Use P-frame switching when frames are aligned;

ii. Use I-frame switching when the next available I-frame is reached, if the next available I-frame is an acceptably short distance from the current frame for the purposes of the application;

iii. Use any alternative switching method supported by the streaming protocol as appropriate;

iv. Use S-frame switching for remaining frames.

7. A method of controlling a video streaming device according to any of claims 1 to 5, wherein the encoded data indicating which, if any, of the plurality of switching methods to select is based on the following rules:

i. if the next available aligned P-frame is an acceptably short distance from the current frame for the purposes of the application, then use P-frame switching when the next available aligned P-frame is reached;

ii. if the next available I-frame is an acceptably short distance from the current frame for the purposes of the application, then use I-frame switching when the next available I-frame is reached;

iii. if the availability of said switching method occurs in an acceptably short time for the purposes of the application, then use any alternative switching method supported by the streaming protocol when said alternative switching method becomes available;

iv. if rules i., ii. or iii. are not met, then use S-frame switching for each frame where the onset of the subsequent frame according to rule i, ii or iii would exceed the acceptable time following the last available switching position.

8. A video streaming device comprising a switching means for switching between alternative data rate versions of one of a plurality of video streams (110, 120, 210, 220, 310, 320, 410, 420), the switching means employing a plurality of switching methods, and the switching means being adapted to select the data rate version of a video stream in dependence on: encoded data within one or more selected frames of each of the plurality of alternative data rate versions of the video stream; the encoded data indicating which, if any, of the plurality of switching methods to select at and/or after said selected frames in order to make the desired video stream switch.

9. A video streaming device according to claim 8, wherein the video streaming device is a server.