Video streaming device and method of control for switchable video streams
Technical Field
This invention relates to the area of streaming video transmission.
Background
Streaming video involves transmitting video data from a server across a communications channel to a client, where it is played back in real time as the video data is being received. This communications channel may include a wireless channel, or a network such as a LAN or the Internet. This channel may not always deliver a sufficient data rate for the video data. As such, video streams are usually encoded at different data rates in order to accommodate varying channel conditions.
Several existing video streaming systems that work over the Internet either measure the available data rate before the video data is transmitted, or alternatively allow the client device to indicate the available data rate. These systems then transmit the appropriate stream, that being the stream which has a data rate less than or equal to the available data rate.
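The selection rule used by such systems can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and choosing the highest qualifying rate (rather than any rate at or below the available rate) is an assumption not stated above.

```python
from typing import Optional, Sequence


def select_stream_rate(encoded_rates_kbps: Sequence[float],
                       available_kbps: float) -> Optional[float]:
    """Return the highest encoded rate not exceeding the available rate.

    Returns None when every encoded stream needs more than the channel
    can deliver. Picking the *highest* qualifying rate is an assumption.
    """
    candidates = [rate for rate in encoded_rates_kbps if rate <= available_kbps]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Example rates taken from the evaluation scenarios in this document.
    print(select_stream_rate([15, 22, 36], available_kbps=30))
```

Because the rate is chosen once, before transmission begins, this rule cannot react to a later change in the available data rate.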
However, such systems are unable to cope when the available data rate changes during transmission of video data. When the channel data rate drops, the video is paused and the client has to wait for its buffer to fill up with video data before playback is resumed.
There is a need for a video-streaming server to be able to switch between alternative data rate video streams responsively as channel conditions change. However, implementing such a switch is not a matter of simply changing to the alternative stream midway through transmission, as the different frame types used in hybrid video coding have to be taken into account to ensure that the resulting output at the client is of sufficient quality.
As a further consideration, the server has limited resources in terms of storage space and processing power. Any solution will preferably try to optimise both such resources and quality.
A number of video stream switching methods are known in the art, each corresponding to the different frame types present in a hybrid video codec such as MPEG-4 or H.26L. P- and I-Frames are defined as follows:
a) A P-frame is an intercoded frame that may predict macroblocks (segments of the image) from previous frame(s), using what is termed forward prediction. P-frames are thus sensitive to stream continuity.
b) An I-frame is an intracoded frame that only uses prediction within the same frame, and can essentially be thought of as a single still image.
Consequently, there are P-frame and I-frame switching methods as outlined below.
P-frame switching is illustrated in figure 1. In figure 1, video stream two 120 can be assumed to have a lower data rate and lower frame rate than stream one 110, and the figure illustrates switching from stream one to stream two. In this case, when a stream-switching request 130 is received, the server simply selects the next nearest frame 124 from the new stream, resulting in the transmitted frame sequence 111, 112, 113, 124, 125, ...
P-frame switching as illustrated in figure 1 will result in display errors. This is because the next nearest frame 124 in the stream two is encoded using predictions based on the previous frame in stream two, 123, but due to the switch 140 must now use predictions from frame 113 of stream one, which does not contain identical information.
In most cases, the decoder would consequently suffer from 'drift', which occurs when subsequent P-frames are received at the decoder and the reconstructed image drifts further away from the originally encoded image. The stream does eventually recover, because intra-coded macroblocks in the video stream cause the influence of the prediction errors incurred at the switch to recede over time, but the perception by the user is of a period of low-integrity imaging.
I-frame switching, by contrast, ensures that no errors occur when a stream switch takes place. I-frame switching requires that I-frames be inserted within each video stream, often at the start of a new scene. Because I-frames do not employ inter-frame prediction, it is possible to switch between streams upon reaching the next nearest I-frame in the new stream without incurring a prediction error.
I-frame switching is illustrated in figure 2. In figure 2, stream two 220 can be assumed to have a lower data rate and lower frame rate than stream one 210, and the figure illustrates switching from stream one to stream two. In this case, when a stream-switching request 230 is received, the server must wait until an I-frame is due to occur in stream two 220. At this point the streams switch, 240, resulting in the transmitted frame sequence 211, 212, 213, 214, 215, 225, 226, ...
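The waiting behaviour of I-frame switching can be sketched as a lookup of the next I-frame in the target stream. This is a hedged illustration: the function names and the representation of I-frame positions as a sorted list of times in seconds are assumptions made for the example.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def next_i_frame_time(i_frame_times_s: Sequence[float],
                      request_time_s: float) -> Optional[float]:
    """Return the time of the first I-frame at or after the switch request.

    i_frame_times_s must be sorted in ascending order. Returns None if the
    target stream contains no further I-frame.
    """
    index = bisect_left(i_frame_times_s, request_time_s)
    return i_frame_times_s[index] if index < len(i_frame_times_s) else None


def i_frame_switch_delay(i_frame_times_s: Sequence[float],
                         request_time_s: float) -> Optional[float]:
    """Return how long the server must wait before it can switch."""
    target = next_i_frame_time(i_frame_times_s, request_time_s)
    return None if target is None else target - request_time_s


if __name__ == "__main__":
    # Hypothetical target stream with an I-frame every 5 seconds.
    print(i_frame_switch_delay([0.0, 5.0, 10.0], request_time_s=5.25))
```

With I-frames several seconds apart, the delay returned here can approach the full I-frame interval, which is the responsiveness problem discussed next.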
I-frame switching has a number of disadvantages. I-frames use proportionately more data and so necessarily reduce the overall quality of the encoded stream for a given data rate. Moreover, I-frames are typically only included every 5 or 10 seconds, and so the responsiveness of the system to stream switch requests is slow, increasing the likelihood of stream pauses at the client. A final disadvantage is that at low bitrates, H.263 and MPEG-4 encoders generally do not insert I-frames at all, and so an assumed use of I-frame switching for multiple streams would require modifications to the encoders.
A third method exists that attempts to overcome the problems of I- and P-frame switching.
S-frame switching utilises an additional supplementary data stream that bridges the two video streams during the switching process. The S video stream is generated to contain data for the nearest decompressed frame in a second stream, based on the previous input frame in a first stream. The result is that the switch from stream one to stream two does not incur prediction errors of the type seen with direct P-frame switching.
S-frame switching is illustrated in figure 3. In figure 3, stream two 320 can be assumed to have a lower data rate and lower frame rate than stream one 310, and the figure illustrates switching from stream one to stream two. In this case when a stream-switching request 330 is received the server uses the S-frame corresponding to the frame in stream one as the basis for prediction of the next frame in stream two. This results in the transmitted frame sequence 311, 312, 313, 314, 304, 325, 326, ...
Whilst S-frame switching produces the best quality reconstructed frames at the decoder for minimal delay, it necessitates the generation of a separate S-frame stream for each possible switching scenario, increasing the storage requirements at the server.
If one assumes that a stream switch is only permitted between adjacent data rate streams, then for n possible data rates the server requires 2(n-1) S-frame streams. If one tries to mitigate the additional storage cost by only providing a corresponding S-frame every 10 P-frames, for example, this again reduces the responsiveness of the server to data rate changes.
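The 2(n-1) figure follows from counting one upward and one downward switch for each pair of adjacent data rates. The sketch below enumerates those scenarios; the function names are hypothetical and the example rates are those used in the evaluation later in this document.

```python
from typing import List, Sequence, Tuple


def s_frame_stream_count(n_rates: int) -> int:
    """Number of S-frame streams when switching only between adjacent rates."""
    if n_rates < 1:
        raise ValueError("at least one data rate is required")
    return 2 * (n_rates - 1)


def adjacent_switch_scenarios(rates_kbps: Sequence[int]) -> List[Tuple[int, int]]:
    """List every (from_rate, to_rate) switch between adjacent data rates."""
    ordered = sorted(rates_kbps, reverse=True)
    scenarios = []
    for higher, lower in zip(ordered, ordered[1:]):
        scenarios.append((higher, lower))  # switch down
        scenarios.append((lower, higher))  # switch up
    return scenarios


if __name__ == "__main__":
    print(adjacent_switch_scenarios([36, 22, 15]))
    print(s_frame_stream_count(3))
```

For three data rates this gives four scenarios, each of which would need its own S-frame stream.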
The three switching methods above constitute three possible cost/quality trade-offs:
i. P-switching incurs no cost, but impacts quality when there is frame misalignment between streams due to the different data rates.
ii. I-switching preserves switching quality, but requires additional data that impacts on overall quality for a given data rate.
iii. S-switching preserves switching quality, but incurs cost in the server by requiring significant increases in storage, and cost during the encoding process by increasing the amount of computation required to produce the streams.
No one prior art method provides the ideal solution. Thus, there is a need to optimise this cost/quality trade-off, to improve the efficiency of video stream switching.
Summary of the Invention
In accordance with the present invention, there is provided a method of controlling a video streaming device, as claimed in claim 1, and a video streaming device, as claimed in claim 8.
Brief description of the drawings
FIG. 1 shows the switching strategy for the P-frame switching method.
FIG. 2 shows the switching strategy for the I-frame switching method.
FIG. 3 shows the switching strategy for the S-frame switching method.
FIG. 4 shows the switching strategy for the P-frame switching method when P-Frames between video streams are aligned.
Detailed description of the preferred embodiment
In a preferred embodiment of the present invention, selected frames of each video stream are augmented with switching strategy recommendation data. Thus for each selected frame, data indicating the use of P-frame, I-frame or S-frame or another form of switching as the best option for that frame may be encoded, for each desired switching scenario. This would require 2 or fewer bits of information per scenario, and thus would impose little cost when compared to the typical 2,800 bits of information per P-frame in a 36kbps video stream.
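A 2-bit field is enough to distinguish four strategies per switching scenario. The sketch below shows one possible packing and the resulting overhead relative to a 2,800-bit P-frame. The particular code values, the packing order and the function names are assumptions made for illustration; no specific bit layout is prescribed here.

```python
from enum import IntEnum
from typing import List, Sequence


class Strategy(IntEnum):
    """Hypothetical 2-bit codes; the actual assignment is not specified."""
    P_FRAME = 0
    I_FRAME = 1
    S_FRAME = 2
    OTHER = 3


def pack_recommendations(recommendations: Sequence[Strategy]) -> int:
    """Pack one 2-bit recommendation per switching scenario into an integer."""
    value = 0
    for index, strategy in enumerate(recommendations):
        value |= int(strategy) << (2 * index)
    return value


def unpack_recommendations(value: int, scenario_count: int) -> List[Strategy]:
    """Recover the per-scenario recommendations from a packed integer."""
    return [Strategy((value >> (2 * index)) & 0b11)
            for index in range(scenario_count)]


def overhead_fraction(scenario_count: int, frame_bits: int = 2800,
                      bits_per_scenario: int = 2) -> float:
    """Recommendation bits as a fraction of the bits in one P-frame."""
    return scenario_count * bits_per_scenario / frame_bits


if __name__ == "__main__":
    packed = pack_recommendations([Strategy.P_FRAME, Strategy.S_FRAME])
    print(packed, unpack_recommendations(packed, 2))
    print(f"{overhead_fraction(2):.4%}")
```

For a stream with two switching scenarios (one up, one down), the recommendation data amounts to 4 bits, or roughly 0.14% of a 2,800-bit P-frame.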
In a preferred embodiment of the present invention, the generation of switching strategy recommendation data would use the following rules:
i. Use P-frame switching when frames are aligned;
ii. Use I-frame switching when the next available I-frame is an acceptably short distance from the current frame;
iii. Use any alternative switching method supported by the streaming protocol as appropriate;
iv. Use S-frame switching for remaining frames.
The provision of switching guidance need not be limited to the three methods of P-frame, I-frame or S-frame switching. Therefore rule iii above acknowledges that additional switching methods may benefit from control by the present invention.
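Rules i to iv can be read as an ordered decision procedure applied to each selected frame. The following is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the "acceptably short distance" of rule ii is modelled as a configurable time threshold (0.5 seconds is an illustrative default), and the availability of an alternative method under rule iii is reduced to a boolean flag.

```python
from typing import Optional


def recommend_strategy(frame_time_s: float,
                       is_aligned: bool,
                       next_i_frame_time_s: Optional[float],
                       max_i_frame_wait_s: float = 0.5,
                       alternative_available: bool = False) -> str:
    """Return 'P', 'I', 'OTHER' or 'S' for one frame and one switching scenario."""
    # Rule i: aligned frames permit low-error P-frame switching.
    if is_aligned:
        return "P"
    # Rule ii: an I-frame close enough ahead makes waiting acceptable.
    if next_i_frame_time_s is not None:
        wait = next_i_frame_time_s - frame_time_s
        if 0.0 <= wait <= max_i_frame_wait_s:
            return "I"
    # Rule iii: any other method the streaming protocol supports.
    if alternative_available:
        return "OTHER"
    # Rule iv: fall back to S-frame switching.
    return "S"


if __name__ == "__main__":
    print(recommend_strategy(1.0, is_aligned=False, next_i_frame_time_s=1.25))
```

Only frames that fall through to rule iv need a corresponding S-frame to be generated and stored.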
A pair of P-frames in two separate video streams are aligned when they have identical time-stamps. Figure 4 illustrates this occurrence, in which switching from stream one 410 to stream two 420 minimises prediction error as frames 413 and 422 are substantially identical for the purposes of forward prediction by frame 423.
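Detecting aligned frames reduces to finding time-stamps common to both streams. The sketch below assumes integer time-stamps in milliseconds so that equality is exact; the function name and the example frame intervals are hypothetical.

```python
from typing import List, Sequence


def aligned_timestamps(stream_one_ms: Sequence[int],
                       stream_two_ms: Sequence[int]) -> List[int]:
    """Return the sorted time-stamps at which both streams have a frame."""
    return sorted(set(stream_one_ms) & set(stream_two_ms))


if __name__ == "__main__":
    # Hypothetical streams: one frame every 100 ms and one every 150 ms.
    stream_one = list(range(0, 1000, 100))
    stream_two = list(range(0, 1000, 150))
    print(aligned_timestamps(stream_one, stream_two))
```

In this example the two streams align every 300 ms, so P-frame switching would be recommended at those points and S-frames considered only for the frames in between.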
Evaluation
Evaluations in support of the present invention suggest that a significant number of P-frames are aligned. The table below is based on a 90 second test video sequence containing a number of different scenes:
Thus rules i. and iv. above may reduce the required number of S-frames by approximately a third for these four switching scenarios which are 36 to 22kbps, 22 to 36kbps, 22 to 15kbps and 15 to 22kbps. Moreover, the proportion of aligned P-Frames increases with data rate, and consequently the use of the present invention reduces the number of required S-frames as S-frames get larger.
Inclusion of I-frames in the video stream, for example at the start of each new scene, would further reduce the required number of S-frames by use of rule ii.
Alternative Embodiment
In an alternative embodiment, it may be decided that it is not necessary to be able to switch between streams at every possible frame. For example, if it is known that the client buffer is capable of storing 2 seconds of video, then a switch delay of 0.5 seconds may be acceptable.
In the alternative embodiment, S-frames are therefore only required where there is more than (for example) a 0.5 second gap until the next pair of aligned P-frames or until the next I-frame.
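One way to place S-frames under this relaxed requirement is a greedy pass over the frames of a stream. This is a sketch only, not a prescribed procedure: the function and parameter names are hypothetical, time-stamps are assumed to be integers in milliseconds, and the start of the stream is treated as the initial reference point. An S-frame is recommended at the first frame that lies at least the permitted delay after the previous switching opportunity.

```python
from typing import Iterable, List, Sequence


def s_frame_positions(frame_times_ms: Sequence[int],
                      low_error_points_ms: Iterable[int],
                      max_gap_ms: int = 500) -> List[int]:
    """Return the time-stamps at which an S-frame switch is recommended.

    low_error_points_ms holds the frames that already permit low-error
    switching (aligned P-frames and I-frames). frame_times_ms must be
    sorted in ascending order.
    """
    if not frame_times_ms:
        return []
    free_points = set(low_error_points_ms)
    last_opportunity = frame_times_ms[0]  # assumption: measure from stream start
    s_frames = []
    for time_ms in frame_times_ms:
        if time_ms in free_points:
            last_opportunity = time_ms
        elif time_ms - last_opportunity >= max_gap_ms:
            s_frames.append(time_ms)
            last_opportunity = time_ms
    return s_frames


if __name__ == "__main__":
    frames = list(range(0, 2000, 100))
    print(s_frame_positions(frames, low_error_points_ms=[0, 300, 1500]))
```

In the example, the 1.2 second gap between the switching opportunities at 300 ms and 1500 ms receives two S-frames, while the shorter gaps receive none.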
Evaluation
Evaluations in support of the present invention suggest that this would additionally reduce the need for S-frames. The table below is based on a 90 second test video sequence containing a number of different scenes:
The final right-hand column suggests the likely proportion of required S-frames relative to the number of P-frames. For the higher data rate example, fewer than 1 in 8 P-frames will require corresponding S-frames, as P-frame alignment occurs on average every half second. Where there are longer gaps, one frame in every 0.5s of the gap will have its switching strategy recommendation data encoded to provide an S-frame switch at that point, and a corresponding S-frame will be generated. For the lower data rate example, roughly 1 in 5 P-frames will require corresponding S-frames as P-frame alignment occurs on average every second, so requiring typically one S-frame between such alignments.
Again, the addition of I-frames will further reduce the S-frame requirement.
In summary, a preferred embodiment of the present invention incorporates switching strategy recommendation data within video frame data, for alternative data rate video streams. The switching strategy recommendation data indicates which of a plurality of alternative switching strategies to use for a given frame, based on rules concerning the location of frames that permit low error switching to occur, such as I-frames and aligned P-frames. The benefit of such an arrangement is to maintain quality whilst minimising cost, particularly in terms of S-frame generation and storage. An alternative embodiment proposes further reductions in S-frame numbers, by only recommending S-frames if the temporal distance to the next aligned P-frames or I-frame exceeds a stipulated duration.