WO2010100427A1 - Video streaming - Google Patents

Video streaming

Info

Publication number
WO2010100427A1
WO2010100427A1 PCT/GB2010/000390 GB2010000390W WO2010100427A1 WO 2010100427 A1 WO2010100427 A1 WO 2010100427A1 GB 2010000390 W GB2010000390 W GB 2010000390W WO 2010100427 A1 WO2010100427 A1 WO 2010100427A1
Authority
WO
WIPO (PCT)
Prior art keywords
quality
buffer
estimated
gop
bit rate
Prior art date
Application number
PCT/GB2010/000390
Other languages
French (fr)
Other versions
WO2010100427A8 (en)
Inventor
Michael Erling Nilsson
Rory Stewart Turnbull
Ian Barry Crabtree
Stephen Clifford Appleby
Patrick Joseph Mulroy
Steve Hoare
Original Assignee
British Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications
Priority to CN201080010722.5A (patent CN102369732B)
Priority to EP10707637A (patent EP2404449A1)
Publication of WO2010100427A1
Publication of WO2010100427A8

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/126 Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/15 Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
    • H04N19/152 Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H04N19/164 Feedback from the receiver or from the transmission channel
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • the GOP (GOP I) is transmitted at the selected quality level. The transmission of the GOP is monitored, and the statistics of the network throughput are updated to derive, if necessary, a new value of mean bit rate, $m_I$, and parameters indicative of its variability.
  • each node represents the resulting buffer state (suitably quantised) after transmitting a GOP at a given quality, and where each link represents a quality selection decision.
  • 309 Perform the actions in steps 110 to 119 for each value of quality index j and for each state s for which the state variable S s j is not marked as inactive.
  • s' = F' * (number of quantised buffer states - 1) / maximum buffer size (with rounding).
  • $Q_{s,i}$ is the quality at which GOP i-1 was coded on the path through the trellis to state s for GOP i.
  • $o_{s,i,j}$ is the probability of buffer overflow just before we remove timeslice i+1 when timeslice i has been encoded at quality j starting in state s.
  • $u_{s,i,j}$ is the probability of buffer underflow just after we remove timeslice i at quality j starting in state s.
  • step 315 If state variable $S_{s',i+1}$ is marked as active go to step 317, else go to step 316.
  • step 328 If there is only one possible path from GOP I to GOP I+1 after the pruning process then exit to step 328.
  • step 307 If there are more GOPs to be processed then repeat for next GOP by returning to step 307; otherwise (325), from the set of paths arriving to the end of the file choose the path that leads to the best final cost, prune all other paths and exit to step 328. Note other choices for the best final state are possible: best cost after one GOP (or any number), highest lowest buffer level along trellis path etc. 328 Return the chosen quality transition path
  • step 313 we need to estimate the number of bits we would expect to be able to get through the network by the given time and a measure of the accuracy of that estimate.
  • Another option is to continually collect statistics as to bit rate and compute the actual standard deviation over a recent time window.
  • the probability, given the standard deviation $\sigma_i$, that the buffer fullness will actually reach zero (i.e. a deviation of $(B_{s,i} - b_{ij})/\sigma_i$ times the standard deviation) can be looked up in a Gaussian cumulative probability table. Alternatively it can be calculated from
  • the buffer would overflow if the fullness exceeds the buffer size B.
  • the probability of this can be found by looking up $(F' - B)/\sigma_i$ in a Gaussian cumulative probability table, or from
  • the method caters for variability in the actual bandwidth through the incorporation of the underflow and overflow probabilities. It can be further extended, however, by varying the bandwidth estimate as we propagate through the lattice based on other available information about future network bandwidth. There may be known events such as other streams about to end, which introduce a dependence on other streams but which would mean an imminent bandwidth increase. Downstairs rate curves were introduced in the context of optimal bandwidth reservation for VBR coded video (See K. Sun & M. Ghanbari, An Algorithm for VBR video transmission scheme over the Internet, in
  • Any VBR asset will have a peak rate requirement to ensure no buffer starvation problems at the most difficult part of the content. Once this point is passed the next peak rate will be lower and so on.
  • This series of peak rates forms a downstairs stepping profile, and this future profile of all currently streamed assets may also be available. If assets are streamed at a rate proportional to this rate requirement (e.g. using MulTCP and variable N), this would suggest that, with no new streams added, there will be more rate and less contention moving forward in time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Error Detection And Correction (AREA)

Abstract

A video sequence to be encoded is divided into a plurality of temporal portions, and analysed to determine (at least for each portion after the first), in accordance with a plurality of encoding quality settings, (i) a quality metric for the portion and (ii) the number of bits generated by encoding the portion at that quality setting. These data are analysed, for example using a Viterbi-like trellis, to choose a set of quality settings, one per portion, that tends to minimise a combined quality cost for the sequence. The combined quality cost is the sum of individual quality costs, each of which is a function of the quality metric of the respective encoded portion. The sequence is encoded using the chosen quality settings. In order to determine each individual quality cost, despite not knowing precisely what network throughput will be available at any given time in the future, one proceeds by estimating receiver buffer fullness and its standard deviation. From said estimates, the probability of buffer underflow and/or overflow is obtained, the individual quality costs being a function also of the underflow and/or overflow probability.

Description

Video Streaming
This invention relates to video streaming over networks, and is particularly useful in the case of networks with a non-deterministic bandwidth availability. Such a situation is typical of the Internet, where packet delivery is by best-effort, or where the physical medium has an inherently non-deterministic behaviour, such as wireless connections.
This invention is applicable both in situations where a compressed asset is available in advance of delivery (such as Video on Demand), and where the complete asset is not available in advance (such as streaming a live event).
"Adaptive Streaming within the 3GPP Packet-Switched Streaming Service", IEEE Network, March/April 2006 is of interest in this context as it details a 3GPP standardised streaming service over mobile networks with variable transmission bandwidth due to the nature of wireless channels. This system uses RTP/UDP and reacts to frequent client buffer status messages via RTCP reports to choose between multiple fixed bit rate encodings or to change the rate of a live encoding system to ensure no buffer overruns or underruns.
Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings.
A server for streaming video includes a coding apparatus as shown in Figure 1, comprising a video interface 1 that receives digitally coded video signals, in uncompressed form, at a video input 2. A processor 3 operates under control of programs stored in disc storage 4 and has access to memory 5 and a video output buffer 6 that feeds a video output 7. The programs include a general purpose operating system 40 and video coding software 41 which implements one or more of the coding methods shortly to be described. The server transmits to a client that includes a video decoder. This can be of conventional construction and is therefore not illustrated. It is however worth mentioning that the client includes a buffer for buffering received video data until it can be decoded.
In the streaming system envisioned, each media asset is partitioned into time slices, and each time slice is encoded at a number of qualities. The asset is streamed by sending time slices in temporal order from any of the different quality streams. Time-slices will typically be coded independently (e.g. as a closed group of pictures, where each group begins with an I-frame) but may also support a switching picture framework to allow more bit efficient concatenation. Stream switching as a mechanism to cope with dynamically changing bandwidth is a well known technique. Our objective here, however, is to select which quality to stream at each time-slice to maximise the perceptual quality of the overall stream, subject to the constraints of available bandwidth and client buffering.
In this description we consider that each time slice is a group of pictures (GOP) and is encoded with a fixed set of qualities. However, it should be noted that the invention is equally applicable to the case where a different set of qualities is used for each time slice. In the prototype, all GOPs were encoded in advance, though if desired they could be encoded as required, with a sufficient look-ahead that the necessary results are available when needed.
Encoding at quality level j is preferably performed as described in our co-pending European patent application no. 08250815.1 (agent's ref. A31594) wherein each index j corresponds to a respective fixed perceptual quality.
It is useful to consider the problem as that of choosing a path through a lattice, where each node represents the resulting buffer state (suitably quantised) after transmitting a time slice at a given quality, and each link represents a quality selection decision. The buffer states in the trellis need to be quantised to such a level that there are a sufficient number of states to find an optimum solution. Our co-pending international patent application no. PCT/GB2008/003691 (agent's ref. A31511) gives further details of a buffer state Viterbi trellis used for constant quality video encoding.
Evaluating the quality of a path through this lattice is not as simple as taking the average of the qualities of the individual slices. For instance, it is well-known that constant quality is preferable to variable quality, even though the variable quality stream may have a higher average quality (See D. Hands & K. Cheng, Subject responses to constant and variable quality video, Human Vision and Electronic Imaging XIII 2008, SPIE Electronic Imaging, San Jose, California, USA). It is also often suggested that over certain timescales, the perceived quality is biased towards the lower end of the qualities seen. Here, we assume that we have some quality measure that can rank paths through this lattice. We calculate the cumulative quality in such a way that the cumulative quality metric up to any given point on the path is only dependent on the cumulative quality metric up to the previous time slice, and the quality transition from the previous time slice to the current one. This allows us to use a dynamic programming technique to select the highest quality path in a very efficient manner. Actually, our preferred metric is expressed as a cost (which is smaller, the higher the quality).
We write the cost metric of a path through the lattice up to time slice i as:
$C_{i+1} = f(q_i, q_{i-1}) + C_i$
where $q_k$ is the quality selected for time slice k, and $C_i$ is the cumulative cost metric of a path up to (but not including) time slice i.
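The recurrence for the cumulative cost lends itself to dynamic programming. The following is a minimal sketch only, not the patented method: it tracks the previous quality as the only state, ignores buffer states and bandwidth entirely, and the names `best_quality_path` and `step_cost` are hypothetical. The toy cost in the usage note is likewise an assumption chosen for illustration.

```python
from typing import Callable, List, Sequence


def best_quality_path(
    num_slices: int,
    qualities: Sequence[float],
    step_cost: Callable[[float, float], float],
) -> List[float]:
    """Minimise the sum of step_cost(q_i, q_{i-1}) over all quality paths.

    Illustrative assumption: the first slice incurs no cost because it
    has no predecessor.
    """
    n = len(qualities)
    # cost[k]: lowest cumulative cost of a path whose latest slice uses qualities[k]
    cost = [0.0] * n
    back: List[List[int]] = []  # back[t][k]: best predecessor index for slice t+1
    for _ in range(1, num_slices):
        new_cost: List[float] = []
        pointers: List[int] = []
        for q in qualities:
            candidates = [cost[p] + step_cost(q, qualities[p]) for p in range(n)]
            best_prev = min(range(n), key=candidates.__getitem__)
            new_cost.append(candidates[best_prev])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    k = min(range(n), key=cost.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [qualities[k] for k in path]
```

For example, with a toy cost such as `lambda q, q_prev: -q + 10.0 * abs(q - q_prev)`, which rewards quality and penalises switching, the cheapest path stays at the highest quality throughout, since nothing in this sketch constrains the bit rate.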
We can then use the Viterbi algorithm to select the highest quality path in a very efficient manner. Strictly speaking, the Viterbi constraints would preclude any dependence of $C_{i+1}$ on $q_{i-1}$, but in practice we find nevertheless that the above constraint produces good results.
Typically, when streaming video, a buffer is used to provide some decoupling from the delivery time of (compressed) media samples, and their playout time. This buffering allows smoothing of a variable delivery rate of media samples.
If the decoder buffer is allowed to underflow (i.e. media samples are delivered later than they should be for decoding and playout), or overflow (too many samples are delivered and cannot be stored), then the quality of the media playout is reduced. The nature of the reduction will depend on a number of factors, including the masking ability of the client, the transport protocol used and whether buffer overflow/underflow occurs for audio or video or some other type of media.
When streaming content, we need to achieve a balance between the data delivery rate over the network, and the rate at which data are removed from the buffer for playout, such that the buffer neither underflows nor overflows.
Generally, a timeslice encoded at higher quality will produce more data than one encoded at lower quality. Therefore, the use of higher quality timeslices will cause data to be played out from the client buffer at a higher rate, and will need a higher rate of delivery over the network to prevent buffer underflow.
It should also be noted that, in general, different timeslices encoded at the same quality will generate very different quantities of data. For video at least, the size of a compressed timeslice will depend very much on the content. It is clear that there is not a simple relationship between quality and the data rate of a timeslice.
Here we introduce a means for balancing quality against estimated future bandwidth to control the risk of buffer overflow or underflow. Since often we don't know what the delivery rate of the network will be in the future, we cannot know with certainty what paths we can deliver through the timeslice lattice. To take this uncertainty into account, we associate with each future timeslice a probability distribution of the number of bits that we are likely to be able to deliver to the client between the current time and the time at which we complete delivery of that timeslice. As we propagate forward, we would expect the mean of this distribution to increase, as with more time we expect to deliver more bits. We would also expect the standard deviation of the distribution to increase, representing the increasing uncertainty as we move our estimate further into the future.
Given this probability distribution, and an estimate for the current client buffer fill, we can calculate the probability distribution of client buffer fullness for each future timeslice for each quality path leading to that timeslice. This will enable us to calculate the probability that a particular path through the lattice will cause a buffer underflow or overflow.
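As a rough illustration of how underflow and overflow probabilities might be derived from such a distribution, the sketch below assumes the buffer fullness just before a timeslice is removed is Gaussian with a known mean and standard deviation. The function names and parameters are hypothetical, and the treatment of timing (a single fullness value tested against both limits) is a simplification.

```python
import math


def gaussian_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative probability P(X <= x) for X ~ N(mean, std^2), std > 0."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def buffer_risk(
    mean_fullness: float,
    std_fullness: float,
    bits_to_remove: float,
    buffer_size: float,
) -> tuple:
    """Return (p_underflow, p_overflow) under a Gaussian fullness assumption.

    Underflow: fewer bits are in the buffer than the timeslice needs.
    Overflow: the fullness exceeds the buffer size before removal.
    """
    p_underflow = gaussian_cdf(bits_to_remove, mean_fullness, std_fullness)
    p_overflow = 1.0 - gaussian_cdf(buffer_size, mean_fullness, std_fullness)
    return p_underflow, p_overflow
```

A larger standard deviation, as expected for timeslices further in the future, pushes both probabilities away from zero for the same mean fullness.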
To choose the best path we need to extend our notion of cumulative quality to include the probability of a buffer underflow or overflow. Clearly, the way that underflow and overflow probabilities are incorporated into the cumulative metric will depend on the perceptual impact that an underflow or overflow has on the perception of overall quality. For instance, if media is being streamed using TCP, then a buffer overflow is of little consequence, since TCP's flow control mechanism will deal with it without any impact on perceived quality. However, a buffer underflow will be very noticeable, as it will typically cause a temporary loss of audio and frozen video. It may then be preferable to favour fuller buffers. This means that our cost metric now has the form:
$C_{i+1} = f(q_i, q_{i-1}, o_i, u_i) + C_i$
where the subscripts are timeslice indices, $o_i$ is the probability of buffer overflow just before we remove time slice i+1 from the buffer, and $u_i$ is the probability of buffer underflow just after we remove time slice i from the buffer.
In the prototype, the cost was calculated in accordance with the following:
$C_{i+1} = C_i - A q_i T_i + K |q_i - q_{i-1}| + K' u_i + K'' o_i$
where A, K, K' and K'' are weighting factors, $T_i$ is the play-out (viewing) duration of the timeslice, and q is measured on the continuous scale defined in ITU-R Recommendation BT.500, in which the quality terms bad, poor, fair, good and excellent are associated with values between 1 and 5 inclusive. Some weightings which gave good results were A = 1, K = 10, K' = 2.5, K'' = 2.5 with $q_i$ in the range of 2.6 to 4.2.
Figure 2 is a flowchart showing the operation of the coding. This considers one group of pictures at a time; if it is desired instead to process time slices each consisting of two or more groups, then for "GOP" read "time slice" or "sequence of GOPs".
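The prototype cost update can be sketched as a single function. This assumes the cost formula reads as: new cost = previous cost, minus A times quality times play-out duration, plus K times the absolute quality change, plus K' times the underflow probability, plus K'' times the overflow probability. The function and parameter names are hypothetical; the default weights are the ones quoted in the text.

```python
def update_cost(
    cost_prev: float,
    q: float,
    q_prev: float,
    duration: float,
    p_underflow: float,
    p_overflow: float,
    a: float = 1.0,          # weight A on quality times duration
    k: float = 10.0,         # weight K on quality changes
    k_under: float = 2.5,    # weight K' on underflow probability
    k_over: float = 2.5,     # weight K'' on overflow probability
) -> float:
    """Return the cumulative cost after choosing quality q for one timeslice."""
    return (
        cost_prev
        - a * q * duration          # higher quality for longer lowers the cost
        + k * abs(q - q_prev)       # quality switches are penalised
        + k_under * p_underflow
        + k_over * p_overflow
    )
```

With these weights, a one-second timeslice that raises quality from 3.0 to 4.0 with a 0.2 underflow probability adds 6.5 to the cost, whereas holding quality at 4.0 with no buffer risk subtracts 4.0, which shows how heavily switching is penalised relative to the quality reward.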
The terminology for the variables is as follows:
I indicates the most recent GOP whose quality has been determined.
$m_I$ is the estimated bit rate, and is used to determine the quality of GOP I+1.
It may be helpful to think of a state $S_{s,i}$ as being positioned in time just before GOP i is removed from the decoder buffer, although a state has no real concept of a time.
$B_{s,i}$ is the buffer level before GOP i is removed from the decoder buffer.
$Q_{s,i}$ is the quality selected for GOP i-1.
$C_{s,i}$ is the cost before the decision for GOP i is made. The choice for GOP i affects the next
Hence $Q_{s,0}$ is meaningless, and $C_{s,0}$ and $C_{s,1}$ are zero (or any other arbitrary value) as they relate to levels before Viterbi has been used to make a decision.
200 Before transmission, the whole media asset is encoded at each of the quality levels, and the number of bits used to encode each GOP i at each quality j, $b_{ij}$, is recorded.
202 When transmission is about to start, an initial estimate is made of the mean bit rate at which delivery through the network might be achieved. This may be derived from measurements made during preceding exchanges of information between the server and the client, in which, for example, the client requested the content; or it may be a value derived by the server based on how many other streams it is already delivering, the time of day or some other factor; or it may simply be a constant value. We refer to this initial mean bit delivery rate as $m_{INIT}$. As the quality of GOPs is determined and as they are transmitted, we will update this mean bit delivery rate, referring to it as $m_I$ at the time when the quality of GOP I+1 is to be determined. Thus $m_I$ is not necessarily the actual bit rate at time I; rather, it is the most up-to-date estimate of bit rate available to be used in calculations concerning GOP I+1.
203 We determine a start up delay to be signalled to the receiver, indicating how long the receiver should wait between first receiving data to decode and removing all of the data representing the first GOP instantaneously from its buffer and starting to decode that data. This value may be set to a fixed value, such as more or less than one GOP period, or may be set according to the video asset to be delivered, being longer for an asset for which the initial video scenes are particularly difficult to compress. We denote this start up delay figure as D.
Note that in this example we make the quality decisions on a GOP by GOP basis but in a practical implementation one could, if desired, do the video processing on a picture basis (or even smaller, e.g. slice of a picture), i.e. the sender would make a decision for a GOP, and then transmit in turn each of the pictures in the GOP, and the receiver may wait until all of the first picture has been delivered before starting decoding. In other words, a start up delay could be less than a GOP period. Naturally the GOP level mathematics in this flowchart would require appropriate modification.
205 Set an estimated receiver buffer fullness tally, $F$, to indicate how much data will be in the receiver buffer immediately before instantaneously removing all of the bits representing the first GOP. This depends on the start up delay $D$ and the initial mean bit delivery rate $m_{INIT}$:
$$F = D \cdot m_{INIT}$$
207 Define a state value $s$ as the quantised buffer fullness tally: s = F * (number of quantised buffer states - 1) / maximum buffer size, where '/' indicates integer division with rounding.
209 Mark the state variable $S_{s,0}$ as active, assign for that state $s$ an initial cost $C_{s,0} = 0$, and set buffer level $B_{s,0}$ to $F$. We initialise a set of state variables $S_{s,i}$ to "inactive state", for each possible state $s$ for each GOP $i$ ($i = 1$ to the end of the asset).
211 The quality for the first GOP, $j_0$, is selected from one of the available quality levels. This decision may be based on the estimate of the achievable network throughput, or may be constant, so that, for example, the mid-range quality is selected.
213 Record the quality at which the first GOP was transmitted, $j_0$, as $Q_{s,1}$.
217 The first GOP is transmitted at the selected quality level, and the value of the start up delay, $D$, is also transmitted. The transmission of the GOP is monitored, and the statistics of the network throughput are updated to derive, if necessary, a new value of mean bit rate, $m_0$, and parameters indicative of its variability.
219 We set a pointer $I$ to zero: as the method proceeds, this pointer will point to the index of the GOP for which the quality was most recently determined. Then, in order to determine the quality of the next GOP (and in turn the subsequent GOPs):
221 A local pointer $i$ is set to $I$.
223 The procedure shown in the flowchart of Figure 3 is invoked, as described below, to select the quality.
225 The pointer $I$ is incremented ($I = I + 1$).
227 The GOP (GOP $I$) is transmitted at the selected quality level. The transmission of the GOP is monitored, and the statistics of the network throughput are updated to derive, if necessary, a new value of mean bit rate, $m_I$, and parameters indicative of its variability.
229 This process, from the selection of quality, through transmission of the GOP at the selected quality, to monitoring of that transmission, is repeated from Step 221 until delivery of the media asset is complete.
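By way of illustration only, the outer loop of steps 211 to 229 might be sketched in Python as follows. The names `stream_asset`, `select_quality` and `send_gop` are hypothetical and are not part of the flowchart: `select_quality` stands in for the Figure 3 procedure, and `send_gop` is assumed to transmit one GOP and return the updated mean bit rate estimate.

```python
from typing import Callable, List


def stream_asset(
    num_gops: int,
    first_quality: int,
    select_quality: Callable[[int, float], int],
    send_gop: Callable[[int, int], float],
) -> List[int]:
    """Sketch of steps 211 to 229; returns the quality index used per GOP.

    select_quality(I, mean_rate) is a stand-in for the Figure 3 procedure and
    returns the quality index for GOP I+1. send_gop(i, quality) is assumed to
    transmit GOP i and return the updated mean bit rate estimate.
    """
    chosen = [first_quality]                     # steps 211 and 213
    mean_rate = send_gop(0, first_quality)       # step 217: yields m_0
    I = 0                                        # step 219
    while I + 1 < num_gops:                      # step 229: until asset is delivered
        quality = select_quality(I, mean_rate)   # steps 221 and 223
        I += 1                                   # step 225
        mean_rate = send_gop(I, quality)         # step 227: yields m_I
        chosen.append(quality)
    return chosen
```

Because the selection is repeated before every GOP, only the first decision of each trellis run is ever acted upon in this sketch; later decisions are revisited with the newer rate estimate.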
In order to select the quality of the next GOP, we consider the problem as that of choosing a path through a lattice, where each node represents the resulting buffer state (suitably quantised) after transmitting a GOP at a given quality, and where each link represents a quality selection decision. The buffer state is obtained by dividing the expected buffer fullness, derived using the estimated network throughput, by a fixed parameter. We find that using 300 states between buffer empty and buffer full provides satisfactory results. In this case we would determine the state $s$ from the buffer fullness $B$ according to s = B * 299 / BufferSize, where '/' indicates integer division with rounding. More generally: s = int((B * ((number of states - 1) / BufferSize)) + 0.5), where "int" means the integer part of (so that, for example, int(3.9) = 3).
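The quantisation just described can be illustrated with a minimal Python sketch. The function name `quantise_state` and its parameter names are assumptions made for this example; the default of 300 states is the figure quoted above. `math.floor` is used in place of a truncating "int" so that rounding stays consistent should negative fullness values ever be passed in; for non-negative values the two are identical.

```python
import math


def quantise_state(fullness_bits: float, buffer_size_bits: float,
                   num_states: int = 300) -> int:
    """Map a buffer fullness to the nearest of num_states quantised states.

    State 0 corresponds to an empty buffer and state num_states - 1 to a
    full buffer.
    """
    return math.floor(fullness_bits * (num_states - 1) / buffer_size_bits + 0.5)
```

For example, with a buffer of 1000 bits, a fullness of 500 bits gives 500 * 299 / 1000 = 149.5, which rounds to state 150.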
In some cases it may be beneficial to define further states beyond buffer overflow and beyond buffer underflow because the estimated buffer fullness is simply an estimate: the actual value at that time may be more or less. By allowing states beyond the buffer size, we are able to propagate more paths through the trellis, and hence possibly find a better solution. For example, we may allow states from -100 to 400, and consider buffer fullness levels that would lead to values of s outside of this range as invalid, and prune such paths from the trellis. Our above-mentioned international patent application no. PCT/GB2008/003691 gives further details of a buffer state Viterbi trellis used for constant quality video encoding.
We turn now to the flowchart of Figure 3:
300 Initialisation. We re-initialise the set of state variables $S_{s,k}$ to "inactive state", for each possible state $s$ for each GOP $k$ from GOP $k = I+1$ to the end.
302 Set an estimated receiver buffer fullness tally, $F$, to indicate how much data will be in the receiver buffer immediately before the bits of GOP $I+1$ are all instantaneously removed, using knowledge of how much data has already been sent to the receiver, $S$; how much more data is expected to be sent by the time that the receiver is ready to decode GOP $I+1$, using the current estimated mean bit delivery rate $m_I$ and the start up delay, $D$; and how much data the receiver would have removed for decoding, given by the qualities of encoding, $j_i$, that have been decoded. Denoting the time elapsed since the start of transmission as $t_I$, the additional time until GOP $I+1$ is to be decoded at the receiver as $x$, and the GOP periods as $T_j$, we observe that
$$t_I + x = D + \sum_{j=0}^{I} T_j$$
Observing that the amount of data already transmitted, S, is equal to the amount of data that will have been removed from the receiver buffer by this time, allows us to set F as
$$F = m_I \cdot x$$
304 Define a state value $s$ as the quantised buffer fullness tally: s = F * (number of quantised buffer states - 1) / maximum buffer size, where '/' indicates integer division with rounding. Mark the state variable $S_{s,i+1}$ as active, assign for that state $s$ an initial cost $C_{s,i+1} = 0$, and set buffer level $B_{s,i+1}$ to $F$.
307 Set i = i+1.
309 Perform the actions in steps 310 to 319 for each value of quality index $j$ and for each state $s$ for which the state variable $S_{s,i}$ is not marked as inactive.
310 Determine a new value of receiver buffer fullness tally by adding the estimated number of bits received by the receiver during $T_i$ and subtracting the number of bits consumed by the decoder: $F' = B_{s,i} - b_{i,j} + (m_I \cdot T_i)$.
312 Determine the state, $s'$, at the next level of the trellis, $i+1$, as the quantised value of the buffer fullness $F'$. If this is not a valid state, as it represents an invalid level of fullness, no further processing of this potential path in the trellis is performed, and control passes to step 319. s' = F' * (number of quantised buffer states - 1) / maximum buffer size (with rounding).
313 Calculate the overflow probability $o_{s,i,j}$ and underflow probability $u_{s,i,j}$ from the estimated buffer fullness tally $F'$ and the standard deviation $\sigma_t$ of the number of transmitted bits, as described in more detail below.
314 Determine the cost, $C$, of the path to future state $s'$ from the state $s$, according to $C = C_{s,i} - A \cdot q_j \cdot T_i + K \cdot \mathrm{abs}(q_j - Q_{s,i}) + K' \cdot o_{s,i,j} + K'' \cdot u_{s,i,j}$
where $Q_{s,i}$ is the quality at which GOP $i-1$ was coded on the path through the trellis to state $s$ for GOP $i$; $o_{s,i,j}$ is the probability of buffer overflow just before we remove timeslice $i+1$ when timeslice $i$ has been encoded at quality $j$ starting in state $s$; and $u_{s,i,j}$ is the probability of buffer underflow just after we remove timeslice $i$ at quality $j$ starting in state $s$.
315 If state variable $S_{s',i+1}$ is marked as active go to step 317, else go to step 316.
316 Mark the state $S_{s',i+1}$ as active and set the cost of this state $C_{s',i+1}$ to $C$, the buffer fill tally of this state $B_{s',i+1}$ to $F'$, and the parent of this state $P_{s',i+1} = s$. Also record the quality used to arrive at this state as $Q_{s',i+1} = q_j$. Go to step 319.
317 If the cost $C \geq C_{s',i+1}$, go to step 319, else go to step 318.
318 Prune the path in the trellis from state $P_{s',i+1}$ at GOP $i$ to state $s'$ at GOP $i+1$. Overwrite the previously stored values by setting $C_{s',i+1}$ to $C$, $B_{s',i+1}$ to $F'$ and $P_{s',i+1} = s$. This creates a path in the trellis from state $s$ at GOP $i$ to state $s'$ at GOP $i+1$. Also record the quality used to arrive at this state as $Q_{s',i+1} = q_j$.
319 End of processing for this combination of s and j. If there are more to process, return to step 309.
320 For each state $s$ for which the state variable $S_{s,i}$ is not marked as inactive, determine whether any future paths from $s$ have survived to this point. If none has, prune out the whole path leading to this state.
322 If there is only one possible path from GOP $I$ to GOP $I+1$ after the pruning process, then exit to step 328.
324 If there are more GOPs to be processed, then repeat for the next GOP by returning to step 307; otherwise (325), from the set of paths arriving at the end of the file, choose the path that leads to the best final cost, prune all other paths and exit to step 328. Note that other choices for the best final state are possible: best cost after one GOP (or any number), highest lowest buffer level along the trellis path, etc.
328 Return the chosen quality transition path.
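The flowchart steps 304 to 328 might be sketched in Python as below. This is a simplified illustration under stated assumptions, not a definitive implementation: the GOP period is taken as constant, valid states are restricted to 0 to `num_states - 1`, the weights `A`, `K`, `K_over` and `K_under` are given arbitrary default values, and the overflow and underflow probabilities are supplied by a hypothetical callable `risk(after_removal, new_fullness, steps_ahead)`. All function and parameter names are invented for the example. Dead-end paths (step 320) are handled implicitly, since a state with no surviving successor is never reached when back-tracking.

```python
import math
from typing import Callable, List, Tuple

Risk = Callable[[float, float, int], Tuple[float, float]]


def quantise(fullness: float, buffer_size: float, num_states: int) -> int:
    """Quantised buffer state (steps 304 and 312), rounded to nearest."""
    return math.floor(fullness * (num_states - 1) / buffer_size + 0.5)


def select_path(bits: List[List[float]], quality: List[float], period: float,
                mean_rate: float, start_gop: int, start_fullness: float,
                prev_quality: int, buffer_size: float, risk: Risk,
                num_states: int = 300, A: float = 1.0, K: float = 1.0,
                K_over: float = 100.0, K_under: float = 100.0) -> List[int]:
    """Return quality indices for GOP start_gop onwards (possibly truncated).

    bits[i][j] is b_ij, quality[j] is q_j, mean_rate is m_I and
    prev_quality is the quality index already used for GOP start_gop - 1.
    """
    s0 = quantise(start_fullness, buffer_size, num_states)
    # Each level maps state -> (cost, fullness, parent, quality, first quality).
    levels = [{s0: (0.0, start_fullness, None, prev_quality, None)}]  # step 304
    for i in range(start_gop, len(bits)):                   # steps 307 and 324
        nxt = {}
        for s, (cost, fullness, _, q_prev, first) in levels[-1].items():
            for j, q in enumerate(quality):                 # step 309
                after_removal = fullness - bits[i][j]
                new_fullness = after_removal + mean_rate * period   # step 310
                s_next = quantise(new_fullness, buffer_size, num_states)
                if not 0 <= s_next < num_states:            # step 312
                    continue                                # invalid fullness
                over, under = risk(after_removal, new_fullness,
                                   i - start_gop + 1)       # step 313
                c = (cost - A * q * period + K * abs(q - quality[q_prev])
                     + K_over * over + K_under * under)     # step 314
                if s_next not in nxt or c < nxt[s_next][0]:  # steps 315 to 318
                    nxt[s_next] = (c, new_fullness, s, j,
                                   j if first is None else first)
        if not nxt:
            break               # assumption: fall back to the paths found so far
        levels.append(nxt)
        if len({entry[4] for entry in nxt.values()}) == 1:
            break               # step 322: the first decision is settled
    if len(levels) == 1:
        raise ValueError("no valid path through the trellis")
    # Steps 325 and 328: back-track from the cheapest surviving state.
    state = min(levels[-1], key=lambda s: levels[-1][s][0])
    path = []
    for level in reversed(levels[1:]):
        _, _, parent, j, _ = level[state]
        path.append(j)
        state = parent
    return path[::-1]
```

The first element of the returned list is the quality index for the next GOP to be sent. When the early exit of step 322 is taken, the list covers only the GOPs examined up to that point.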
We return now to discussion of the estimation of the mean bit rate, its standard deviation, and the derived overflow and underflow probabilities.
In order to be able to calculate the overflow and underflow probabilities in step 313, we need to estimate the number of bits we would expect to be able to get through the network by the given time and a measure of the accuracy of that estimate. We prefer to estimate a mean number of bits and its standard deviation, and then assume a Gaussian distribution to calculate the probabilities.
We measure the average bit rate through the network for the immediately preceding groups of pictures, averaging over, say, 10 to 100 groups of pictures. We then assume that this mean bit rate, $m_I$, is sustained into the future. The mean number of bits, $m$, delivered by some time in the future, $T$, is then simply $m = m_I \cdot T$.
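A moving-average estimate of this kind might be sketched in Python as follows. The class name `ThroughputEstimator`, its method names and the default window of 50 GOPs are assumptions made for the example (the text only suggests a range of 10 to 100 GOPs); each sample is taken to be the number of bits delivered for one GOP and the time its delivery took.

```python
from collections import deque


class ThroughputEstimator:
    """Mean bit rate over the most recently delivered GOPs."""

    def __init__(self, window: int = 50):
        # Each sample is (bits delivered, seconds taken) for one GOP;
        # the oldest sample is discarded once the window is full.
        self._samples = deque(maxlen=window)

    def record(self, bits: float, seconds: float) -> None:
        self._samples.append((bits, seconds))

    def mean_rate(self) -> float:
        """Current estimate of the mean bit rate m_I, in bits per second."""
        total_time = sum(t for _, t in self._samples)
        if total_time <= 0:
            raise ValueError("no throughput measurements recorded yet")
        return sum(b for b, _ in self._samples) / total_time

    def expected_bits(self, horizon: float) -> float:
        """Mean number of bits m = m_I * T expected within horizon seconds."""
        return self.mean_rate() * horizon
```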
We found in a simple simulation of a network in which sessions started and stopped randomly and the total bandwidth was shared equally between them, that the mean number of bits delivered by a given time did vary in this way and that the standard deviation of the number of bits delivered increased roughly in proportion to the mean and to the square root of time, up to a limit of about 25% of the mean.
In measurements made on a network we found similar behaviour on average, with standard deviation again varying with the square root of time up to a limit. In some specific cases we found the standard deviation was mostly constant over time, at about 10% of the mean.
In our experiments we have achieved good performance when we model the standard deviation, $\sigma_t$, as increasing in proportion to the mean and to the square root of time (measured from the current time into the future) up to a limit of 25% of the mean after 25 seconds and remaining at 25% of the mean subsequently. This can be expressed as:
[Equation image imgf000012_0001 not reproduced]
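One possible reading of this model is sketched below in Python. It is an interpretation of the prose rather than a transcription of the original equation: "the mean" is assumed to be the mean number of bits $m_I \cdot t$ delivered after $t$ seconds, and the standard deviation is assumed to be a fraction of that mean which grows with the square root of $t$ until it reaches 25% at 25 seconds. The function and parameter names are hypothetical.

```python
import math


def sigma_bits(mean_rate: float, t: float,
               limit_fraction: float = 0.25, limit_time: float = 25.0) -> float:
    """Modelled standard deviation of the bits delivered t seconds ahead.

    The fraction of the mean grows as sqrt(t) and is capped at
    limit_fraction once t reaches limit_time.
    """
    fraction = limit_fraction * math.sqrt(min(t, limit_time) / limit_time)
    return fraction * mean_rate * t
```

Under these assumptions, a mean rate of 1000 bit/s gives a standard deviation of 6250 bits at 25 seconds (25% of 25000 bits), and 25% of the mean at any later time.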
Another option is to continually collect statistics as to bit rate and compute the actual standard deviation over a recent time window.
We then use these results to determine the probability of a path through the trellis from GOP $i$ to GOP $i+1$ resulting in overflow (or underflow) at GOP $i+1$, which is added into the cost of the transition from GOP $i$ to GOP $i+1$, as at Step 314 above.
We have determined the estimated buffer fullness $F'$ at Step 310. The probability, given the standard deviation $\sigma_t$, that the buffer fullness will actually reach zero (i.e. a deviation of $(B_{s,i} - b_{i,j}) / \sigma_t$ times the standard deviation) can be looked up in a Gaussian cumulative probability table. Alternatively it can be calculated from
[Equation image imgf000013_0001 not reproduced]
Similarly, the buffer would overflow if the fullness exceeds the buffer size $B$. The probability of this can be found by looking up $(F' - B) / \sigma_t$ in a Gaussian cumulative probability table, or from
[Equation image imgf000013_0002 not reproduced]
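As an alternative to a look-up table, the two Gaussian probabilities can be computed directly from the complementary error function. The following Python sketch assumes the Gaussian model stated above; the function names are hypothetical, and the treatment of a zero standard deviation (a deterministic outcome) is an assumption added so that the functions are defined for all inputs.

```python
import math


def gaussian_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def underflow_probability(level_after_removal: float, sigma: float) -> float:
    """Probability that the level just after removing a GOP is below zero.

    level_after_removal is the expected level B_{s,i} - b_{i,j}.
    """
    if sigma <= 0.0:
        return 1.0 if level_after_removal < 0.0 else 0.0
    return gaussian_cdf(-level_after_removal / sigma)


def overflow_probability(fullness: float, buffer_size: float,
                         sigma: float) -> float:
    """Probability that the fullness F' exceeds the buffer size."""
    if sigma <= 0.0:
        return 1.0 if fullness > buffer_size else 0.0
    return gaussian_cdf((fullness - buffer_size) / sigma)
```

An expected level two standard deviations away from the relevant limit gives a probability of about 2.3% in either case.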
In alternative embodiments of the invention we introduce the possibility of:
- Allowing the trellis to determine the quality for more than one GOP, and only running it again if there is a serious difference between actual and expected network throughput;
- Working with a look ahead window of frames, rather than looking all the way to the end, so that it can be used for real time encoding.
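These two alternatives might be sketched in Python as below. The text does not say what constitutes a "serious difference", so the criterion used here (a deviation of more than `threshold` standard deviations from the expected number of bits) is purely an assumption, as are the function names and the choice to measure the look-ahead window in GOPs.

```python
def needs_replan(expected_bits: float, delivered_bits: float,
                 sigma_bits: float, threshold: float = 2.0) -> bool:
    """True if delivery has strayed far enough to warrant a new trellis run."""
    return abs(delivered_bits - expected_bits) > threshold * sigma_bits


def lookahead_end(next_gop: int, num_gops: int, window: int) -> int:
    """Index one past the last GOP to include in a windowed trellis run."""
    return min(next_gop + window, num_gops)
```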
The method caters for variability in the actual bandwidth through the incorporation of the underflow and overflow probabilities. It can be further extended, however, by varying the bandwidth estimate as we propagate through the lattice based on other available information about future network bandwidth. There may be known events such as other streams about to end, which introduce a dependence on other streams but which would mean an imminent bandwidth increase. Downstairs rate curves were introduced in the context of optimal bandwidth reservation for VBR coded video (see K. Sun & M. Ghanbari, An Algorithm for VBR video transmission scheme over the Internet, in Proceedings of International Symposium on Telecommunications (IST2003), Isfahan, Iran, August 16-18, 2003). Any VBR asset will have a peak rate requirement to ensure no buffer starvation problems at the most difficult part of the content. Once this point is passed the next peak rate will be lower, and so on. This series of peak rates forms a downstairs stepping profile, and this future profile of all currently streamed assets may also be available. If assets are streamed at a rate proportional to this rate requirement (e.g. using MulTCP and variable N), this would suggest that, with no new streams added, there will be more rate and less contention moving forward in time.
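The stepping shape of such a profile can be illustrated with a deliberately simplified Python sketch. It takes the bit rate of each individual GOP as the rate requirement and reports, for every position, the highest rate still to come; the cited paper's own definition of the rate requirement may differ (for example by accounting for buffering), and the function name is hypothetical.

```python
from typing import List


def downstairs_profile(gop_bits: List[float], period: float) -> List[float]:
    """Peak remaining bit rate at each GOP; the result never increases."""
    profile = []
    peak = 0.0
    # Walk backwards so that each entry is the maximum rate from there onwards.
    for bits in reversed(gop_bits):
        peak = max(peak, bits / period)
        profile.append(peak)
    return profile[::-1]
```

For GOP sizes of 100, 400, 200, 300 and 100 bits with a one-second period, the profile steps down from 400 to 300 to 100 bit/s as each peak is passed.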
We may possibly have some knowledge of what types of assets might be requested at this time of day (EPG input, children's programs, sports events) and hence may be able to change how conservative or optimistic our future bandwidth projection will be. Additionally, if one asset would consistently cause buffer overrun (i.e. sustained rate > highest quality rate) and TCP were in use, then flow control would cause the streamer to throttle back and other assets would get a higher rate than otherwise expected. Conversely, if one asset would consistently cause buffer underflow and sufficient playback delay was not enforced, then this would most likely be dropped and again more rate would then be available to the other streams.
It will be observed that the system described here differs from our earlier international patent application in:
a) having constant perceptual quality, and hence variable rate, encodings available at the server, together with information on future bit consumption requirements for different qualities. For any given transmission rate one may plan a future stream switching route, taking the variability of the rate demand of each different quality and the projected buffer levels at the client into account;
b) being able to exploit known future events (e.g. one stream due to end shortly), client buffer status and future bandwidth demand in our planning to get a better trade-off of quality to available rate.
If we are using schemes to influence the share of any bandwidth a particular stream receives, these algorithms can be built into the future projections to give a better result.

Claims

1. A method of video coding comprising
a) dividing a sequence to be encoded into a plurality of temporal portions;
b) analysing the sequence to determine at least for each portion after the first, in accordance with a plurality of encoding quality settings, (i) a quality metric for the portion and (ii) the number of bits generated by encoding the portion at that quality setting;
c) analysing the data to choose a set of quality settings, one per portion, that tends to minimise a combined quality cost for the sequence; wherein the combined quality cost is the sum of individual quality costs each of which is a function of the quality metric of the respective encoded portion; and
d) encoding the sequence using the chosen quality settings;
characterised by
e) estimation of receiver buffer fullness and the standard deviation thereof;
f) and, in determining each individual quality cost, determining, from said estimates, the probability of buffer underflow and/or overflow, the cost being a function also of the underflow and/or overflow probability.
2. A method according to claim 1 in which the receiver buffer fullness and the standard deviation thereof are estimated from measurements of the actual transmitted bit rate.
3. A method according to claim 1 in which a mean bit rate that can be transmitted is estimated from measurements of the actual transmitted bit rate, the estimated receiver buffer fullness is estimated using this estimated mean bit rate and the standard deviation of the estimated receiver buffer fullness after a time is estimated as a predetermined function of the mean bit rate and the time over which the estimated bit rate contributes to the buffer fullness.
4. A method according to claim 3 in which the standard deviation of the receiver buffer fullness, up to a predetermined limit, is proportional to the mean bit rate and the square root of the time.
PCT/GB2010/000390 2009-03-05 2010-03-04 Video streaming WO2010100427A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201080010722.5A CN102369732B (en) 2009-03-05 2010-03-04 video streaming
EP10707637A EP2404449A1 (en) 2009-03-05 2010-03-04 Video streaming

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP09250628.6 2009-03-05
EP09250628A EP2227023A1 (en) 2009-03-05 2009-03-05 Video streaming

Publications (2)

Publication Number Publication Date
WO2010100427A1 true WO2010100427A1 (en) 2010-09-10
WO2010100427A8 WO2010100427A8 (en) 2011-10-06

Family

ID=41092107

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2010/000390 WO2010100427A1 (en) 2009-03-05 2010-03-04 Video streaming

Country Status (3)

Country Link
EP (2) EP2227023A1 (en)
CN (1) CN102369732B (en)
WO (1) WO2010100427A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428107B (en) * 2012-05-14 2016-08-24 中国科学院声学研究所 A kind of adaptive code stream switching method based on buffer underflow probability Estimation and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1248466A1 (en) * 2000-04-11 2002-10-09 Mitsubishi Denki Kabushiki Kaisha Method and apparatus for transcoding of compressed image
WO2006078594A1 (en) * 2005-01-19 2006-07-27 Thomson Licensing Method and apparatus for real time parallel encoding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Adaptive Streaming within the 3GPP Packet-Switched Streaming Service", IEEE NETWORK, March 2006 (2006-03-01)
D. HANDS; K. CHENG: "Subject responses to constant and variable quality video, Human Vision and Electronic Imaging XIII 2008", SPIE ELECTRONIC IMAGING
ORTEGA A ET AL: "OPTIMAL TRELLIS-BASED BUFFERED COMPRESSION AND FAST APPROXIMATIONS", IEEE TRANSACTIONS ON IMAGE PROCESSING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 3, no. 1, 1 January 1994 (1994-01-01), pages 26 - 39, XP000433559, ISSN: 1057-7149 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014135838A1 (en) 2013-03-08 2014-09-12 Oxford Nanopore Technologies Limited Enzyme stalling method
US10298985B2 (en) 2015-05-11 2019-05-21 Mediamelon, Inc. Systems and methods for performing quality based streaming
US11076187B2 (en) 2015-05-11 2021-07-27 Mediamelon, Inc. Systems and methods for performing quality based streaming
US20200021634A1 (en) * 2018-07-16 2020-01-16 Netflix, Inc. Techniques for determining an upper bound on visual quality over a completed streaming session
US10911513B2 (en) * 2018-07-16 2021-02-02 Netflix, Inc. Techniques for determining an upper bound on visual quality over a completed streaming session
US11778010B2 (en) 2018-07-16 2023-10-03 Netflix, Inc. Techniques for determining an upper bound on visual quality over a completed streaming session

Also Published As

Publication number Publication date
WO2010100427A8 (en) 2011-10-06
CN102369732B (en) 2015-09-23
EP2404449A1 (en) 2012-01-11
EP2227023A1 (en) 2010-09-08
CN102369732A (en) 2012-03-07

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase Ref document number: 201080010722.5; Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application Ref document number: 10707637; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE
REEP Request for entry into the european phase Ref document number: 2010707637; Country of ref document: EP
WWE Wipo information: entry into national phase Ref document number: 2010707637; Country of ref document: EP