WO2021167531A1 - Methods and systems for bandwidth estimation

Methods and systems for bandwidth estimation

Info

Publication number: WO2021167531A1
Authority: WO (WIPO (PCT))
Prior art keywords: latency, chunk, chunks, playback speed, bitrate
Application number: PCT/SG2021/050076
Other languages: French (fr)
Inventors: Praveen Kumar Yadav, Wei Tsang Ooi
Original assignee: National University of Singapore
Application filed by National University of Singapore; publication of WO2021167531A1 (en)

Classifications

    • H04L67/025: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP], for remote control or remote monitoring of applications
    • H04L65/1046: Call controllers; call servers
    • H04L65/612: Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio, for unicast
    • H04L65/762: Media network packet handling at the source
    • H04L65/764: Media network packet handling at the destination
    • H04L65/80: Responding to QoS
    • H04N21/44209: Monitoring of downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
    • H04N21/8456: Structuring of content by decomposing it in the time domain, e.g. into time segments
    • H04N21/85406: Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • LoL (M. Lim et al., "When they go high, we go low: low-latency live streaming in dash.js with LoL", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 321-326) includes three main modules: a self-organizing map (SOM) learning-based ABR algorithm, playback speed control, and throughput measurement.
  • The ABR algorithm of LoL considers four heuristics as inputs (measured throughput, latency, current buffer level, and QoE) in the SOM model to perform ABR decisions. It also implements a robust throughput measurement algorithm that tracks when a chunk arrives and when its download finishes.
  • L2A (Theo Karagkioules, Rufael Mekuria, Dirk Griffioen, and Arjen Wagenaar (2020), "Online learning for low-latency adaptive streaming", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 315-320) uses an online convex optimization framework to formulate the ABR selection problem in low-latency live streaming, and proposes an online learning rule to solve the optimization problem.
  • The main intuition behind L2A is to learn the best policy for selecting a suitable bitrate for each downloaded segment. It does so without requiring any parameter tuning, modifications for the application type, statistical assumptions about the channel, or bandwidth estimation.
  • Stallion (C. Gutterman et al., "Stallion: video adaptation algorithm for low-latency video streaming", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 327-332) uses the throughput-based ABR of Dash.js with a slight modification: a sliding-window technique measures the mean and standard deviation of both the throughput and the latency before ABR decisions are made. This modification makes Stallion react well to low-latency requirements.
  • The playback speed algorithm for LoL, L2A, and Stallion is based on the default Dash.js algorithm and is independent of the bitrate adaptation algorithm. While L2A and Stallion use the original playback speed algorithm, LoL modifies it to consider the buffer state before changing the playback speed, to avoid rebuffering. For a fair comparison, we use the default playback speed algorithm of Dash.js for LoL, L2A, and Stallion.
  • The default algorithm controls the latency by increasing the playback speed based on the latency limit and the instantaneous latency.
  • The algorithm first calculates ΔL, the difference between the instantaneous latency and the latency limit. This value is used to calculate the playback rate as s = ((2 × s_max) / (1 + e^(-5 × ΔL))) + 1 - s_max, where s_max is the upper threshold for the playback speed.
  • Profiles P5 and P6 are 4G/LTE network traces from a moving car and a train, with rates ranging from 0 to 173000 Kbps, varying at 1-second intervals.
  • The experiments use the Dash.js video reference player v3.1.3; the other low-latency ABR algorithms (LoL, L2A, and Stallion) are already integrated into Dash.js.
  • For the playback speed adaptation we use the YouTube-recommended speed limits, i.e., 0.25 times to 2 times normal playback speed.
  • The FFmpeg encoder with the CMAF packager and the origin run on the server provided by Streamline. The server and client run on two different Linux-based machines connected by a router. We used the tc NetEm network emulator to control the network bandwidth according to the network profiles.
  • Figure 6 shows the XY plot for comparing the number of stalls and their duration for different algorithms under different latency limits.
  • Points closer to the lower-left corner indicate better performance in both dimensions, i.e., the minimum number of stalls with the minimum stall duration.
  • QLive is almost in the lower-left corner for different latency limits.
  • For QLive, the stall duration is 0.6, 0.8, and 0.7 seconds and the number of stalls is 0.8, 1, and 0.5, on average, for the different latency limits.
  • LoL is the nearest to QLive, and even has slightly fewer stalls for the 2-second latency limit, but a higher stall duration.
  • Figure 7 shows the XY plot for the average bitrate and the number of changes in representation for different latency limits.
  • QLive always has the highest bitrate, 6% to 14% higher than the nearest-performing algorithms and 40% to 53% higher than the worst-performing ones. Although its number of representation changes is also the highest, the values are less than one-fifth of the maximum possible, given the segment duration and total playback duration. Additionally, QLive has the lowest stall duration, as described in the previous section. This shows that QLive is more adaptive to changing network bandwidth, improving the overall bitrate. Stallion has the bitrate nearest to QLive for the 1- and 2-second latency limits, but also the highest stall duration on average.
  • L2A has the bitrate nearest to QLive for the 3-second latency limit, and has the highest stall duration.
  • LoL has more representation-level changes for the 1- and 2-second latency limits, i.e., 28 and 29, compared to 20 and 19 for L2A and 15 and 14.6 for Stallion. Its lower stall duration than Stallion and L2A shows that these representation changes have helped reduce stalls. It is therefore important to consider the bitrate, the representation changes, and the stalls together.
  • Both L2A and Stallion perform worst in terms of average latency for the minimal latency limit of 1 second, with average latencies 10 times higher than the limit.
  • The maximum instantaneous latency is also highest for L2A, i.e., 28 seconds for the 3-second latency limit.
  • The maximum latency for QLive with a 3-second latency limit is even lower than that of the other algorithms with a 1-second limit. This is because QLive's bitrate adaptation and playback speed control are tightly coupled and calculated together based on the available representation levels, the previous playback speed, and the network and buffer conditions. In contrast, the other algorithms depend on the Dash.js default playback speed adaptation algorithm, and their bitrate adaptation is a separate process.
  • L2A, in general, plays the least time at normal speed, even for the 1-second segment duration; it plays at normal speed only 30% to 53% of the time, on average, across the different latency limits.
  • Quality of experience: in the QoE metric used, N is the total number of segments; q maps a bitrate to a quality value; T_stall is the total stall duration during playback; and T_s is the startup delay.
  • Equations 12 and 13 show this QoE for the different algorithms, using Equations 3 and 4 respectively.
  • The average latency for QLive increases more than threefold when the segment duration is 6 seconds rather than 3 seconds.
  • L2A is minimally affected, as its original latency of 3 seconds was already the highest of the compared algorithms.
  • LoL and Stallion also see increases of 5 seconds and 4 seconds, respectively, in average latency.
  • The maximum latency also increases for all algorithms, by 4 to 9 seconds, with the increase in segment duration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for estimating bandwidth comprises: receiving a server response comprising a plurality of chunks of a segment of a video; identifying the plurality of chunks by identifying the presence of a chunk delineator in the response; identifying at least a subset of the chunks that have a size that is equal to or greater than a Maximum Transfer Unit (MTU) value; and outputting an estimated bandwidth based on a total size of chunks in the subset, and a total download time for the chunks in the subset.

Description

METHODS AND SYSTEMS FOR BANDWIDTH ESTIMATION
Technical Field
The present invention relates, in general terms, to methods and systems for bandwidth estimation, particularly (but not exclusively) in the context of live streaming of video over communication networks.
Background
Video streaming represents a significant portion of internet traffic today. The popularity of live video streaming applications, such as online video games, live event streaming, virtual reality applications, and live video surveillance, is increasing with an expected fifteen-fold growth to reach 17% of the total traffic by the end of 2020. These applications require low end-to-end latency (also called glass-to-glass latency), i.e., the time lag between video capture and the actual playback time at the client, to enable real-time interaction while providing high quality and avoiding rebuffering events.
Dynamic adaptive streaming over HTTP (DASH) clients have been designed for video- on-demand (VoD) services without any stringent latency requirements in mind. Therefore, legacy DASH solutions face a severe problem in delivering low-latency live streaming, in that the latency achievable by these known solutions ranges from 6 to 40 seconds.
This unacceptable latency is often due to a large playback buffer and segment duration. In traditional DASH solutions, the origin server has to wait for a full segment to be encoded and packaged before it can be pushed to a content delivery network (CDN). This video contribution process requires at least one segment duration delay. In practice, the DASH video player has to buffer several segments (e.g., three segments in Apple HTTP live streaming) to start decoding and rendering. Considering all these serial delays, from video capture until rendering, can significantly increase the latency.
One way to reduce latency is to use a shorter segment duration (one second or less). Although this can control latency within target limits to some extent, it has several problems, such as (i) a decrease in encoding efficiency, (ii) frequent quality changes (video instability), and (iii) a large increase in the number of requests and responses.
To avoid these problems while keeping the latency small, chunked transfer encoding (CTE) with the MPEG Common Media Application Format (CMAF; ISO/IEC 23000-19) has recently emerged as the standard packaging approach for low-latency delivery. CTE is one of the main features of HTTP/1.1 (RFC 7230), and allows delivering a segment in small pieces called chunks. A chunk can be as small as a single frame, so that it can be delivered to the client in near real-time, even before the segment is fully encoded and available at the origin. In CMAF, a chunk is the smallest referenceable unit of a segment that contains Movie Fragment Box ('moof') and Media Data Box ('mdat') atoms. These atoms make a chunk independently decodable, although representation switching still happens at the first chunk of the segment, which contains the Instantaneous Decoder Refresh (IDR) frame.
Although the development of CTE and CMAF solution has been a step forward in low- latency streaming, this alone is not enough for reducing latency. Further latency reduction requires efficient content delivery networks (CDNs), resource-efficient encoders, and optimized adaptive rate and playback speed adaptation algorithms.
Summary
The present disclosure relates to a method for estimating bandwidth for client-side control of live streaming, comprising: receiving a server response comprising one or more chunks of a segment of a video; identifying the one or more chunks by identifying the presence of chunk delineators in the response; identifying at least a subset of the chunks that have a size that is equal to or greater than a Maximum Transfer Unit (MTU) value; and outputting an estimated bandwidth based on a total size of chunks in the subset, and a total download time for the chunks in the subset.
The method may further comprise, for each chunk, determining a start time and end time for the chunk based on the chunk delineators. In some embodiments, the start time and end time are determined after completion of download of the segment. In some embodiments, the chunk delineators comprise one or more moof boxes and one or more mdat boxes. The start time may be a timestamp of a moof box, and the end time may be a timestamp of an end of a mdat box.
In some embodiments, the total download time is determined according to start times and end times of chunks in the subset.
The method may comprise at least partially reading the response prior to completion of download of the response, before identifying if the chunk delineators are present in the response.
The present disclosure also relates to a method for determining at least one of a playback speed and a representation bitrate for live streaming, comprising: calculating an ideal queue utilisation based on assuming an instantaneous buffer occupancy as an expected value of buffer slack; and determining a product of the representation bitrate and playback speed based on the ideal queue utilisation and a bandwidth estimated according to a method as disclosed herein.
The method may further comprise: setting the playback speed to 1 and determining the representation bitrate if a latency is within a predetermined limit; decreasing the representation bitrate and increasing the playback speed if the latency is greater than the predetermined limit due to high buffer occupancy of a buffer, wherein each response is stored in the buffer before being played back; and decreasing the representation bitrate and the playback speed if a network bandwidth, of a network over which each response is transmitted, is less than a previously determined representation bitrate.
The present disclosure also relates to a live streaming client system, comprising: memory; and at least one processor, the memory storing instructions that when executed by the at least one processor, cause the at least one processor to perform a bandwidth estimation process comprising: receiving a server response comprising one or more chunks of a segment of a video; identifying the one or more chunks by identifying the presence of chunk delineators in the response; identifying at least a subset of the chunks that have a size that is equal to or greater than a Maximum Transfer Unit (MTU) value; and outputting an estimated bandwidth based on a total size of chunks in the subset, and a total download time for the chunks in the subset.
The bandwidth estimation process performed by the system may further comprise, for each chunk, determining a start time and end time for the chunk based on the chunk delineators. This may be done after completion of download of the segment.
In some embodiments, the chunk delineators comprise one or more moof boxes and one or more mdat boxes. The start time may be a timestamp of a moof box, and the end time may be a timestamp of an end of a mdat box.
In some embodiments, the bandwidth estimation process determines the total download time according to start times and end times of chunks in the subset. In some embodiments, the bandwidth estimation process comprises at least partially reading the response prior to completion of download of the response, before identifying if the chunk delineators are present in the response.
The client system may be used for determining at least one of a playback speed and a representation bitrate to download, the at least one processor being further configured to: calculate an ideal queue utilisation based on assuming an instantaneous buffer occupancy as an expected value of buffer slack; and determine a product of the representation bitrate and playback speed based on the ideal queue utilisation and the estimated bandwidth.
In some embodiments of the client system: if a latency is within a predetermined limit, the at least one processor is configured to set the playback speed to 1 and determine the representation bitrate; wherein each response is stored in a buffer before being played back, and if the latency is greater than the predetermined limit due to high buffer occupancy, the at least one processor decreases the representation bitrate and increases the playback speed; and where a network bandwidth, of a network over which each response is transmitted, is less than a previously determined representation bitrate, the at least one processor decreases the representation bitrate and the playback speed.
Brief description of the drawings
Embodiments of the present invention will now be described, by way of non-limiting example, with reference to the drawings in which:
Figure 1 is a block diagram of an example architecture of a live streaming client system according to certain embodiments;
Figure 2 is a flow diagram of a bandwidth estimation process according to certain embodiments;
Figure 3 is a flow diagram of an adaptive bitrate algorithm according to certain embodiments;
Figures 4(a) to 4(f) show actual and estimated bandwidth for six different network profiles;
Figure 5 shows bandwidth estimation error for the algorithm used in a prior art method (LoL) when latency drops below one second;
Figure 6 shows stall duration vs. number of stalls for different latency limits;
Figure 7 shows bitrate vs. changes in representation for different latency limits;
Figure 8 shows average latency for algorithms under different latency limits;
Figure 9 shows maximum instantaneous latency for different latency limits;
Figure 10 shows playback speed for different latency limits;
Figure 11 shows percentage of total time playing at normal speed;
Figure 12 shows QoE for algorithms based on the average bitrate, magnitude to changes in bitrate, and stalls;
Figure 13 shows QoE considering the latency and playback speed additionally; and Figure 14 shows average (top) and maximum (bottom) latency using 3 and 6 second segments and 3 seconds of latency limit.
Detailed description
Figure 1 shows an example architecture of a client system 100 for live streaming of media content. The client system 100 streams data of the media content from one or more content servers via a network 102 by making HTTP requests to the one or more content servers.
Data stored by a content server is split into time-consecutive segments, such as MP4 segments, that can be addressed and downloaded independently. The addresses (HTTP URLs) of these segments are sent by the server to the client 100. Each segment of the media content is associated with one HTTP address.
Each server also stores a manifest file (typically an XML file) that describes the nature of the media content. This may include the encoding format, such as resolution, bitrate and timing information, the list of segments, and their associated URLs. This manifest file is sent to the client 100 such that the client 100 can send HTTP requests to the content server to download the segments of the media content.
For the MPEG/DASH streaming protocol, by reference to which embodiments of the present invention are described, the manifest file is referred to as a Media Presentation Description (MPD) file. Where the media content is video, an MPD file contains information pertaining to different possible representations of the video. A representation comprises a bitrate and resolution at which the video is encoded. A client 100 can choose which representation of the video to download segments of, depending on (for example) bandwidth and buffer occupancy, as will be described in further detail below.
The client system 100 comprises a DASH client module 110 that comprises a controller component 112 and a bandwidth estimation component 114. The DASH client module 110 is in communication with an HTTP client 120. The HTTP client 120 requests, for example via a Fetch API, a video segment based on information given in the MPD file. The properties available in MPD, namely availabilityStartTime and publishTime, allow the DASH client module 110 to calculate the instantaneous latency and request (via the HTTP client 120) the latest available segment from the server.
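As a concrete illustration, the following TypeScript sketch computes the instantaneous latency from these MPD properties; the function name and the use of the player's current playback position are assumptions for illustration, not code from the patent.

    // Instantaneous live latency: the wall-clock age of the frame now playing.
    // 'video' is the HTMLVideoElement; 'availabilityStartTime' comes from the MPD.
    function instantaneousLatencySeconds(video: HTMLVideoElement,
                                         availabilityStartTime: Date): number {
      // Seconds of live content produced since the stream became available.
      const liveEdgeSec = (Date.now() - availabilityStartTime.getTime()) / 1000;
      // Subtracting the playback position gives the capture-to-playback lag.
      return liveEdgeSec - video.currentTime;
    }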
One or more Fetch API methods called by the client system 100 enable the reading of a chunk out of a partially downloaded segment. Downloaded chunks are pushed, as they are received, to the playback buffer 130 of a video player 140, from where they are rendered at a specific speed decided by the controller 112 of the DASH client module 110. The video player 140 may be a software application that functions to output data in suitable form to a display (via an integrated graphics component or dedicated graphics card, for example) for viewing by a user.
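A minimal sketch of this read-and-append loop, using the standard Fetch and Media Source Extensions APIs, is shown below; segmentUrl and the explicit wait for the 'updateend' event are illustrative choices, not details from the patent.

    // Stream one segment: append each arriving fragment to the playback buffer
    // so that chunks can be decoded before the whole segment has downloaded.
    async function streamSegment(segmentUrl: string,
                                 sourceBuffer: SourceBuffer): Promise<void> {
      const response = await fetch(segmentUrl);
      const reader = response.body!.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done || !value) break;
        sourceBuffer.appendBuffer(value);
        // SourceBuffer accepts one append at a time; wait until it settles.
        await new Promise(resolve =>
          sourceBuffer.addEventListener('updateend', resolve, { once: true }));
      }
    }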
The DASH client module 110 is also responsible for estimating the bandwidth using the downloaded chunk's size (in bytes) and the time interval for its download, by the bandwidth estimation component 114. In this regard, in embodiments of the present disclosure, a chunk parser 116 of the DASH client module 110 receives data fragments as chunks of a segment are received via HTTP client 120. The HTTP client 120 sends complete chunks to the buffer 130, but while a chunk is downloading, the chunk parser 116 monitors the data fragments to precisely determine the start and end times of each chunk so that the bandwidth estimation component 114 may accurately estimate the bandwidth, for use by the controller component 112 in selecting the bitrate and playback speed to feed to the video player 140.
The controller component 112 implements an adaptive bitrate (ABR) algorithm that considers different heuristics based on the estimated bandwidth value and the buffer information, the required latency limits, and the instantaneous playback speed, to decide the next representation of the segment to download and the playback speed, as will be described in further detail below.
In the described embodiments, the systems described herein are in the form of one or more networked computing systems, each having a memory, at least one processor, and at least one computer-readable non-volatile storage medium (e.g., solid state drive), and the processes described herein are implemented in the form of processor-executable instructions stored on the at least one computer-readable storage medium. However, it will be apparent to those skilled in the art that the processes described herein can alternatively be implemented, either in their entirety or in part, in one or more other forms such as configuration data of a field-programmable gate array (FPGA), and/or one or more dedicated hardware components such as application-specific integrated circuits (ASICs).
A problem with previously devised approaches for other contexts, such as VoD streaming, is that they use the measured throughput for the downloaded segment. This is not a reasonable estimate of the network transmission capacity: computing the throughput as (segment size / segment download time) always produces a value equal to (or slightly smaller than) the segment encoding bitrate, due to the inter-chunk idle periods introduced during CTE delivery. Hence, the ABR algorithm sees incorrect throughput measurements, which prevent it from switching to higher bitrate levels.
Moreover, a DASH client faces a longer latency if its buffer holds video chunks for a longer duration, or if network latency rises due to a sudden drop in bandwidth. In the case of a client that plays the entire video at normal playback speed without skipping any chunks, the latency is cumulative, and its future value cannot go lower than the instantaneous value.
One way to reduce the latency is to skip some segments and download the latest segment. This technique is not suitable for many applications, such as online gaming and live sports streaming, as the user would not want to skip key moments in a game just to reduce the latency. Another technique is to speed up the rendering of the buffered chunks so that later segments are played back earlier. In such a case, the controller needs to decide a judicious value for the playback speed, as an overly aggressive playback speedup empties the buffer and causes a playback stall.
In view of these issues, embodiments of the present disclosure provide a novel bandwidth estimation process that provides more accurate bandwidth estimates. Embodiments also provide a process for bitrate adaptation and playback speed control that is based on an M/D/1/K queueing system model, and that provides improved performance over known adaptive bitrate (ABR) algorithms.
The need for improved bandwidth estimation is motivated by the following considerations. The encoding time of a segment in live video streaming is equal to its content duration, since the encoder has to wait for each frame to be produced in real time. This encoding time affects the throughput measurement at the client end. For example, a segment encoded at 2000 Kbps with a segment duration of 4 seconds has a total size of 8000 Kb and takes 4 seconds to generate entirely at the server (origin). In the hypothetical case of zero network delay, a segment requested at the beginning of its encoding at the server needs 4 seconds to be fully downloaded at the client's end, resulting in a throughput measurement of 2000 Kbps. Therefore, with any additional network latency, the measured throughput cannot exceed the segment's bitrate value and cannot reflect the available bandwidth. This inaccuracy calls for an alternative measurement for bandwidth estimation.
An embodiment of a bandwidth estimation method 200 will now be described with reference to Figure 2. The bandwidth estimation method 200 is an iterative process in which the chunk parser component 116 may use the read() method of the Fetch API to track the progress of the chunk download and parse the chunk payload in real time. The process 200 begins by obtaining a downloaded data fragment (block 202). The downloaded data fragment is a portion of a chunk of a segment requested by HTTP client 120.
At block 204 the chunk parser 116 checks for a chunk delineator. For example, the chunk delineator may be a 'moof' box (or atom), indicating the start of a chunk, or an 'mdat' box (or atom), indicating the end of a chunk.
If a chunk start is detected (block 206), the start time of the chunk is stored (block 208), and processing returns to block 202 for analysis of the next data fragment. For example, if the chunk parser 116 detects a 'moof' box of a chunk, the timestamp is stored as the beginning time of the chunk download, for example using performance.now().
If, instead, a chunk end is detected (block 210), the end time of the chunk is stored (block 212), as well as the size (in bytes) of the chunk. For example, the end time of the chunk may be the timestamp of the end of a 'mdat' box.
A check is then performed (block 214) to determine whether the entire segment has been downloaded. If not, processing returns to block 202, to obtain the first data fragment of the next chunk.
If all chunks of the segment have been downloaded, then at block 216, a chunk filtering process is performed. Chunk parser 116 checks whether the bytes received for a chunk are equal to or more than the maximum transmission unit (MTU) of the network 102 over which the content is being streamed. If the condition is satisfied, the chunk is retained for consideration in the bandwidth estimation. Otherwise, the chunk is considered to correspond to an idle period, and is filtered out from the segment bandwidth estimation.
At block 218, the total size and download time of the subset of chunks that pass the filter at block 216 (which may, in some cases, be all chunks of the segment) are computed. In this regard, the total size of the filtered chunks is the sum of the sizes of the chunks in the subset, and the download time is the sum of the intervals between the respective start times and end times of the chunks in the subset. At block 220, the estimated bandwidth for the segment is output, computed as the total size divided by the download time from block 218.
The process 200 is typically repeated for each segment of the media content. The estimated bandwidth that is output by process 200 for any given segment may be passed to controller 112 to use in selection of an appropriate representation and playback speed for the next segment.
The bandwidth estimation process 200 may be described by the pseudocode in Equation (1).
[Equation (1): pseudocode for process 200, reproduced as an image in the original publication.]
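In lieu of the original pseudocode, the following TypeScript sketch reconstructs process 200 from the description above. It is an interpretation under stated assumptions: the MTU is taken as 1500 bytes, box detection is a simplified byte scan for the 'moof' and 'mdat' type tags (a full parser would walk the ISO-BMFF box sizes and mark the chunk end at the end of the mdat payload, per block 212), and all names are illustrative rather than the patent's.

    interface ChunkRecord { start: number; end: number; bytes: number; }

    const MTU_BYTES = 1500; // assumed network MTU; obtained per-network in practice

    // Simplified scan for a 4-byte ISO-BMFF box type tag inside a data fragment.
    function containsBoxType(fragment: Uint8Array, type: string): boolean {
      const t = [0, 1, 2, 3].map(i => type.charCodeAt(i));
      for (let i = 0; i + 4 <= fragment.length; i++) {
        if (fragment[i] === t[0] && fragment[i + 1] === t[1] &&
            fragment[i + 2] === t[2] && fragment[i + 3] === t[3]) return true;
      }
      return false;
    }

    async function estimateSegmentBandwidth(response: Response): Promise<number> {
      const reader = response.body!.getReader();
      const chunks: ChunkRecord[] = [];
      let current: ChunkRecord | null = null;

      for (;;) {
        const { done, value } = await reader.read();          // blocks 202/204
        if (done || !value) break;
        if (containsBoxType(value, 'moof')) {                 // chunk start (block 206)
          current = { start: performance.now(), end: 0, bytes: 0 };  // block 208
        }
        if (current) {
          current.bytes += value.byteLength;
          if (containsBoxType(value, 'mdat')) {               // chunk end (block 210)
            current.end = performance.now();                  // block 212
            chunks.push(current);
            current = null;
          }
        }
      }

      // Block 216: keep only chunks of at least one MTU; smaller chunks are
      // treated as idle periods and filtered out of the estimate.
      const kept = chunks.filter(c => c.bytes >= MTU_BYTES && c.end > c.start);

      // Blocks 218/220: estimated bandwidth = total size / total download time.
      const totalBits = 8 * kept.reduce((sum, c) => sum + c.bytes, 0);
      const totalSec = kept.reduce((sum, c) => sum + (c.end - c.start), 0) / 1000;
      return totalSec > 0 ? totalBits / totalSec : 0;         // bits per second
    }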
The bandwidth estimation process 200 is based on the insight that the presence of a 'moof' atom in a portion of the response body implies the beginning of a new chunk, and that received bytes equal to or exceeding the MTU imply that the server has enough data ready for transmission to use more than one transfer unit. Such a portion of the response body is therefore likely to have less encoding-induced latency.
An embodiment of a method for determining at least one of a playback speed and a representation bitrate for live streaming will now be described. The method implements an algorithm that is based on approximating the playback buffer 130 as a queuing system. Using a queueing model, the algorithm may estimate how the buffer occupancy of the client 100 converges for a given bitrate while playing the video at a specific rate with the given network bandwidth. The algorithm then selects the representation of the next segment to download and the playback speed of the video so that the buffer occupancy converges to the ideal value.
A DASH client can be seen as an M/D/1/K queuing system in which the chunks of the live video stream, belonging to fixed-duration segments, arrive in a queue with capacity K. Each segment consists of several chunks. Let each segment of the video be encoded into L representation levels, with bitrate values R = {r_1, r_2, ..., r_L}, where r_i < r_j if i < j. The DASH client starts playing the video as soon as a chunk arrives in the queue. Although the video is played whenever a chunk is available, the decision to switch the video's representation level is taken by the client at the segment boundary, where IDR frames are present. The client requests a new segment once it finishes the download of the previous one.
CMAF-based video segmentation and the Fetch API allow chunks as small as a single frame to be added to the buffer and decoded before the download of the entire segment. Therefore, in embodiments of the present disclosure, the buffer occupancy is measured as the number of frames available in the buffer for playing. The algorithm implemented by the controller component 112 calculates the value of buffer occupancy by taking the product of buffer occupancy in seconds and the video's frame rate. Having a higher buffer occupancy increases the latency as the chunk waits for a longer time in the buffer to get played.
Turning now to Figure 3, an embodiment of a method 300 for determining at least one of a playback speed and a representation bitrate for live streaming comprises controller component 112 determining (block 302) a latency and buffer occupancy, the values of which may be obtained from playback buffer 130.
At block 304, the buffer capacity K is set to the desired latency limit. Next, at block 306, controller 112 determines the ideal value of the queue utilisation ρ for keeping the buffer half-filled, by treating the instantaneous buffer occupancy as the expected average buffer slack of the M/D/1/K queue. Using this value of ρ and the estimated bandwidth BW output by the process 200 of Figure 2, the product of bitrate and playback speed can be determined with the help of the queuing utilisation ρ = λ/μ as:

r × s = BW / ρ (2)
Equation (2) may be used for bitrate and playback speed adaptation under different scenarios, depending on the latency and buffer occupancy.
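For illustration (with assumed numbers, not taken from the experiments): if the estimated bandwidth is BW = 4000 Kbps and the ideal utilisation is ρ = 0.8, Equation (2) gives r × s = 4000 / 0.8 = 5000 Kbps. At normal playback speed (s = 1), the controller would therefore select the highest available representation whose bitrate is at most 5000 Kbps.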
For example, at block 308, if the controller 112 determines that the reported latency is within the desired latency limit, the playback speed s can be set to 1, and the representation level with the highest bitrate that is less than or equal to r* = BW/ρ is selected, based on Equation (2) (block 310). This is because, in practice, the set of available bitrate values R is discrete. Let this selected representation level be y, with bitrate r_y.
If the controller 112 determines (block 312) that the latency is higher than the desired limit and the buffer occupancy is high, the controller 112 selects (block 314) a lower bitrate. In some embodiments, this corresponds to a representation that is one level lower than the representation it would select for s = 1, that is, level y − 1 with bitrate r_{y-1}. This selection of representation level allows the controller 112 to select a playback speed s > 1 that satisfies the condition r_{y-1} × s = BW/ρ.
In some embodiments, while calculating ρ using the buffer occupancy at time t, the algorithm may use the effective instantaneous buffer occupancy B_t/s instead of B_t, because s > 1 leads to faster drainage of the buffer. Such a calculation of ρ accounts for the change in average buffer occupancy due to the change in playback speed, to avoid rebuffering stalls. Since ρ is a monotonically decreasing function of instantaneous buffer occupancy, and there is a finite set of values for s due to web browser and QoE constraints, the value of s that satisfies the relationship between r, s, BW, and ρ can be approximated.

Returning to block 312, the controller 112 may instead determine that the buffer occupancy is too low to avoid rebuffering. This may occur when the network bandwidth is lower than the bitrate of the selected representation level, such that the latency increases as the buffer occupancy decreases. In such a case the buffer occupancy is very low by the time the latency exceeds the latency limit, leading to an r* value that is lower than the bitrate of the lowest available representation. Accordingly, in this scenario, the controller 112 uses B_t/s to calculate ρ (block 316), because the buffer is now draining slowly due to the slower playback speed. Since, in the model, buffer occupancy is measured in units of time, draining the buffer more slowly effectively increases the buffer occupancy. The controller 112 then selects (block 318) the lowest possible representation level, with bitrate r_1, and sets the playback speed (s < 1) to a value that satisfies the condition r_1 × s = BW/ρ.

At block 320, the controller 112 sends the determined playback speed to the video player 140. In some embodiments, the playback speed need only be sent to video player 140 if it differs from the current playback speed reported by the video player 140. The controller 112 may also send the selected representation level (which corresponds to the selected bitrate) to the HTTP client 120 for downloading of the next segment.
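The scenario handling of blocks 308 to 318 can be sketched as follows. This is a simplified illustration under assumptions: the bitrate ladder, the speed bounds, and the helper names are not from the original disclosure, and ρ is assumed to have already been computed from the queue model (using B_t/s where applicable).

```python
BITRATES = [500, 2500, 4000]   # assumed available representations, Kbps
S_MIN, S_MAX = 0.25, 2.0       # assumed browser-imposed playback speed bounds

def highest_bitrate_at_most(limit: float) -> int:
    candidates = [r for r in BITRATES if r <= limit]
    return max(candidates) if candidates else BITRATES[0]

def decide(bw: float, latency: float, latency_limit: float,
           buffer_high: bool, rho: float) -> tuple[int, float]:
    """Return (bitrate in Kbps, playback speed) per Equation (2): r * s = BW / rho."""
    target = bw / rho
    if latency <= latency_limit:                # blocks 308/310: normal speed
        return highest_bitrate_at_most(target), 1.0
    if buffer_high:                             # blocks 312/314: speed up
        r = highest_bitrate_at_most(target)
        r = BITRATES[max(BITRATES.index(r) - 1, 0)]  # one level lower
        s = min(target / r, S_MAX)              # s > 1 so that r * s = BW / rho
        return r, s
    # blocks 316/318: low buffer, lowest representation, slow down
    r = BITRATES[0]
    s = max(min(target / r, 1.0), S_MIN)        # s < 1
    return r, s
```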
Algorithm evaluation
To evaluate the present method, we compared an implementation of it (referred to herein as QLive) against three algorithms designed for low-latency scenarios, discussed below.
The first is LoL (M. Lim et al., "When they go high, we go low: low-latency live streaming in dash.js with LoL", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 321-326), which includes three main modules: a self-organizing map (SOM) learning-based ABR algorithm, playback speed control, and throughput measurement. The ABR algorithm of LoL feeds four heuristics (measured throughput, latency, current buffer level, and QoE) into the SOM model to perform ABR decisions. It also implements a robust throughput measurement algorithm that tracks when a chunk arrives and when its download finishes. The next algorithm is L2A (Theo Karagkioules, Rufael Mekuria, Dirk Griffioen, and Arjen Wagenaar (2020), "Online learning for low-latency adaptive streaming", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 315-320), which uses an online convex optimization framework to formulate the ABR selection problem in low-latency live streaming, and proposes an online learning rule to solve the optimization problem. The main intuition behind L2A is to learn the best policy for selecting a suitable bitrate for each downloaded segment. It does so without requiring any parameter tuning, modifications according to application type, statistical assumptions about the channel, or bandwidth estimation.
The third algorithm, Stallion (C. Gutterman et al., "Stallion: video adaptation algorithm for low-latency video streaming", in Proceedings of the 11th ACM Multimedia Systems Conference, ACM, Istanbul, Turkey, 327-332), uses the throughput-based ABR of Dash.js with a slight modification: it incorporates a sliding-window technique to measure the mean and standard deviation of both the throughput and the latency, and then performs ABR decisions. This modification makes Stallion react well to low-latency requirements.
The playback speed algorithm for LoL, L2A, and Stallion is based on the default Dash.js algorithm and is independent of the bitrate adaptation algorithm. While L2A and Stallion use the original playback speed algorithm, LoL modifies it to consider the buffer state before changing the playback speed, to avoid rebuffering. For a fair comparison, we use the default playback speed algorithm of Dash.js for LoL, L2A, and Stallion. The algorithm controls the latency by increasing the playback speed based on the latency limit and the instantaneous latency. It first calculates ΔL, which is the difference between the instantaneous latency and the latency limit. This value is used to calculate the playback rate as ((2 × s_max)/(1 + e^(−5ΔL))) + 1 − s_max. Here, s_max is the upper threshold for the playback speed. In addition, we used the bandwidth estimation technique of the present disclosure for all ABR algorithms, including the ABR algorithm of the present disclosure.
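A sketch of this default speed rule follows, under the assumption that the formula reconstructed above matches the Dash.js implementation; the function name and default value are illustrative. With s_max = 0.5, the rate varies over (0.5, 1.5) and equals 1 exactly when the latency matches the limit.

```python
import math

def playback_rate(latency: float, latency_limit: float,
                  s_max: float = 0.5) -> float:
    """Sigmoid catch-up rule: faster than normal when latency exceeds
    the limit, slower when below it, exactly 1.0 at the limit."""
    delta_l = latency - latency_limit
    return (2 * s_max) / (1 + math.exp(-5 * delta_l)) + 1 - s_max
```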
Video sample
We use the software library provided by Streamline to generate the CMAF-based live stream (a random color pattern). It uses FFmpeg (https://ffmpeg.org/) to encode and package the streams in CMAF format, and a Python origin server to deliver the packaged chunks to the client using CTE. The original version of the source code generates the video stream with one bitrate. We modified the code to generate the video stream with three bitrates, chunked at the frame level. The highest bitrate for live streaming suggested by Facebook is 4000 Kbps, although there is no guideline for the lowest value. YouTube suggests a lowest bitrate in the range of 300 Kbps to 700 Kbps for live streaming, and various other platforms make recommendations in a similar range of 500 Kbps to 4000 Kbps. Therefore, we used three values: 500 Kbps, 2500 Kbps, and 4000 Kbps. We chose segment durations less than or equal to the required latency limit. For example, for a 3-second latency limit, we use 1-, 2-, and 3-second video segments. Similarly, for a 2-second latency limit, we use 1- and 2-second video segments, and for a 1-second latency limit, we use just a 1-second video segment. We describe the effect of choosing a segment duration higher than the latency limit in the Results and Comparison section below, by analyzing the algorithms with 6-second segments and a 3-second latency limit.
Network profiles
We used six network profiles for evaluation. Profiles P1 and P2 are taken from the DASH Industry Forum Guidelines. Each has 5 levels with rates {1500, 2000, 3000, 4000, 5000} Kbps, corresponding delays of {100, 88, 75, 50, 38} ms, and packet loss of {0.12, 0.09, 0.06, 0.08, 0.09} %, varying at 30-second intervals. P1 follows a high-low-high pattern; P2 follows a low-high-low pattern. Profiles P3 and P4 are HSDPA network traces from a moving car and train, with rates ranging from 3 to 5876 Kbps varying at 1-second intervals. These two profiles are very challenging, as the bandwidth is very low, and there are a few instances where rebuffering is difficult to avoid. Profiles P5 and P6 are 4G/LTE network traces from a moving car and train, with rates ranging from 0 to 173000 Kbps varying at 1-second intervals.
Evaluation Metrics
First, we evaluated the accuracy of our bandwidth estimation algorithm. Next, we evaluated the traditional performance metrics for bitrate adaptation, namely: average bitrate; changes in representation, to check for quality fluctuations; the magnitude of changes in quality (as the difference in bitrate values); number of stalls; duration of each stall; latency; and playback speed. We used two QoE models (Yin et al., Proceedings of SIGCOMM '15, 325-338; and the Twitch grand challenge) to calculate the QoE using these metrics. Finally, we evaluated the latency-dependent metrics. We measure the average latency to check whether an algorithm can keep the latency within the preset limits. The instantaneous value of latency is critical for applications such as online gaming and video surveillance, so we also check the maximum instantaneous latency. We further check the average playback speed and the duration for which video is played at normal speed by the different algorithms.
Implementation and Experimental Setup
We implemented the ABR algorithm of the present disclosure on the Dash.js reference video player (v3.1.3). The other low-latency ABR algorithms (LoL, L2A, and Stallion) are already integrated in Dash.js. We evaluate the above algorithms with three latency limits of 1, 2, and 3 seconds. For playback speed adaptation, we use the YouTube-recommended speed limits, i.e., 0.25 times to 2 times normal playback speed. We used an Apache web server to host the Dash.js video player containing the four bitrate adaptation algorithms for comparison. The FFmpeg encoder with the CMAF packager and origin run on the server provided by Streamline. The server and client run on two different Linux-based machines connected by a router. We used the tc NetEm network emulator to control the network bandwidth according to the network profiles.
We tested the algorithms for each network profile and latency limit with different segment durations, for 10 minutes of total live video session. We loop the network profiles when the traces are available for less than 10 minutes. We measure all metrics at 30-millisecond intervals to capture frame-level changes in the buffer, because buffer occupancy is often low in low-latency streaming.
Results and Comparison
We now compare and describe the performance of the different algorithms. First, we describe the accuracy gains of our proposed bandwidth estimation. Then, we compare the performance of the different bitrate adaptation and speed control algorithms using metrics such as average bitrate, changes in representation level, number of stalls, stall duration, playback speed, latency, and QoE. We present each algorithm's average results for a given latency limit over the different network profiles and segment durations.
Bandwidth estimation accuracy

The accuracy of the bandwidth estimation algorithm used in LoL is generally high. However, the method fails when the latency drops below 1 second. This phenomenon is not common with LoL, L2A, and Stallion, because these methods use the Dash.js default playback speed algorithm, which tends to slow down the playback speed, increasing the instantaneous latency towards the given limit whenever the latency drops below it. The QLive speed adaptation is different and does not aim to increase the latency regardless of its instantaneous value. Figure 5 shows one such example, where LoL's bandwidth estimation fails. Around 330 seconds, the QLive latency drops below 1 second, and the algorithm also selects the lowest representation, having a bitrate of 500 Kbps. Such a representation and the lower instantaneous latency give a false bandwidth estimate, much too low compared to the actual value. Therefore, QLive keeps selecting the lowest representation, and the instantaneous latency remains lower than 1 second because the actual bandwidth is much higher than the representation's bitrate. Therefore, the error in the bandwidth estimation continues.
To address this, the chunk filtering approach based on Equation (1) was used. This modification leads to a very high average bandwidth estimation accuracy of 94%, with a minimum of 87% across the different profiles and segment durations, calculated using Dynamic Time Warping with the Manhattan distance. Some of the inaccuracy is due to an experimental artifact: the segment's download falls in the time window when NetEm is changing the bandwidth, causing the values to be averaged on the client's end. As demonstrated in Figure 4 for one of the experiment traces, the difference between the actual and estimated bandwidth is minimal regardless of latency.
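One plausible way to compute the DTW distance underlying this accuracy figure is sketched below, assuming the third-party fastdtw package; the mapping from DTW distance to the reported accuracy percentage is not reproduced here and is left as an assumption.

```python
from fastdtw import fastdtw

def dtw_manhattan(actual: list, estimated: list) -> float:
    """DTW distance between the actual and estimated bandwidth traces,
    using the Manhattan (L1) distance between scalar samples."""
    distance, _path = fastdtw(actual, estimated, dist=lambda a, b: abs(a - b))
    return distance
```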
Stall duration and number of stalls
Figure 6 shows an XY plot comparing the number of stalls and their duration for the different algorithms under different latency limits. Points closer to the lower-left corner indicate better performance in both dimensions, i.e., the minimum number of stalls with the minimum stall duration. QLive is almost in the lower-left corner for all latency limits: its stall duration is 0.6 seconds, 0.8 seconds, and 0.7 seconds, and its number of stalls is 0.8, 1, and 0.5, on average, for the different latency limits. LoL is the nearest to QLive and even has slightly fewer stalls for the 2-second latency limit, but a higher stall duration. LoL incurs average stall durations of 0.6 seconds, 1.1 seconds, and 0.8 seconds, with an average of 5.5, 0.8, and 0.8 stalls for the different latency limits. Stallion has the worst performance for the 1- and 2-second latency limits, with the highest stall durations of 3.9 seconds and 2.1 seconds, whereas L2A had the longest stall duration of 1.6 seconds for the 3-second limit. All the algorithms' performance improved as we increased the latency limits, as longer latency limits also give more room for the buffer to grow.
Average bitrate and changes in representation
Figure 7 shows an XY plot of the average bitrate against the number of changes in representation for the different latency limits. We can see that QLive always has the highest bitrate, 6% to 14% higher than the nearest-performing algorithms and 40% to 53% higher than the worst-performing ones. Although its number of representation changes is also the highest, the values are less than 1/5 of the maximum possible, based on the segment duration and total playback duration. Additionally, QLive also has the lowest stall duration, as described in the previous section. This shows that QLive is more adaptive to changing network bandwidth in improving the overall bitrate. Stallion has the bitrate nearest to QLive for the 1- and 2-second latency limits, but also has the highest stall duration on average. Similarly, L2A has the bitrate nearest to QLive for the 3-second latency limit, and has the highest stall duration. LoL has more changes in representation level for the 1- and 2-second latency limits, i.e., 28 and 29, compared to 20 and 19 for L2A and 15 and 14.6 for Stallion. Its lower stall duration than Stallion and L2A shows that the changes in representation level have helped reduce the stalls. It is therefore important to consider the bitrate, the changes in representation level, and the stalls together.
Average and maximum instantaneous latency
Figures 8 and 9 show the average and the maximum instantaneous latency for the different algorithms under different latency limits. Applications like online video games, live auctioning, and video surveillance require both low average latency and low instantaneous latency, making both factors vital for performance evaluation. QLive's latency is the nearest to the respective latency limits: for the smallest limit of 1 second, its average latency is 1.4 seconds and its maximum instantaneous latency is 7 seconds. The other algorithms have variable performance across latency limits. Stallion always chooses a representation with a bitrate lower than the bandwidth, which leads to a decrease in latency as the limit increases, but it still has a higher average latency than QLive. Both L2A and Stallion have the worst average latency for the minimal latency limit of 1 second, at 10 times the limit. The maximum instantaneous latency for L2A is also the highest, i.e., 28 seconds for the 3-second latency limit.
The maximum latency for QLive with a 3-second latency limit is even lower than that of the other algorithms with a 1-second limit. This is because QLive's bitrate adaptation and playback speed control are tightly coupled and calculated together based on the available representation levels, the previous playback speed, and the network and buffer conditions. In contrast, the other algorithms depend on the Dash.js default playback speed adaptation algorithm, and their bitrate adaptation is a separate process.
Playback speed
QLive and the compared algorithms modify the playback speed to control the latency within the given limits. Lower latency is essential for QoE in live streaming, but large fluctuations in playback speed also hamper QoE. We analyze the average playback speed and the percentage of total time a video is played at normal speed for the different algorithms in Figures 10 and 11, respectively. As can be seen, the average playback speed for QLive is closest to normal speed, i.e., 1.05, 1.09, and 1.17 for the different latency limits. The percentages of time that QLive plays at normal speed are also the highest, i.e., 92%, 86%, and 75% for the 1-second, 2-second, and 3-second latency limits. Interestingly, the percentage declines as the latency limit increases. This decline is because we use 1-, 2-, and 3-second segments for the 3-second latency limit. Profiles P3 to P6 have bandwidth changing every second. A 3-second segment makes adaptation difficult, as the player changes the playback speed and representation level only when it finishes downloading the entire segment. Therefore, for the 1-second latency limit, where only a 1-second segment is tested, the average speed is closest to normal and the amount of time played at normal speed is highest. This trend in average playback speed and time played at normal speed is reversed for the other ABR algorithms as the latency limit increases. This is because their playback speed adaptation considers the gap between the latency limit and the instantaneous latency, regardless of bandwidth and buffer conditions. Therefore, when the latency is high and the buffer is draining, an increase in playback speed adds to the latency. L2A, in general, plays a minimal amount of time at normal speed, even for the 1-second segment: only 30% to 53% of the time on average across the different latency limits.

Quality of experience
As described in the previous paragraph, the effect of playback speed on QoE is very subjective. We present results for the partial QoE proposed by Yin et al., using the following equation:
$$\text{QoE} = \sum_{n=1}^{N} q(R_n) \;-\; \lambda \sum_{n=1}^{N-1} \left| q(R_{n+1}) - q(R_n) \right| \;-\; \mu\, T_{\text{stall}} \;-\; \mu_s\, T_s \qquad (3)$$
Here, N is the total number of segments; q maps a bitrate to a quality value; T_stall is the total stall duration during playback; and T_s is the startup delay. As in Yin et al., q(·) is the identity function, λ = 1, and μ and μ_s are set to the maximum bitrate of the video sample. To capture the effect of playback speed and latency, we also evaluate QoE based on the extended formulation used in Twitch's ACM MMSys 2020 Grand Challenge:
$$\text{QoE}_{\text{live}} = \sum_{n=1}^{N} q(R_n) \;-\; \lambda \sum_{n=1}^{N-1} \left| q(R_{n+1}) - q(R_n) \right| \;-\; \mu\, T_{\text{stall}} \;-\; \mu_s\, T_s \;-\; \mu_L \sum_{n=1}^{N} L_n \;-\; \mu_p \sum_{n=1}^{N} \left| 1 - p_n \right| \qquad (4)$$
Here, the equation additionally penalizes the QoE for the latency L_n, and for non-normal speed |1 − p_n|, for each segment n. Figures 12 and 13 show this QoE for the different algorithms using Equations (3) and (4), respectively.
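A hedged sketch of both formulations follows. Here q is the identity function, as stated above; the default weights (set to the maximum bitrate, 4000 Kbps) and the per-segment penalty structure of Equation (4) are assumptions based on the description, not a reproduction of the challenge's exact scoring code.

```python
def qoe_yin(bitrates, t_stall, t_startup,
            lam=1.0, mu=4000.0, mu_s=4000.0):
    """Equation (3): quality minus switching, stall, and startup penalties."""
    quality = sum(bitrates)  # q is the identity function
    switches = sum(abs(b - a) for a, b in zip(bitrates, bitrates[1:]))
    return quality - lam * switches - mu * t_stall - mu_s * t_startup

def qoe_twitch(bitrates, t_stall, t_startup, latencies, speeds,
               mu_l=4000.0, mu_p=4000.0):
    """Equation (4): Equation (3) plus latency and non-normal-speed penalties."""
    base = qoe_yin(bitrates, t_stall, t_startup)
    latency_penalty = mu_l * sum(latencies)
    speed_penalty = mu_p * sum(abs(1 - p) for p in speeds)
    return base - latency_penalty - speed_penalty
```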
Since QLive has the lowest latency and plays the video longest at normal speed, it leads in both formulations of QoE, with up to 43% and 45% better QoE based on Equations (3) and (4), respectively. The QoE increases with the latency limit, due to the improvement in average bitrate. Stallion takes second place because it has the fewest changes in representation.
Results over Internet
We tested QLive with the 1-second latency limit over the internet, with the encoding server and the client on two different continents, approximately 10900 km apart. We ran ten sessions, each of 1-hour duration, during the COVID-19 pandemic period, when internet traffic was experiencing an unexpected surge as most events and meetings were happening online. Many streaming services in the UK and Europe were forced to reduce video quality during the testing period to prevent a possible collapse of the internet. In this overloaded situation, QLive achieved an average latency of 0.92 seconds with an average stall of 1.6 seconds, while playing at normal speed 99.2% of the total time. The average bitrate was 3837 Kbps, where the maximum attainable value is 4000 Kbps as per the encoding.
Segment duration versus latency limit
To analyze the effect of using a segment longer than the latency limit, we performed the experiment using 6-second segments with a 3-second latency limit and compared the maximum instantaneous latency and average latency for all the algorithms. Figure 14 compares the average and maximum instantaneous latency using 3- and 6-second segments. Since changes in representation are only possible at segment boundaries, longer segments may result in more erroneous decisions for rate and playback speed adaptation. Further, having a segment longer than the latency limit leads to playing, at increased speed, a segment whose encoding is still in progress, causing a buffer drought and further increasing the latency.
The average latency for QLive jumps by more than 3 times when the segment duration is 6 seconds compared to 3 seconds. L2A is minimally affected, as its original latency of 3 seconds was already the highest among the algorithms. LoL and Stallion see increases of 5 seconds and 4 seconds in average latency, respectively. Similarly, the maximum latency increased for all algorithms, by 4 to 9 seconds, with the increase in segment duration.
Summary of algorithm evaluation
QLive has the upper hand in most performance metrics because of its combined rate and playback speed adaptation algorithm. The default Dash.js playback speed adaptation algorithm used by the compared bitrate adaptation algorithms is not suitable when the buffer is low and the latency is beyond the limit; increasing the playback speed in such a situation further increases the latency and causes more stalls. Despite having more changes in representation level, QLive has the highest bitrate with minimal stalls while playing the video longer at normal speed. This leads to better QoE under the two popular formulations. The other ABR algorithms (LoL, L2A, and Stallion) perform better in terms of the number of changes in representation. However, the network situation in Profiles P3 and P4, where the bandwidth is low and fluctuations are high, creates a challenging situation that needs more sensitive bitrate and playback adaptation.

It will be appreciated that many further modifications and permutations of various aspects of the described embodiments are possible. Accordingly, the described aspects are intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.
The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.

Claims
1. A method for estimating bandwidth for client-side control of live streaming, comprising:
receiving a server response comprising one or more chunks of a segment of a video;
identifying the one or more chunks by identifying the presence of chunk delineators in the response;
identifying at least a subset of the chunks that have a size that is equal to or greater than a Maximum Transfer Unit (MTU) value; and
outputting an estimated bandwidth based on a total size of chunks in the subset, and a total download time for the chunks in the subset.
2. The method of claim 1, further comprising, for each chunk, determining a start time and end time for the chunk based on the chunk delineators.
3. The method of claim 2, wherein the start time and end time are determined after completion of download of the segment.
4. The method of any one of claims 1 to 3, wherein the chunk delineators comprise one or more moof boxes and one or more mdat boxes.
5. The method of claim 4, wherein the start time is a timestamp of a moof box, and the end time is a timestamp of an end of a mdat box.
6. The method of any one of claims 2 to 5, wherein the total download time is determined according to start times and end times of chunks in the subset.
7. The method of any one of claims 1 to 6, comprising at least partially reading the response prior to completion of download of the response, before identifying if the chunk delineators are present in the response.
8. A method for determining at least one of a playback speed and a representation bitrate for live streaming, comprising:
calculating an ideal queue utilisation based on assuming an instantaneous buffer occupancy as an expected value of buffer slack; and
determining a product of the representation bitrate and playback speed based on the ideal queue utilisation and a bandwidth estimated according to the method of any one of claims 1 to 7.
9. The method of claim 8, further comprising:
setting the playback speed to 1 and determining the representation bitrate if a latency is within a predetermined limit;
decreasing the representation bitrate and increasing the playback speed if the latency is greater than the predetermined limit due to high buffer occupancy of a buffer, wherein each response is stored in the buffer before being played back; and
decreasing the representation bitrate and the playback speed if a network bandwidth, of a network over which each response is transmitted, is less than a previously determined representation bitrate.
10. A live streaming client system, comprising: memory; and at least one processor, the memory storing instructions that when executed by the at least one processor, cause the at least one processor to perform a bandwidth estimation process comprising:
receiving a server response comprising one or more chunks of a segment of a video;
identifying the one or more chunks by identifying the presence of chunk delineators in the response;
identifying at least a subset of the chunks that have a size that is equal to or greater than a Maximum Transfer Unit (MTU) value; and
outputting an estimated bandwidth based on a total size of chunks in the subset, and a total download time for the chunks in the subset.
11. The client system of claim 10, wherein the bandwidth estimation process further comprises, for each chunk, determining a start time and end time for the chunk based on the chunk delineators.
12. The client system of claim 11, wherein the bandwidth estimation process comprises determining the start time and end time after completion of download of the segment.
13. The client system of any one of claims 10 to 12, wherein the chunk delineators comprise one or more moof boxes and one or more mdat boxes.
14. The client system of claim 13, wherein the start time is a timestamp of a moof box, and the end time is a timestamp of an end of a mdat box.
15. The client system of any one of claims 11 to 14, wherein the bandwidth estimation process determines the total download time according to start times and end times of chunks in the subset.
16. The client system of any one of claims 10 to 15, wherein the bandwidth estimation process comprises at least partially reading the response prior to completion of download of the response, before identifying if the chunk delineators are present in the response.
17. The client system of any one of claims 11 to 16, being used for determining at least one of a playback speed and a representation bitrate to download, the at least one processor being further configured to:
calculate an ideal queue utilisation based on assuming an instantaneous buffer occupancy as an expected value of buffer slack; and
determine a product of the representation bitrate and playback speed based on the ideal queue utilisation and the estimated bandwidth.
18. The client system according to claim 17, wherein:
if a latency is within a predetermined limit, the at least one processor is configured to set the playback speed to 1 and determine the representation bitrate;
wherein each response is stored in a buffer before being played back, and if the latency is greater than the predetermined limit due to high buffer occupancy, the at least one processor decreases the representation bitrate and increases the playback speed; and
where a network bandwidth, of a network over which each response is transmitted, is less than a previously determined representation bitrate, the at least one processor decreases the representation bitrate and the playback speed.
PCT/SG2021/050076 2020-02-21 2021-02-16 Methods and systems for bandwidth estimation WO2021167531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202001567X 2020-02-21
SG10202001567X 2020-02-21

Publications (1)

Publication Number Publication Date
WO2021167531A1 true WO2021167531A1 (en) 2021-08-26

Family

ID=77391112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2021/050076 WO2021167531A1 (en) 2020-02-21 2021-02-16 Methods and systems for bandwidth estimation

Country Status (1)

Country Link
WO (1) WO2021167531A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4270969A1 (en) * 2022-04-25 2023-11-01 Avago Technologies International Sales Pte. Limited Rebuffering reduction in adaptive bit-rate video streaming

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140040499A1 (en) * 2009-08-10 2014-02-06 Seawell Networks Inc. Methods and systems for scalable video chunking
US20170195393A1 (en) * 2015-12-31 2017-07-06 Hughes Network Systems, Llc Maximizing quality of service for qos adaptive video streaming via dynamic application-layer throughput rate shaping
US20190199818A1 (en) * 2015-12-31 2019-06-27 Hughes Network Systems, Llc Accurate caching in adaptive video streaming based on collision resistant hash applied to segment contents and ephemeral request and url data
US20180084256A1 (en) * 2016-09-19 2018-03-22 Arris Enterprises Llc HTTP Streaming Apparatus and System with Pseudo Manifest File and Just-In-Time Encoding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BENTALEB A. ET AL.: "Bandwidth prediction in low-latency chunked streaming", PROCEEDINGS OF THE 29TH ACM WORKSHOP ON NETWORK AND OPERATING SYSTEMS SUPPORT FOR DIGITAL AUDIO AND VIDEO (NOSSDAV'19), 21 June 2019 (2019-06-21), pages 7-13, XP055849641, [retrieved on 20210420], DOI: https://doi.org/10.1145/3304112.3325611 *
MANGLA T. ET AL.: "Video Through a Crystal Ball: Effect of Bandwidth Prediction Quality on Adaptive Streaming in Mobile Environments", PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON MOBILE VIDEO (MOVID'16), 13 May 2016 (2016-05-13), pages 1-6, XP055849633, [retrieved on 20210420], DOI: https://doi.org/10.1145/2910018.2910653 *
YADAV P. K. ET AL.: "QUETRA: A Queuing Theory Approach to DASH Rate Adaptation", PROCEEDINGS OF THE 25TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'17), 27 October 2017 (2017-10-27), pages 1130-1138, XP058620048, [retrieved on 20210420], DOI: https://doi.org/10.1145/3123266.3123390 *


Similar Documents

Publication Publication Date Title
US11076187B2 (en) Systems and methods for performing quality based streaming
Bentaleb et al. Bandwidth prediction in low-latency chunked streaming
US10298985B2 (en) Systems and methods for performing quality based streaming
US9191284B2 (en) Methods and apparatus for providing a media stream quality signal
KR20150042191A (en) Methods and devices for bandwidth allocation in adaptive bitrate streaming
EP2437458A1 (en) Content delivery
US11678009B2 (en) Client, server, reception method and transmission method complied to moving picture experts group-dynamic adaptive streaming over HTTP standard
CN108259964B (en) Video playing rate adjusting method and system
Bentaleb et al. Performance analysis of ACTE: A bandwidth prediction method for low-latency chunked streaming
JP2023522895A (en) Method and server for audio and/or video content delivery
Yadav et al. Playing chunk-transferred DASH segments at low latency with QLive
WO2021167531A1 (en) Methods and systems for bandwidth estimation
Zhang et al. A QOE-driven approach to rate adaptation for dynamic adaptive streaming over http
EP3664456A1 (en) Apparatus and method for playing streamed media
US20160050243A1 (en) Methods and devices for transmission of media content
US20230121792A1 (en) Low latency content delivery
Shende et al. Cross-layer Network Bandwidth Estimation for Low-latency Live ABR Streaming
Laine et al. Network Capacity Estimators Predicting QoE in HTTP Adaptive Streaming
Chen et al. Study on relationship between network video packet loss and video quality
Vlaović et al. Evaluation of adaptive bitrate selection algorithms for MPEG DASH
US20230048428A1 (en) A method for estimating bandwidth between a video server and a video client
Kovacevic et al. Evaluation of adaptive streaming algorithms over HTTP
Mulroy et al. The use of MulTCP for the delivery of equitable quality video
Wang et al. Adaptive bitrate streaming in cloud gaming
Bentaleb et al. BML 3: Accurate Bandwidth Measurement for QoE Optimization in Low Latency Live Streaming

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21757119

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21757119

Country of ref document: EP

Kind code of ref document: A1