USING QUALITY INFORMATION FOR ADAPTIVE STREAMING
OF MEDIA CONTENT
RELATED U.S. APPLICATIONS
[0001] This application claims priority to U.S. Provisional Application No. 61/752,796, titled "System and Method for In-band Signaling of Segment Quality for Smooth Adaptive Streaming," filed on January 15, 2013, and also claims priority to U.S. Provisional Application No. 61/752,831, titled "System and Method for Out-of-band Signaling of Quality Information," filed on January 15, 2013, both of which are hereby incorporated by reference in their entirety.
BACKGROUND
[0002] Video streaming is becoming more and more popular, with video traffic exceeding 50 percent of the total traffic over content distribution networks (CDNs) according to some estimates. DASH (Dynamic Adaptive Streaming over HyperText Transfer Protocol (HTTP)) is designed to promote efficient delivery of multimedia content from servers to clients through HTTP-based content distribution networks.
[0003] Adaptive streaming over HTTP allows streaming clients to switch between different representations of multimedia content that has been encoded (compressed) at multiple, different bitrates. Each representation may be divided into one or more segments, and each segment may be divided into one or more sub-segments. Bitrate information for each representation, either at the representation level or at the segment/sub-segment level, is provided to a client through a document containing metadata, referred to as a Media Presentation Description (MPD). Based on that bitrate information, the client can switch from one representation to another according to the available bandwidth. Switching from one representation to another occurs at segment/sub-segment boundaries, referred to as stream access points (SAPs). Thus, the client may stream segments/sub-segments from a higher bitrate representation when more bandwidth is available, and may stream segments/sub-segments from a lower bitrate representation when less bandwidth is available. Generally speaking, bitrate-driven adaptive streaming is greedy in nature, because each client selects and streams the representation that has the highest possible bitrate that the client can handle and is within the amount of bandwidth that is available.
[0004] Bitrate can influence the level of quality, although quality is not solely dependent on bitrate. Fluctuations in quality can occur when bandwidth or bitrate varies over time, such as when a client switches back-and-forth between higher and lower bitrate representations as just described. Furthermore, the greedy nature of bitrate-driven adaptive streaming can, in some instances, intensify the degree and frequency of changes in available bandwidth as multiple streaming clients and applications compete for bandwidth, thereby also intensifying fluctuations in quality.
[0005] Representations may be encoded as constant bit rate (CBR) versions or variable bit rate (VBR) versions. With CBR, the bitrate is well-controlled so that it is nearly constant. However, the complexity of content may change as a result of, for example, switching from relatively static (less complex) scenes to dynamic (more complex) scenes. Consequently, quality may fluctuate significantly unless the specified bitrate is sufficiently high to accommodate the more complex scenes. However, the constant use of a higher bitrate, even for less complex scenes, means that bandwidth is wasted when those scenes are being sent over the network.
[0006] VBR may be unconstrained, or it may be constrained (the maximum bitrate is capped). With VBR, a higher bitrate can be allocated to more complex scenes and a lower bitrate can be allocated to less complex scenes. As a result, fluctuations in quality are relatively small, but quality still may not be consistent, particularly when VBR is
constrained. Also, use of a higher bitrate may not necessarily improve quality, and consequently bandwidth is wasted when a higher bitrate is used but does not contribute to an improvement in quality.
[0007] To summarize, bitrate-driven adaptive streaming results in at least a couple of issues: fluctuations in quality, and inefficient use of bandwidth.
SUMMARY
[0008] Vigorous and frequent changes in bandwidth are often encountered in networks, for example, in wireless networks. Solutions for adaptive streaming are often designed with the aim of adapting to the changing bandwidth of the network while using as much bandwidth as possible. In the contemporary DASH (Dynamic Adaptive Streaming over HTTP) specification (e.g., ISO/IEC 23009-1), adaptations to changing bandwidth are enabled by switching between representations (or segments/sub-segments) based only on bitrate information; adaptation is realized by matching the bitrate of representations/segments/sub-segments to the available bandwidth.
[0009] As discussed above, adaptation to match the bitrate of the media content to the available bandwidth may result in significant fluctuations in the quality of the streamed content, which may negatively impact a viewer's experience; fluctuations in bandwidth can translate into fluctuations in quality. Also, bandwidth may be wasted; a higher bitrate does not necessarily mean higher quality, particularly for less complex scenes where a lower bitrate results in satisfactory quality.
[0010] In embodiments according to this disclosure, these issues are addressed by providing, to a client, a measure of the quality of the media data (e.g., a quality value). Embodiments according to this disclosure also pertain to how the presence of quality values is signaled to a client, how quality values are provided to a client, and how quality values are used by a client in adaptive streaming.
[0011] In embodiments according to the present disclosure, different representations are associated with an instance of media content (e.g., a movie), and a representation can include multiple portions (e.g., segments or sub-segments) of media content. A respective quality value can be associated with each of the portions. Information (e.g., an MPD (Media Presentation Description)) about the instance of media content is generated. The
information includes quality information for the instance of media content. The information about the instance of content, including the quality information, can be accessed by and/or sent to a client. The quality information indicates the availability of the quality values and where those quality values reside and/or how they can be retrieved.
[0012] In one embodiment, a quality value for a portion of the instance of media content is included in a box (data structure) associated with that portion. In one such embodiment, in a DASH implementation, the quality value is included in the "sidx" box associated with the portion of interest. In such an embodiment, the quality information included in the information about the instance of media content (e.g., the MPD) includes an element (e.g., an Extensible Markup Language (XML) element) indicating the quality value is available in the box. This type of approach may be referred to as "in-band."
[0013] In one embodiment, the quality information included in the information about the instance of media content (e.g., the MPD) includes an element (e.g., an XML element), and a quality value for a portion of the instance of media content is an attribute of that element. This type of approach may be referred to as "out-of-band."
[0014] In one embodiment, a quality value for a portion of the instance of media content is included in a first file or location separate from (different from) any file(s) or location(s) containing the first portion of media content. In such an embodiment, the quality information included in the information about the instance of media content (e.g., the MPD) includes an element (e.g., an XML element) indicating the quality value is available in the first file and providing the location of that file. This type of approach may also be referred to as out-of-band.
[0015] To stream an instance of media content, a client can access the information about the instance of media content (e.g., the MPD), including the quality information. The client is made aware of the presence of quality values for the instance of media content and the location of the quality values via the quality information, as described above. The client can access the quality values, and then request a portion (e.g., a segment or sub-segment) of the instance of media content based on the quality value for that portion.
[0016] In one embodiment, a client can select a candidate set of portions of the instance of media content to be downloaded based on bitrate, and then replace a portion in the candidate set with another portion that has a different (e.g., lower) bitrate but still has a satisfactory quality value. While the media content is being downloaded, the client can adapt to changes in available bandwidth by replacing a portion in the candidate set with a different portion that is selected based not just on its bitrate but also based on its quality value.
[0017] In summary, embodiments according to the present disclosure enhance bitrate-driven adaptation with quality-aware adaptation. Quality as well as bitrate information is used to make more intelligent adaptation decisions. As a result, a more consistent level of quality can be provided. Furthermore, in contrast to adaptations driven only by bitrate, the highest bitrate representation (segment/sub-segment) may not always be selected for streaming. For example, in instances in which a satisfactory level of quality can be achieved with a lower bitrate segment versus a higher bitrate segment, then the lower bitrate segment can be selected. Consequently, available bandwidth is more efficiently used, and bandwidth is not consumed unnecessarily.
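The quality-aware selection summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: the `Portion` type, the function names, the `min_quality` threshold, and the sample bitrates and quality values are all introduced here for illustration, and quality is assumed to be expressed so that a higher number means better quality.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Portion:
    """One segment/sub-segment as offered by one representation (hypothetical type)."""
    representation_id: str
    bitrate_kbps: int
    quality: float  # assumed: higher is better (e.g., PSNR in dB)


def pick_by_bitrate(options: List[Portion], bandwidth_kbps: int) -> Portion:
    """Greedy, bitrate-driven choice: the highest bitrate that fits the bandwidth."""
    fitting = [p for p in options if p.bitrate_kbps <= bandwidth_kbps]
    if not fitting:
        # Nothing fits; fall back to the lowest bitrate on offer.
        return min(options, key=lambda p: p.bitrate_kbps)
    return max(fitting, key=lambda p: p.bitrate_kbps)


def pick_quality_aware(options: List[Portion], bandwidth_kbps: int,
                       min_quality: float) -> Portion:
    """Start from the bitrate-driven candidate, then replace it with the
    lowest-bitrate portion that still has a satisfactory quality value."""
    candidate = pick_by_bitrate(options, bandwidth_kbps)
    cheaper = [p for p in options
               if p.bitrate_kbps < candidate.bitrate_kbps
               and p.quality >= min_quality]
    if cheaper:
        return min(cheaper, key=lambda p: p.bitrate_kbps)
    return candidate


# Illustrative values only: the same segment offered by three representations.
options = [
    Portion("low", 500, 36.0),
    Portion("mid", 1000, 41.0),
    Portion("high", 2000, 42.0),
]
```

With 2,500 kbps available and an assumed threshold of 40.0, the bitrate-only choice is the 2,000 kbps portion, while the quality-aware choice is the 1,000 kbps portion, because it already meets the threshold.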
[0018] These and other objects and advantages of the various embodiments of the present disclosure will be recognized by those of ordinary skill in the art after reading the following detailed description of the embodiments that are illustrated in the various drawing figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The accompanying drawings, which are incorporated in and form a part of this specification and in which like numerals depict like elements, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the disclosure.
[0020] Figure 1 is a block diagram showing examples of components of a system (e.g., a DASH system) upon which embodiments according to the present disclosure can be implemented.
[0021] Figure 2 illustrates representations of an instance of media content in an embodiment according to the present disclosure.
[0022] Figure 3 illustrates an example of a portion of an instance of media content in an embodiment according to the present disclosure.
[0023] Figure 4 is a flowchart of an example of a computer-implemented method for preparing media content, including quality information, in embodiments according to the present disclosure.
[0024] Figure 5 is a flowchart of an example of a computer-implemented method for locating, accessing, and using quality values in embodiments according to the present disclosure.
[0025] Figure 6 is an example of a list of portions of an instance of media content selected based on quality values in embodiments according to the present disclosure.
[0026] Figures 7 and 8 are flowcharts of examples of computer-implemented methods for accessing and using quality values in embodiments according to the present disclosure.
[0027] Figure 9 is a flowchart of an example of a computer-implemented method for providing quality values in embodiments according to the present disclosure.
[0028] Figure 10 is a block diagram of an example of a computing system upon which embodiments according to the present disclosure can be implemented.
DETAILED DESCRIPTION
[0029] Reference will now be made in detail to the various embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. While described in conjunction with these embodiments, it will be understood that they are not intended to limit the disclosure to these embodiments. On the contrary, the disclosure is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the disclosure as defined by the appended claims. Furthermore, in the following detailed description of the present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be understood that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present disclosure.
[0030] Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical
manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
[0031] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present disclosure, discussions
utilizing terms such as "receiving," "identifying," "associating," "accessing," "requesting," "using," "indicating," "retrieving," "selecting," "replacing," "monitoring," "providing,"
"publishing," "measuring," "recording," and "generating," or the like, refer to actions and processes (e.g., flowcharts 400, 500, 700, 800, and 900 of Figures 4, 5, 7, 8, and 9, respectively) of a computer system or similar electronic computing device or processor (e.g., system 1000 of Figure 10). The computer system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within the computer system memories, registers or other such information storage, transmission or display devices.
[0032] Embodiments described herein may be discussed in the general context of computer-executable instructions residing on some form of computer-readable storage medium, such as program modules, executed by one or more computers or other devices. By way of example, and not limitation, computer-readable storage media may comprise non-transitory computer storage media and communication media. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.
[0033] Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory or other memory technology, compact disk ROM (CD-ROM), digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed to retrieve that information.
[0034] Communication media can embody computer-executable instructions, data structures, and program modules, and includes any information delivery media. By way of
example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media. Combinations of any of the above can also be included within the scope of computer-readable media.
[0035] For simplicity, embodiments according to the present disclosure may be discussed in the context of DASH (Dynamic Adaptive Streaming over HTTP (HyperText Transfer Protocol)), in some instances using DASH terminology. However, it is understood that embodiments according to the present disclosure are not necessarily limited to a DASH implementation, and that the use of DASH terminology does not necessarily limit such embodiments to a DASH implementation.
[0036] Figure 1 is a block diagram showing examples of components of a system 100 (e.g., a DASH system) upon which embodiments according to the present disclosure can be implemented. The server 105 is a source for different instances of media content, including recorded items (such as, but not limited to, movies) and live events (such as, but not limited to, a news or sports broadcast). The media content may include audio content and/or video content.
[0037] The instances of media content are variably encoded (compressed). That is, for example, the instances of media content may be encoded using different encoding schemes (codecs) and may have different resolutions and bandwidths (bitrates). An instance of media content may be based on the same raw content, but encoded differently. In other words, the same instance of content may be encoded at different resolutions and different bitrates using different codecs; each unique combination of resolution, bitrate, etc., may be referred to as a representation.
[0038] The server 105 may encode and otherwise process the instance of media content, or the instances of media content may be encoded and processed at another server and then transmitted to the server 105. The instances of media content can be transmitted to a client 115 via a content delivery network (CDN) 110.
[0039] The CDN 110 may be wired or wireless or a combination of both. In a DASH implementation, the CDN 110 utilizes HTTP because, for example, that allows the existing Web structure to be used for content streaming. Also, HTTP commands such as partial GET can be used.
[0040] The client 115 may be a type of computing device such as, but not limited to, a laptop, desktop, tablet, notebook, cell phone, smart phone, media player, camera, gaming console, or the like.
[0041] With reference to Figure 2, representations 1, 2, ... N of an instance of media content 200 are encoded at different bitrates, resolutions, aspect ratios, and the like. As mentioned above, an instance of media content 200 corresponds to the same raw content (e.g., a single movie); there may be different representations (at different bitrates, etc.) of a single instance of media content.
[0042] Each representation 1, 2, ... N may be divided into segments (SEG), and each segment may be divided into sub-segments (not shown). Each segment or sub-segment represents a portion of the instance of media content 200; as used herein, a portion of an instance of media content may refer to a segment or to a sub-segment. Representations may be grouped into adaptation sets; an adaptation set can include one or more representations of a given time period within the instance of media content.
[0043] For example, an instance of media content may be logically divided into any number of time periods. One or more adaptation sets may be associated with each time period. One or more representations may be associated with each adaptation set. One or more portions may be associated with each representation: one or more segments may be associated with each representation, and one or more sub-segments may be associated with each segment.
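The hierarchy described above (time periods, adaptation sets, representations, segments, sub-segments) could be modeled along the following lines. The class and field names, the example file names, and the bitrates are hypothetical and chosen only for illustration; they are not taken from the DASH specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    url: str                    # hypothetical location of the segment
    sub_segment_count: int = 0  # number of sub-segments, if any


@dataclass
class Representation:
    rep_id: str
    bitrate_kbps: int
    segments: List[Segment] = field(default_factory=list)


@dataclass
class AdaptationSet:
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)


def count_segments(periods: List[Period]) -> int:
    """Total number of segments across all periods, sets, and representations."""
    return sum(len(rep.segments)
               for period in periods
               for aset in period.adaptation_sets
               for rep in aset.representations)


# Illustrative content: one period, one adaptation set, two representations.
example = [Period([AdaptationSet([
    Representation("r1", 500, [Segment("seg1.m4s"), Segment("seg2.m4s")]),
    Representation("r2", 1000, [Segment("seg1_hi.m4s")]),
])])]
```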
[0044] Information about each instance of media content available on the server 105 is published or broadcast in some manner that allows the client 115 to be made aware of the availability of that content. In a DASH implementation, information about an instance of media content is included in an MPD (a Media Presentation Description); an MPD is prepared and published for each instance of media content. An MPD is an XML (Extensible Markup Language) document that is accessible to clients. The information about an instance of media content (e.g., an MPD) describes properties of the various representations of that instance, such as bitrates, resolutions, and aspect ratios, and also describes how content can be retrieved from the server 105. For example, the information about an instance of media content can include Uniform Resource Locators (URLs) for each segment/sub-segment of that instance.
[0045] Significantly, in embodiments according to the present disclosure, the information about an instance of media content also includes quality information. In a DASH implementation, the quality information is included in the MPD.
[0046] Generally speaking, a measure of quality (e.g., one or more quality values) is generated and made available to the client 115. The measure of quality may be provided at the representation level (e.g., one quality value for all segments in a representation of an instance of media content) or at the segment/sub-segment level (a quality value per portion of an instance of media content). The quality information included in the information about an instance of media content indicates that a measure of quality is available for that instance. The quality information also indicates where quality values reside and/or how they can be retrieved.
[0047] The client 115 can use the measure of quality to make more intelligent decisions during adaptive streaming. More specifically, the client 115 can consider both bitrate and measure of quality when making decisions during adaptive streaming. Based on the bitrate and measure of quality, the client 115 can select suitable segments/sub-segments based on the bandwidth available in the network 110 (Figure 1), and can retrieve those segments/sub-segments from the server 105 via HTTP requests.
[0048] The quality values may be provided "in-band" or "out-of-band." Below, the in-band and out-of-band approaches are described. Then, the generation of quality values and their use in adaptive streaming are described.
IN-BAND APPROACHES
[0049] In some embodiments according to the present disclosure, quality values are provided in-band; the quality values are provided with the data constituting the media content. Figure 3 illustrates an example of a segment 300 in a DASH implementation. The segment 300 can be identified and accessed using its own URL, and can be downloaded to the client separately from another such segment.
[0050] Files conforming to the ISO (International Organization for Standardization) Base Media File Format are formed as a series of objects or data structures called "boxes." A box is, essentially, an object-oriented building block defined by a unique type identifier and includes a length field and a payload. The segment 300 includes a number of such boxes: styp; sidx; moof; and mdat.
[0051] The styp box designates the segment type. The sidx box contains index information for the segment 300. The moof box contains metadata for a fragment of the media content, and the mdat box contains the media data (e.g., audio and/or video frames) for that fragment. The initialization segment 305 specifies media content used to initialize the media content.
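The box structure described above can be walked with a short routine such as the one below. It is a sketch rather than a complete ISO Base Media File Format parser: it relies only on the standard box header (a 32-bit big-endian size followed by a four-character type, with a size of 1 signaling a 64-bit size field and a size of 0 meaning the box runs to the end of the data). The helper names `iter_boxes` and `make_box` and the payload bytes in the example are assumptions made for illustration.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            if offset + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii"), data[offset + header_len:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (illustrative helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# A toy segment with the four boxes named in the text; payloads are placeholders.
toy_segment = (make_box("styp", b"msdh") + make_box("sidx", b"")
               + make_box("moof", b"\x00" * 4) + make_box("mdat", b"abc"))
```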
[0052] In an in-band embodiment, a measure of quality for the segment 300 is included in the sidx box. More specifically, in one embodiment, the information
conventionally included in the sidx box is extended as shown in the example of Table 1.
[0053] In Table 1, asterisks (*) are used to identify the extended portion of the sidx box, "quality_value" is a value that indicates the quality of the media data in the referenced segment/sub-segment, and "scale_factor" controls the precision of the quality value.
Table 1 - Example of Extended sidx Box
aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
    unsigned int(32) reference_ID;
    unsigned int(32) timescale;
    if (version == 0)
    {
        unsigned int(32) earliest_presentation_time;
        unsigned int(32) first_offset;
    }
    else
    {
        unsigned int(64) earliest_presentation_time;
        unsigned int(64) first_offset;
    }
    unsigned int(16) reserved = 0;
    unsigned int(16) reference_count;
    for (i = 1; i <= reference_count; i++)
    {
        bit(1) reference_type;
        unsigned int(31) referenced_size;
        unsigned int(32) subsegment_duration;
        bit(1) starts_with_SAP;
        unsigned int(3) SAP_type;
        unsigned int(28) SAP_delta_time;
*       if (reference_type == 0) { // if media data is referenced *
*           unsigned int(16) quality_value; *
*           unsigned int(16) scale_factor; *
*       } *
    }
}
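As an illustration of how a client might read the extended sidx box of Table 1, the sketch below parses the box payload (the bytes after the box size and type) with Python's `struct` module. It assumes that both `quality_value` and `scale_factor` are present only when `reference_type` is 0, and it returns the raw integers without combining them, since the text does not spell out the scaling rule. The `Reference` type, the function name, and the sample values are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional


class Reference(NamedTuple):
    reference_type: int
    referenced_size: int
    subsegment_duration: int
    starts_with_sap: int
    sap_type: int
    sap_delta_time: int
    quality_value: Optional[int]  # None when reference_type != 0
    scale_factor: Optional[int]   # None when reference_type != 0


def parse_extended_sidx(payload: bytes) -> List[Reference]:
    """Parse the payload of an extended sidx box laid out as in Table 1."""
    version = payload[0]
    pos = 4                           # skip version (1 byte) and flags (3 bytes)
    pos += 8                          # reference_ID and timescale
    pos += 8 if version == 0 else 16  # earliest_presentation_time, first_offset
    _reserved, count = struct.unpack_from(">HH", payload, pos)
    pos += 4
    refs = []
    for _ in range(count):
        word1, duration, word2 = struct.unpack_from(">III", payload, pos)
        pos += 12
        reference_type = word1 >> 31
        quality = scale = None
        if reference_type == 0:       # media data is referenced
            quality, scale = struct.unpack_from(">HH", payload, pos)
            pos += 4
        refs.append(Reference(reference_type, word1 & 0x7FFFFFFF, duration,
                              word2 >> 31, (word2 >> 28) & 0x7,
                              word2 & 0x0FFFFFFF, quality, scale))
    return refs


# Illustrative payload: version 0, two references (one media, one index).
example_payload = (
    bytes(4)                                         # version = 0, flags = 0
    + struct.pack(">II", 1, 90000)                   # reference_ID, timescale
    + struct.pack(">II", 0, 0)                       # earliest time, first offset
    + struct.pack(">HH", 0, 2)                       # reserved, reference_count
    + struct.pack(">III", 5000, 180000, 0x90000000)  # media reference
    + struct.pack(">HH", 4215, 100)                  # quality_value, scale_factor
    + struct.pack(">III", (1 << 31) | 300, 180000, 0)  # index reference
)
```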
[0054] A qidx box can be defined as a relatively simple version of the sidx box for segments used in live streaming. The type of information that can be included in a qidx box is shown in the example of Table 2.
Table 2 - Example of qidx Box
aligned(8) class SegmentIndexBox extends FullBox('qidx', version, 0) {
    unsigned int(32) reference_ID;
    unsigned int(16) quality_value;
    unsigned int(16) scale_factor;
}
[0055] The qidx box can be placed after the styp box, before the moof and mdat boxes.
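For illustration, a qidx box with the fields of Table 2 could be written and read as shown below. The sketch assumes the usual FullBox layout (one version byte and three flag bytes after the box header); the function names and the sample values are hypothetical.

```python
import struct
from typing import Tuple

# version (1 byte), flags (3 bytes), reference_ID, quality_value, scale_factor
_QIDX_BODY = ">B3sIHH"


def build_qidx(reference_id: int, quality_value: int, scale_factor: int,
               version: int = 0) -> bytes:
    """Serialize a qidx box: 8-byte box header followed by a 12-byte body."""
    body = struct.pack(_QIDX_BODY, version, b"\x00\x00\x00",
                       reference_id, quality_value, scale_factor)
    return struct.pack(">I4s", 8 + len(body), b"qidx") + body


def parse_qidx(box: bytes) -> Tuple[int, int, int]:
    """Return (reference_ID, quality_value, scale_factor) from a qidx box."""
    _size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"qidx":
        raise ValueError("not a qidx box")
    _version, _flags, reference_id, quality, scale = struct.unpack_from(
        _QIDX_BODY, box, 8)
    return reference_id, quality, scale


example_qidx = build_qidx(1, 4215, 100)
```

In a live segment, such a box would sit between the styp box and the moof/mdat boxes, so a client can read the quality value before the media data arrives.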
[0056] In one embodiment, the quality information included in the information about the instance of media content (e.g., in the MPD) includes an element (e.g., an XML element) signaling the presence of a measure of quality in a segment box (e.g., the sidx box or qidx box) and indicating what quality metric is used. Table 3 defines an XML element that can be included in the MPD in a DASH implementation. In other words, in a DASH implementation, the MPD is extended to include a new element (QualityMetric).
[0057] In Table 3, the element name is "QualityMetric," and its attributes include "@schemeIdUri," "@value," and "@accuracy." The element may be applied at the adaptation set level or at the representation level.
Table 3 - Example of an MPD Element and Attributes (In-band)
Element or Attribute Name    Description
QualityMetric                Element name.
@schemeIdUri                 Identifies the scheme.
@value                       Indicates the metric used to express quality.
@accuracy                    Float type, indicating accuracy of the quality value for all segments. If not present, the quality value is rounded off to last digit.
[0058] The measure of quality may be expressed as, for example, peak signal-to-noise ratio (PSNR), mean opinion score (MOS), or structural similarity index (SSIM).
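A client could detect the in-band signaling element of Table 3 roughly as follows. This is only a sketch: the MPD fragment is invented for illustration, the scheme URI `urn:example:quality:2013` is a placeholder rather than a registered identifier, and the XML namespace that a real MPD carries is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical MPD fragment; attribute values are placeholders.
MPD_FRAGMENT = """
<AdaptationSet>
  <QualityMetric schemeIdUri="urn:example:quality:2013" value="PSNR" accuracy="0.01"/>
  <Representation id="r1" bandwidth="1000000"/>
</AdaptationSet>
"""


def find_quality_metric(xml_text: str) -> Optional[Dict[str, object]]:
    """Return the QualityMetric attributes, or None if the element is absent."""
    root = ET.fromstring(xml_text.strip())
    element = root.find(".//QualityMetric")
    if element is None:
        return None
    accuracy = element.get("accuracy")
    return {
        "scheme": element.get("schemeIdUri"),
        "metric": element.get("value"),
        # @accuracy is optional per Table 3.
        "accuracy": float(accuracy) if accuracy is not None else None,
    }
```

When the function returns a result, the client knows that quality values are carried in the segment boxes and which metric they express; when it returns `None`, the client can fall back to bitrate-only adaptation.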
[0059] In one embodiment, a qinfo box is defined and included in the initialization segment 305 to provide general information about quality metrics. The type of information that can be included in the qinfo box is shown in the example of Table 4.
Table 4 - Example of qinfo Box
aligned(8) class SegmentIndexBox extends Box('qinfo') {
    unsigned int(4) metric_type;
    unsigned int(28) reserved;
}
[0060] An in-band approach is advantageous because it can be implemented using the existing indexing mechanism (e.g., sidx), by extending the sidx box to include quality values, thereby maintaining backward compatibility. Also, additional files are not required to carry the quality values.
OUT-OF-BAND APPROACHES
[0061] In some embodiments according to the present disclosure, quality values are provided out-of-band.
[0062] In an out-of-band embodiment, the quality information included in the information about the instance of media content (e.g., in the MPD) includes an element (e.g., an XML element) that includes a measure of quality (quality values) as one of its attributes. In other words, in a DASH implementation, the MPD is extended to include a new element (QualityTimeline).
[0063] Table 5 defines an example of an XML element that can be included in the MPD in a DASH implementation. In Table 5, the element name is "QualityTimeline," and its attributes include "@qualityMetric," "@scaleFactor," "@s," "@n," "@sf," "@q," and "@b." In one embodiment, the QualityTimeline element is applied at the segment level.
Table 5 - Example of an MPD Element and Attributes (Out-of-band)
Element or Attribute Name    Description
QualityTimeline              Element name.
@qualityMetric               Indicates what metric is used to express quality (e.g., PSNR, MOS, SSIM).
@scaleFactor                 The default scale factor for the value of @q in all Q elements.
Q                            Child element of QualityTimeline; occurs 0 ... N times.
@s                           Segment number of the first segment contained in the element. When not present, it is the segment number of the first segment of the representation in the current period if this is the first Q element in QualityTimeline, or it is the segment number of the segment next to the last segment contained in the previous Q element.
@n                           Number of segments contained in the element sharing the same quality and bandwidth values.
@sf                          Scale factor for value @q in the enclosing Q element. When present, it overrides @scaleFactor in parent element. When not present, @scaleFactor in QualityTimeline is used.
@q                           Scaled value of the quality metric in integer.
@b                           Bandwidth required for real time delivery of the segment, in kbps. If not present, the value of Representation@bandwidth attribute applies.
[0064] Run-length coding can be used to compress the size of the QualityTimeline element. The Q element contains one or more (sub)segments with the same quality and bitrate; the number of (sub)segments in each Q element is indicated by the attribute @n. Quality and/or bitrate (the values of @q and/or @b) can be quantized so that
(sub)segments with close quality and bitrate can be grouped together. The design of the QualityTimeline element enables non-linear quantization. A default value of the scale factor (@scaleFactor) can be present at the element level, and a scale factor can be present in each Q element; the latter value overrides the default scale factor. Bitrate and size can be derived from each other if the duration of each (sub)segment is known.
However, bitrate can be expressed using fewer digits and less space. Bitrate (@b) is used in the Table 5 example. As bitrate can be obtained from the MPD, the Table 5 embodiment can be used even if a sidx box is not present in the media segment.
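The run-length grouping described above can be sketched as follows. Each run corresponds to one Q element, holding the count @n together with the shared @q and @b values. The uniform floor quantizer, the function names, and the sample numbers are assumptions made for illustration; the text allows non-linear quantization, which this sketch does not attempt.

```python
from typing import List, Tuple

Run = Tuple[int, int, int]  # (n, q, b): count, scaled quality, bandwidth in kbps


def quantize(value: int, step: int) -> int:
    """Uniform floor quantization (one possible choice, assumed here)."""
    return (value // step) * step


def to_runs(per_segment: List[Tuple[int, int]]) -> List[Run]:
    """Group consecutive segments that share the same (q, b) pair."""
    runs: List[Run] = []
    for q, b in per_segment:
        if runs and runs[-1][1] == q and runs[-1][2] == b:
            n, _, _ = runs[-1]
            runs[-1] = (n + 1, q, b)   # extend the current run
        else:
            runs.append((1, q, b))     # start a new run
    return runs


def from_runs(runs: List[Run]) -> List[Tuple[int, int]]:
    """Expand runs back into one (q, b) pair per segment."""
    expanded: List[Tuple[int, int]] = []
    for n, q, b in runs:
        expanded.extend([(q, b)] * n)
    return expanded


# Illustrative per-segment (scaled quality, kbps) values.
raw = [(4215, 1000), (4230, 1000), (4290, 1000), (3880, 800), (3820, 800)]
quantized = [(quantize(q, 100), b) for q, b in raw]
runs = to_runs(quantized)
```

Here five segments collapse into two runs, so the QualityTimeline element would need only two Q elements instead of five.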
[0065] In another out-of-band embodiment, quality values are provided in a separate index file. That is, the quality values are included in a file (referred to herein as a quality index file) that is in a different location than any file or files containing the media content itself. In such an embodiment, the quality information included in the information about the instance of media content (e.g., in the MPD) includes an element (e.g., an XML element) that identifies the location of the quality index file. In other words, in a DASH implementation, the MPD is extended to include a new element (QualityIndex).
[0066] Table 6 defines an example of an XML element that can be included in the MPD in a DASH implementation. In Table 6, the element name is "QualityIndex," and its attributes include "@sourceURL" and "@range." In one embodiment, the QualityIndex element is applied at the representation level.
Table 6 - Example of an MPD Element and Attributes (Out-of-band)
Element or Attribute Name    Description
QualityIndex                 Element name.
@sourceURL                   The URL of the index file.
@range                       Byte range within the index file.
[0067] The type of information that can be included in a quality index file is shown in Table 7.
Table 7 - Example of a Quality Index File
aligned(8) class QualityIndexEntry extends Box('qidx') {
    unsigned int(4) quality_metric;
    unsigned int(4) reserved;
    unsigned int(32) num_quality_ranges;
    for ( i = 0; i < num_quality_ranges; i++ ) {
        unsigned int(32) num_subsegments;
        unsigned int(32) scale_factor;
        unsigned int(32) quality_value;
    }
}
[0068] In Table 7, "quality_value" is a value that indicates the quality of the media data in the referenced representation, "scale_factor" controls the precision of the quality value, and "quality_metric" indicates the type of metric used to express quality (e.g., 1 = PSNR, 2 = MOS, 3 = SSIM).
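The layout of Table 7 can be read with a few fixed-width unpack operations. The sketch below is one possible reading, not a definitive parser: it assumes a plain box header (32-bit big-endian size followed by the four-character type 'qidx'), assumes that quality_metric occupies the high four bits of the first payload byte, and returns the raw scale_factor and quality_value without combining them, since the exact scaling rule is not specified here. The function and class names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple

BOX_HEADER = struct.Struct(">I4s")   # 32-bit box size, 4-character box type
ENTRY_HEADER = struct.Struct(">BI")  # 4-bit metric + 4-bit reserved, then range count
RANGE = struct.Struct(">III")        # num_subsegments, scale_factor, quality_value


class QualityIndexEntry(NamedTuple):
    quality_metric: int
    ranges: List[Tuple[int, int, int]]


def parse_qidx(data: bytes) -> QualityIndexEntry:
    """Parse one 'qidx' box laid out as in Table 7."""
    size, box_type = BOX_HEADER.unpack_from(data, 0)
    if box_type != b"qidx":
        raise ValueError(f"expected 'qidx' box, found {box_type!r}")
    if size > len(data):
        raise ValueError("box is truncated")
    offset = BOX_HEADER.size
    packed, num_ranges = ENTRY_HEADER.unpack_from(data, offset)
    offset += ENTRY_HEADER.size
    quality_metric = packed >> 4  # low 4 bits are reserved
    ranges: List[Tuple[int, int, int]] = []
    for _ in range(num_ranges):
        ranges.append(RANGE.unpack_from(data, offset))
        offset += RANGE.size
    return QualityIndexEntry(quality_metric, ranges)
```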
[0069] An out-of-band approach is advantageous because, for example, quality values can be carried and delivered separate from the media content, so that the quality values can be retrieved independently before the associated media segment is requested.
PREPARATION OF MEDIA CONTENT
[0070] Figure 4 is a flowchart 400 of an example of a computer-implemented method for preparing media content, including quality information, in embodiments according to the present disclosure. The flowchart 400 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium. The operations of the flowchart 400 can be implemented on the server side of the system 100 of Figure 1 (e.g., by the server 105, or by one or more servers communicatively coupled to the server 105). Although described using a single instance of media content as an example, the operations can be readily extended to multiple instances of media content.
[0071] In block 402 of Figure 4, an instance of media content is encoded into multiple representations with different bitrates. The content can be encoded in VBR or CBR.
[0072] In block 404, in one embodiment, each representation is divided into smaller portions (e.g., segments/sub-segments). The portions may be of different lengths (different time durations).
[0073] In block 406, the quality of each portion of each representation is measured and recorded. Techniques for measuring quality are known in the art.
[0074] In block 408, the representations are encapsulated into segment(s), which may be further divided into sub-segments.
[0075] In block 410, the portions (segments/sub-segments) are sent to and stored on a server (e.g., the server 105 of Figure 1).
[0076] In block 412, information about the instance of media content (e.g., an MPD) is generated, describing what media content is available and how it can be accessed and retrieved. According to embodiments of the present disclosure, information about the instance of media content (e.g., an MPD) includes quality information (in-band or out-of-band). The quality information indicates that quality values are available, where those quality values reside, and/or how they can be retrieved.
[0077] In an in-band embodiment, in a DASH implementation, the quality information includes a QualityMetric element in the MPD, described above.
[0078] In an out-of-band embodiment, in a DASH implementation, the quality information includes a QualityTimeline element, described above.
[0079] In an out-of-band embodiment, in a DASH implementation, the quality information includes a QualityIndex element, described above.
[0080] In block 414, information about the instance of media content, including quality information (e.g., an MPD), is published and is accessible to a client, e.g., on a Web page, using a URL. A client can access the information about the instance of media content using, for example, a Web browser, e-mail, Short Message Service (SMS), etc.
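For the quality measurement of block 406, PSNR is one of the metrics named in connection with Table 7. The following is a minimal sketch of the standard PSNR formula applied to two equal-length sequences of sample values; the function name and the flat-sequence input are assumptions made for illustration, and a practical encoder pipeline would typically compute this per frame and aggregate over each portion.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between reference and decoded samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```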
QUALITY-AWARE ADAPTIVE STREAMING
[0081] Figure 5 is a flowchart 500 of an example of a computer-implemented method for locating, accessing, and using quality values in embodiments according to the present disclosure. The flowchart 500 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium. The operations of the flowchart 500 can be implemented on the client side of the system 100 of Figure 1 (e.g., by the client 115).
[0082] In block 502 of Figure 5, a client accesses (reads or retrieves) information about an instance of media content (e.g., an MPD), including quality information. The client can parse the information about the instance of media content to identify which representations are available on the server 105 (Figure 1) and their characteristics, such as bandwidth, resolution, the codec used, etc.
[0083] In one embodiment, in a DASH implementation, if the QualityMetric element described above is present, then the client knows that in-band quality information is available (e.g., quality values are in the sidx or qidx box).
[0084] In one embodiment, in a DASH implementation, if the QualityTimeline element described above is present, then the client knows that out-of-band quality information is available (quality values are an attribute of that element).
[0085] In one embodiment, in a DASH implementation, if the QualityIndex element described above is present, then the client knows that out-of-band quality information is available (quality values are in a separate quality index file, whose location is included in the element).
[0086] In block 504, the client selects a set of representations based on, for example, its capability or the user's preference.
[0087] In block 506 of Figure 5, from the set of representations and using the information about the instance of content (e.g., the MPD), the client selects a list of candidate portions (segments/sub-segments) to be downloaded over time. In other words, the client creates a map of portion versus time over the length (in time) of the instance of media content. Essentially, the client selects a portion per time period. Initially, in one embodiment, the client selects the candidate portions based on bitrate. Generally speaking, at this point, the client will select portions at the highest bitrate it can handle (process).
[0088] Figure 6 illustrates an example in which three different representations R1, R2, and R3 are selected by the client. The first portion of representation R1, for the time period from T0 to T1 (where T0 coincides with the beginning of the instance of media content), is identified as R1P0 and has a bitrate of 1 Mbps and a quality level of five (5) (arbitrary quality units are used in this example). Similarly, the first portion of representation R2, for the time period from T0 to T1, is identified as R2P0 and has a bitrate of 0.5 Mbps and a quality level of four (4); and the first portion of representation R3, for the time period from T0 to T1, is identified as R3P0 and has a bitrate of 0.25 Mbps and a quality level of three (3). Other portions, for different time periods, are similarly identified and have respective bitrates and quality levels as shown.
[0089] In block 506 of Figure 5, based on bitrate, the initial list 600 of candidate portions would initially include R1P0, R1P1, R1P2, and R1P3.
[0090] In block 508, the client accesses (reads) the quality values for the portions in the selected set of representations (R1, R2, and R3).
[0091] In an in-band embodiment, the client can obtain quality values from the sidx box or the qidx box. The client does not necessarily have to request each portion (e.g., an entire media segment) to obtain the quality values. Instead, the client can request the relatively small part of the portion that constitutes the sidx or qidx box. For example, for each portion in the selected set of representations (R1, R2, and R3), the client can request the sidx or qidx box using the HTTP partial GET command and retrieve a quality value from the box.
[0092] In an out-of-band embodiment, the client can obtain quality values from the information about the instance of media content itself (e.g., from the MPD itself).
[0093] In another out-of-band embodiment, the client can obtain quality values from the quality index file(s) specified in the information about the instance of media content (e.g., specified in the MPD).
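For the in-band embodiment of block 508, an HTTP partial GET is an ordinary GET request carrying a Range header. The sketch below uses Python's standard urllib; the function names are hypothetical, and the byte offsets of the sidx or qidx box are assumed to be known to the client in advance. The same request form would apply when fetching a byte range of a separate quality index file.

```python
import urllib.request


def build_range_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build a GET request for an inclusive byte range of a resource."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_index_bytes(url: str, first_byte: int, last_byte: int, timeout: float = 10.0) -> bytes:
    """Fetch only the bytes assumed to hold the index box, not the whole segment."""
    request = build_range_request(url, first_byte, last_byte)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        if response.status == 206:
            return body  # server honored the range
        # A server that ignores Range returns the full resource with status 200.
        return body[first_byte:last_byte + 1]
```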
[0094] In block 510, the client can replace portions in the initial list 600 of candidate portions with lower bitrate portions that have satisfactory quality. In one embodiment, the client can compare the quality value for each portion against a predefined requirement such as a threshold value. A quality requirement can be a pre-defined value based on, for example, historical experience or user preferences, or it can be a dynamic value derived from the quality level of (sub)segments that have been downloaded. If the quality value for a portion satisfies the quality requirement, then that portion can be replaced in the list with a lower bitrate portion. In one embodiment, the portion with a quality value closest to a threshold value is selected, regardless of its associated bitrate. In another embodiment, the portion with a quality value that is highest but does not exceed the threshold value is selected, regardless of its associated bitrate.
[0095] In the example of Figure 6, the quality threshold is 4 (this is an arbitrary value chosen for illustration purposes). For the time period T0-T1, the portions R1P0, R2P0, and R3P0 all satisfy the quality threshold. However, R3P0 is selected because it has the lowest bitrate, and R3P0 replaces R1P0 in the quality-aware list 610 of candidate portions. In similar fashion, R2P1 is chosen to replace R1P1, and R3P3 is chosen to replace R1P3.
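The replacement step of block 510 can be sketched as a per-time-period selection. The code below is an illustration under stated assumptions: the Portion class, the function names, and the sample values are hypothetical (they are not the values of Figure 6), "satisfies" is taken to mean a quality value greater than or equal to the threshold, and the fallback used when no portion meets the threshold is a choice made only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Portion:
    name: str
    bitrate_mbps: float
    quality: int


def initial_list(slots: List[List[Portion]]) -> List[Portion]:
    """Bitrate-driven list: the highest-bitrate portion in each time period."""
    return [max(slot, key=lambda p: p.bitrate_mbps) for slot in slots]


def quality_aware_list(slots: List[List[Portion]], threshold: int) -> List[Portion]:
    """Per time period, pick the lowest-bitrate portion that meets the threshold."""
    result: List[Portion] = []
    for slot in slots:
        satisfying = [p for p in slot if p.quality >= threshold]
        if satisfying:
            result.append(min(satisfying, key=lambda p: p.bitrate_mbps))
        else:
            # Assumed fallback: nothing meets the threshold, so take the best quality.
            result.append(max(slot, key=lambda p: p.quality))
    return result
```

With a threshold of 4 and a time period offering portions at 1, 0.5, and 0.25 Mbps with quality levels 5, 4, and 4, this sketch selects the 0.25 Mbps portion, whereas the bitrate-driven list selects the 1 Mbps portion.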
[0096] Consequently, in contrast to conventional approaches, the portion (segment/sub-segment) with the highest possible bitrate will not necessarily be requested and downloaded by the client. Accordingly, bandwidth is saved.
[0097] In this manner, the client can construct a detailed quality-aware list 610 of quality and bitrate over time for portions of the instance of media content. To reduce startup delay, this can be done in the background after streaming starts.
[0098] Significantly, in contrast to conventional approaches, the portion (segment/sub-segment) with the highest possible bitrate will not necessarily be requested and downloaded by the client. Accordingly, bandwidth is saved.
[0099] In block 512 of Figure 5, the client requests portions (segments or sub-segments) from the list 610 in sequence. That is, the client works through the list 610 of candidate portions, requesting each portion in turn.
[0100] In block 514, the client monitors network conditions, particularly available bandwidth. If the client is about to request a particular portion, but the available bandwidth decreases to less than the bitrate for that portion, then the client can select another portion having a bitrate that satisfies the available bandwidth. For example, the client can access information indicating an amount of available bandwidth in the network at a point in time, and can replace a portion in the list 610 scheduled to be downloaded in an interval spanning the point in time, with another portion that has a respective bitrate value that satisfies the amount of available bandwidth. If the client can choose a replacement from multiple portions (e.g., segments) at different bitrates, then quality level can be used to choose the replacement in a manner similar to that just described. In other words, if a portion P1 is to be replaced in the list 610 by either a portion P2 or a portion P3, and portion P2 has a lower bitrate than portion P3 but the same or similar quality level, then the portion P2 can be chosen.
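The bandwidth-triggered replacement of block 514 can be sketched as follows. This is one possible interpretation, not a definitive rule: the names are hypothetical, a bitrate "satisfies" the available bandwidth when it does not exceed it, and "the same or similar quality level" is modeled with an assumed quality_tolerance parameter.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    bitrate_mbps: float
    quality: int


def choose_replacement(
    slot: List[Candidate],
    available_mbps: float,
    quality_tolerance: int = 0,
) -> Optional[Candidate]:
    """Pick a replacement for one time period under a bandwidth limit.

    Among portions whose bitrate fits the available bandwidth, keep those
    within quality_tolerance of the best affordable quality, then take the
    one with the lowest bitrate. Returns None if nothing fits.
    """
    affordable = [p for p in slot if p.bitrate_mbps <= available_mbps]
    if not affordable:
        return None
    best_quality = max(p.quality for p in affordable)
    similar = [p for p in affordable if best_quality - p.quality <= quality_tolerance]
    return min(similar, key=lambda p: p.bitrate_mbps)
```

For instance, if P2 (0.4 Mbps) and P3 (0.6 Mbps) both fit the available bandwidth and have the same quality level, this sketch returns P2.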
[0101] Figures 7 and 8 are flowcharts 700 and 800, respectively, of examples of computer-implemented methods in embodiments according to the present disclosure. The flowcharts 700 and 800 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium. In particular, the operations included in the flowcharts 700 and 800 can be implemented on the client side of the system 100 of Figure 1 (e.g., by the client 115).
[0102] In block 702 of Figure 7, information about an instance of media content is accessed (e.g., an MPD is accessed). The information includes quality information that indicates quality values for portions of the instance of media content. The quality values may be in-band or out-of-band, and the quality information indicates both the presence of the quality values and their locations.
[0103] In block 704, the quality values indicated by the quality information are accessed.
[0104] In block 706, a first portion of the instance of media content is accessed. The first portion is selected using a quality value for the first portion.
[0105] In block 708, the first portion is received.
[0106] In block 802 of Figure 8, information indicating an amount of available bandwidth in a content delivery network is accessed.
[0107] In block 804, portions of an instance of media content are identified. Each of the portions has a respective bitrate that satisfies the amount of available bandwidth. The portions also have respective measures of quality (quality values) associated therewith.
[0108] In block 806, a first portion of the portions is selected to be downloaded. The first portion is selected according to its associated measure of quality.
[0109] In block 808, the first portion is requested for delivery over the content delivery network.
[0110] Figure 9 is a flowchart 900 of an example of a computer-implemented method in embodiments according to the present disclosure. The flowchart 900 can be implemented as computer-executable instructions residing on some form of non-transitory computer-readable storage medium. The operations of the flowchart 900 can be implemented on the server side of the system 100 of Figure 1 (e.g., by the server 105, or by one or more servers communicatively coupled to the server 105).
[0111] In block 902 of Figure 9, information about an instance of media content is generated. The information includes quality information for the instance of media content. Each portion of the instance of media content has a respective quality value associated therewith.
[0112] In block 904, the information, including the quality information, is provided to a client. The quality information indicates the availability of the quality values and how to find them, as previously described herein.
[0113] In summary, embodiments according to the present disclosure enhance bitrate-driven adaptation with quality-aware adaptation. Quality as well as bitrate information is used to make more intelligent adaptation decisions. As a result, a more consistent level of quality can be provided. Furthermore, in contrast to adaptations driven only by bitrate, the highest bitrate representation (segment/sub-segment) may not always be selected for streaming. For example, in instances in which a satisfactory level of quality can be achieved with a lower bitrate segment instead of a higher bitrate segment, the lower bitrate segment can be selected. Consequently, available bandwidth is more efficiently used, and bandwidth is not consumed unnecessarily.
[0114] Both network operators (content providers) and content subscribers can benefit. An operator can benefit because, by using network resources more efficiently, more subscribers can be accommodated. Subscribers benefit because they can stream content at an acceptable level of quality while spending less on data plans that are based on consumption. Also, energy consumed by client devices can be reduced, which is particularly meaningful for power-constrained mobile devices, as power consumption is directly related to bandwidth usage.
[0115] Figure 10 is a block diagram of an example of a computing system 1000 capable of implementing embodiments according to the present disclosure. The computing system 1000 broadly represents any single or multi-processor computing device or system capable of executing computer-readable instructions. The computing system 1000 can be used to implement the functionality provided by the server 105 of Figure 1 or any of the other server components in the system 100. The computing system 1000 can also be used to implement the functionality of the client 115 of Figure 1. Depending on the implementation, the computing system 1000 may not include all of the elements shown in Figure 10, and/or it may include elements in addition to those shown in Figure 10.
[0116] In its most basic configuration, the computing system 1000 may include at least one processor 1002 (CPU) and at least one memory 1004. The processor 1002 generally represents any type or form of processing unit capable of processing data or interpreting and executing instructions. In certain embodiments, the processor 1002 may receive instructions from a software application or module. These instructions may cause the processor 1002 to perform the functions of one or more of the example embodiments described and/or illustrated herein.
[0117] The memory 1004 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. In certain embodiments, the computing system 1000 may include both a volatile memory unit (such as, for example, the memory 1004) and a non-volatile storage device (not shown).
[0118] The computing system 1000 also includes a display device 1006 that is operatively coupled to the processor 1002. The display device 1006 is generally configured to display a graphical user interface (GUI) that provides an easy-to-use interface between a user and the computing system.
[0119] The computing system 1000 also includes an input device 1008 that is operatively coupled to the processor 1002. The input device 1008 may include a touch sensing device (a touch screen) configured to receive input from a user's touch and to send this information to the processor 1002. The processor 1002 interprets the touches in accordance with its programming.
[0120] An input device 1008 may be integrated with the display device 1006 or they may be separate components. In the illustrated embodiment, the input device 1008 is a touch screen that is positioned over or in front of the display device 1006. The input device 1008 and display device 1006 may be collectively referred to herein as a touch screen display 1007.
[0121] The communication interface 1022 of Figure 10 broadly represents any type or form of communication device or adapter capable of facilitating communication between the example computing system 1000 and one or more additional devices. For example, the communication interface 1022 may facilitate communication between the computing system 1000 and a private or public network including additional computing systems.
Examples of a communication interface 1022 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In one embodiment, the communication interface 1022 provides a direct connection to a remote server via a direct link to a network, such as the Internet. The communication interface 1022 may also indirectly provide such a connection through any other suitable connection. The communication interface 1022 may also represent a host adapter configured to facilitate communication between the computing system 1000 and one or more additional network or storage devices via an external bus or communications channel.
[0122] As illustrated in Figure 10, the computing system 1000 may also include at least one input/output (I/O) device 1010. The I/O device 1010 generally represents any type or form of input device capable of providing/receiving input or output, either computer- or human-generated, to/from the computing system 1000. Examples of an I/O device 1010 include, without limitation, a keyboard, a pointing or cursor control device (e.g., a mouse), a speech recognition device, or any other input device.
[0123] Many other devices or subsystems may be connected to computing system 1000. Conversely, all of the components and devices illustrated in Figure 10 need not be present to practice the embodiments described herein. The devices and subsystems referenced above may also be interconnected in different ways from that shown in Figure 10. The computing system 1000 may also employ any number of software, firmware, and/or hardware configurations. For example, the example embodiments disclosed herein may be encoded as a computer program (also referred to as computer software, software applications, computer-readable instructions, or computer control logic) on a computer-readable medium.
[0124] The computer-readable medium containing the computer program may be loaded into the computing system 1000. All or a portion of the computer program stored on the computer-readable medium may then be stored in the memory 1004. When executed by the processor 1002, a computer program loaded into the computing system 1000 may cause the processor 1002 to perform and/or be a means for performing the functions of the example embodiments described and/or illustrated herein. Additionally or alternatively, the example embodiments described and/or illustrated herein may be implemented in firmware and/or hardware.
[0125] While the foregoing disclosure sets forth various embodiments using specific block diagrams, flowcharts, and examples, each block diagram component, flowchart step, operation, and/or component described and/or illustrated herein may be implemented, individually and/or collectively, using a wide range of hardware, software, or firmware (or any combination thereof) configurations. In addition, any disclosure of components contained within other components should be considered as examples because many other architectures can be implemented to achieve the same functionality.
[0126] The process parameters and sequence of steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various example methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
[0127] While various embodiments have been described and/or illustrated herein in the context of fully functional computing systems, one or more of these example embodiments may be distributed as a program product in a variety of forms, regardless of the particular type of computer-readable media used to actually carry out the distribution. The embodiments disclosed herein may also be implemented using software modules that perform certain tasks. These software modules may include script, batch, or other executable files that may be stored on a computer-readable storage medium or in a computing system. These software modules may configure a computing system to perform one or more of the example embodiments disclosed herein. One or more of the software modules disclosed herein may be implemented in a cloud computing environment. Cloud computing environments may provide various services and applications via the Internet. These cloud-based services (e.g., software as a service, platform as a service, infrastructure as a service, etc.) may be accessible through a Web browser or other remote interface. Various functions described herein may be provided through a remote desktop environment or any other cloud-based computing environment.
[0128] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as may be suited to the particular use contemplated.
[0129] Embodiments according to the invention are thus described. While the present disclosure has been described in particular embodiments, it should be appreciated that the invention should not be construed as limited by such embodiments, but rather construed according to the below claims.