GB2534057A

GB2534057A - Methods for providing media data, method for receiving media data and corresponding devices

Info

Publication number: GB2534057A
Application number: GB1603880.4A
Authority: GB
Inventors: Denoual Franck; Fablet Youenn
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-07-12
Filing date: 2013-07-12
Publication date: 2016-07-13
Anticipated expiration: 2033-07-12
Also published as: GB201312547D0; GB2516112A; GB201603880D0; GB2534057B; GB2516112B

Abstract

Disclosed is a method and device for streaming data including at least one temporal segment representing media data (e.g. audio and/or video data) divided according to periods of time from a server to a client. The client sends a request, 650, to the server for a description file (such as a Media Presentation Description or MPD file) that includes a description of address information (such as a URL) of a temporal segment of the data. In response to this request the server selects, 653, data from sets of data represented by the description file and sends, 655, the description file to the client device. In addition the server sends the selected data 657, 658, to the client device in response to receiving the request for the description file from the client. The client correspondingly receives the description file and the selected data that is sent from the server. The invention can be applied to video streamed using the Dynamic Adaptive Streaming over http (DASH) standard.

Description

METHODS FOR PROVIDING MEDIA DATA, METHOD FOR RECEIVING MEDIA DATA AND CORRESPONDING DEVICES

FIELD OF THE INVENTION

The present invention concerns methods for providing media data, methods for receiving media data and corresponding devices.

BACKGROUND OF THE INVENTION

Solutions for adaptive streaming of media data from a server to a client device have been proposed, in order to adapt in particular the type and quantity of data that are sent to the client device to the features of the concerned client device and to the characteristics of the networks providing the connection between the server and the client device.

In this context, some solutions, such as the DASH (Dynamic Adaptive Streaming over HTTP) standard, propose to store a plurality of versions of the resource (or content) to be distributed and to send to a client device requesting the resource a description file including a description of the various versions representing the resource and respective pointers (e.g. URLs) to these versions.

Based on the description file, the client device can then select a version of the resource that best matches its needs and request this version using the corresponding pointer.

This solution is advantageous in that the description file is light as it contains no media data (but only pointers to media data). It avoids the exchange of media data that would be unsuitable for the client device by letting the client select relevant versions for its usage. Moreover it fits in the current Web architecture based on HTTP and can exploit caching mechanisms already deployed.

In return, however, this solution requires several exchanges (or roundtrips) between the client device and the server before media data is received at the client device and can be decoded and displayed, which results in a start-up delay.

SUMMARY OF THE INVENTION

The invention provides a method for providing data including at least one temporal segment representing media data divided according to periods of time, the method comprising the following steps implemented by a server device: receiving from a client device a request for a description file including a description of address information of a temporal segment; selecting data from among sets of data represented by the description file; sending the description file to the client device in response to receiving the request for the description file from the client device; and sending the selected data which is selected from among the sets of data represented by the description file, to the client device, in response to receiving the request for the description file from the client device.

By sending data selected in an appropriate manner in response to receiving the request for the description file only (i.e. sending data that is not solicited by the client device, but has been selected by the server as further explained below), one or several roundtrip(s) can be avoided and the decoding and display of the media data can thus start faster.

In some embodiments, the address information is a URL.

In embodiments, the media data is video and/or audio data. In other embodiments, the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.

In some embodiments, the method further comprises: receiving a request for a temporal segment from the client device which received the description file; and sending the requested temporal segment to the client device in response to the request for the temporal segment from the client device.

In embodiments, the description file describes addresses for each of a plurality of temporal segments which are based on the same media data, wherein resolutions of the plurality of temporal segments are different.

In some embodiments, the method further comprises sending a push promise frame for indicating an intention of pushing the selected data to the client device, wherein the push promise frame is defined by HTTP/2.

According to a specific feature, the description file is sent to the client device after sending the push promise frame to the client device.

In embodiments, the selected data is selected from among the sets of data by using preference data received from the client device.

According to a specific feature, the preference data includes at least one of a transmission rate of the media data and a preferred language.

In embodiments, the selected data is selected from among the sets of data by using registered information of the client device registered prior to receiving the request for the description file from the client device.

The invention also provides a method for receiving data including at least one temporal segment representing media data divided according to periods of time, the method comprising the following steps implemented by a client device: sending to a server device a request for a description file including a description of address information of a temporal segment; receiving the description file from the server device as a response of the request for the description file to the server device; and receiving, as a response of the request for the description file to the server device, selected data which is selected by the server device from among sets of data represented by the description file.

In embodiments, the address information is a URL.

In some embodiments, the media data is video and/or audio data.

In embodiments, the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.

In some embodiments, the method further comprises: sending a request for a temporal segment to the server device which sent the description file; and receiving the requested temporal segment from the server device as a response of the request for the temporal segment to the server device.

In some embodiments, the method further comprises receiving a push promise frame for pushing the selected data from the server device, wherein the push promise frame is defined by HTTP/2.

According to a specific feature, the description file is sent from the server device after sending the push promise frame.

In embodiments, the selected data is selected by the server device from among the sets of data by using preference data.

In embodiments, the selected data is selected by the server device from among the sets of data by using registered information of the client device registered prior to sending the request for the description file to the server device.

The invention also provides a device for providing data including at least one temporal segment representing media data divided according to periods of time, comprising: a receiver configured to receive from a client device a request for a description file including a description of address information of a temporal segment; a selection module configured to select data from among sets of data represented by the description file; a sending module configured to send the description file to the client device in response to receiving the request for the description file from the client device; and a push module configured to send the selected data which is selected from among the sets of data represented by the description file, to the client device, in response to receiving the request for the description file from the client device.

The invention also provides a device for receiving data including at least one temporal segment representing media data divided according to periods of time, comprising: a sending module configured to send to a server device a request for a description file including a description of address information of a temporal segment; a receiver configured to receive the description file from the server device as a response of the request for the description file to the server device; and to receive, as a response of the request for the description file to the server device, selected data which is selected by the server device from among sets of data represented by the description file.

Lastly, the invention provides a system for exchanging data comprising a device for providing data as defined above and a device for receiving data as defined above.

Optional features proposed above for the method for providing media data and the method for receiving media data also apply to the various devices and system just mentioned.

BRIEF DESCRIPTION OF THE DRAWINGS

Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
- Figure 1 describes a context of use of the DASH standard for streaming media content over HTTP;
- Figure 2 illustrates the main steps of an exemplary method for the generation of a media presentation and a manifest file;
- Figure 3 gives an example of a DASH manifest;
- Figures 4a and 4b respectively show the standard behaviour of a DASH client and a tree representation of an exemplary manifest file;
- Figure 5 shows exemplary methods respectively implemented by a server and by a client device in accordance with the teachings of the invention;
- Figure 6 describes an exemplary method implemented by a server;
- Figure 7 describes a possible method implemented by a client device;
- Figure 8 shows an exemplary hardware configuration.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Figure 1 describes the general context of use of the DASH (Dynamic Adaptive Streaming over HTTP) standard for streaming media content (generally audio/video content) over HTTP. Most of current protocols and standards for adaptive media streaming over HTTP are based on a similar approach.

DASH defines how to describe a media presentation in a manifest file in XML format (i.e. a file in the eXtensible Markup Language), called media presentation description file or MPD (Media Presentation Description) here below. When delivered to a client, this manifest file provides enough information to allow the client to request and control the delivery of the media content.

A media server 300 stores different media presentations, for example media presentation 301, which contains audio and video data. In this example, audio and video are interleaved in the same file. An exemplary method for obtaining this media presentation is described below with reference to Figure 2.

Media presentation 301 has been temporally split into small independent and consecutive temporal segments 302a, 302b, 302c (for example MP4 segments) that can be addressed and downloaded independently. The downloading addresses (here in the form of HTTP Uniform Resource Locators or HTTP URLs) of the media content corresponding to each of these temporal segments are set by the server 300. Precisely, an HTTP URL is associated with each temporal segment of the audio/video media content.

Server 300 also stores a manifest file 304, here an XML document (see the example shown in Figure 3 described below), that describes the content of the media presentation including media content characteristics (e.g. type of media -audio, video, audio-video, text, etc.; encoding format; bitrate; timing information; time duration of the segments) and the list of temporal media segments and associated URLs.

Alternatively, the document can contain template information allowing rebuilding the explicit list of the temporal media segments and associated URLs. This document may be written using the eXtensible Markup Language (XML).

Manifest file 304 is designed to be sent to a client device 310.

After receiving manifest (manifest at client side is referred to as 305), a DASH control engine 313 executed by the client device 310 may parse the document to have access to the association between temporal segments of the different media contents and HTTP addresses. Moreover, the manifest gives information about the content (here, interleaved audio/video) of the media presentation. Such information may include resolution, bit-rate, etc. Under the control of DASH control engine 313, an HTTP client 311 of the client device 310 (i.e. the process or application executed on the client device 310 to perform exchanges based on HTTP) may therefore emit HTTP requests 306 for downloading the desired temporal segments of the different media contents described in the manifest.

Upon receiving HTTP requests 306, server 300 sends HTTP responses 307 conveying the requested temporal segments. HTTP client 311 extracts from the responses the temporal media segments and provides them to the input buffer 307 of a media engine 312 (i.e. a process executed by client device 310 or a module of client device 310 that handles media data).

Finally, media segments can be decoded by a decoder 308 and displayed (e.g. on a screen 309 of the client device 310).

Media engine 312 then interacts with DASH control engine 313 so that the requests for the next temporal segments are issued at an appropriate time. To this end, the next segment is identified from manifest 305. The time to issue the corresponding request depends on the load of reception buffer 307. In this respect, DASH control engine 313 sends the necessary requests to make sure that no buffer overflow or starvation will occur.

Figure 2 illustrates the main steps of an exemplary method for the generation of both the media presentation and the manifest file.

Audio data 400 and video data 401 are respectively acquired, e.g. using a digital video camera. Audio data 400 are compressed at step 402, for example by encoding according to the MP3 standard, so as to obtain an audio elementary stream 404. In parallel, video data 401 are compressed at step 403, e.g. by encoding using a video compression algorithm such as MPEG4, MPEG/AVC, SVC, HEVC or scalable HEVC, so as to obtain a video elementary stream 405.

Elementary streams 404, 405 are encapsulated at step 406 as a global media presentation 407. For example, the ISO BMFF standard (or the extension of this ISO BMFF standard to AVC, SVC, HEVC, scalable extension of HEVC, etc.) can be used for describing the content of the encoded audio and video elementary streams as a global media presentation 407.

The encapsulated media presentation (407) is used for generating (step 408) an XML manifest (or description file) 409. Several distinct representations of video data 401 and audio data 400 can be acquired and/or compressed and/or encapsulated with different parameters (each representation corresponding to a specific set of parameters) and described in the same media presentation 407.

For the specific case of the MPEG/DASH streaming protocol, the manifest file is called the Media Presentation Description file (MPD file) and is organized as now explained.

The root element is the MPD element that contains attributes applying to all the presentation plus DASH information like profile or schema. The media presentation is split into temporal periods represented by a Period element. The MPD file contains all the data related to each temporal period.

By receiving this information, the client is aware of the content for each period of time. Each Period is organized into AdaptationSet elements. A possible organization is to have one or more AdaptationSet per media type contained in the presentation. An AdaptationSet related to video contains information about the different possible representations of the encoded videos available at the server. Each representation is described in a Representation element. For example, a first representation can be a video encoded at a spatial resolution 640x480 and compressed at the bit rate of 500 kbits/s. A second representation can be the same video but compressed at 250 kbits/s.

Each video can then be downloaded by HTTP requests as the client knows the HTTP addresses related to the video thanks to the following scheme. The association between content of each representation and HTTP addresses is done by using an additional level of description: the temporal segments. Each video representation is split into temporal segments (typically a few seconds). Each temporal segment is a content stored at the server that is accessible through an HTTP address (URL or URL with one byte range). Different elements can be used to describe the temporal segments in the MPD file: SegmentList, SegmentBase or SegmentTemplate. In addition, a specific segment is available: the initialization segment. This initialization segment contains MP4 initialization information (in particular, if the video has been encapsulated by using the ISO BMFF or extensions) that describes the encapsulated video stream. For example, it makes it possible for the client device to easily instantiate the decoding algorithms related to the video.

The HTTP addresses of the initialization segment and the media segments are thus given in the MPD file.
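The hierarchy just described (MPD, Period, AdaptationSet, Representation) can be walked with a few lines of code. The following is a minimal sketch in Python, assuming a simplified MPD without the XML namespace that real DASH manifests declare; the sample document, its attribute values and the function name list_representations are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified MPD: no XML namespace, a single Period,
# one video AdaptationSet with the two Representations of the example above.
SAMPLE_MPD = """<MPD>
  <Period>
    <AdaptationSet contentType="video">
      <Representation id="v1" width="640" height="480" bandwidth="500000"/>
      <Representation id="v2" width="640" height="480" bandwidth="250000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_representations(mpd_text):
    """Return (content type, representation id, bandwidth) for each Representation."""
    root = ET.fromstring(mpd_text)
    found = []
    for period in root.findall("Period"):
        for adaptation_set in period.findall("AdaptationSet"):
            for rep in adaptation_set.findall("Representation"):
                found.append(
                    (
                        adaptation_set.get("contentType"),
                        rep.get("id"),
                        int(rep.get("bandwidth")),
                    )
                )
    return found
```

With a namespaced manifest, the element names passed to findall would need to be qualified accordingly.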

Figure 3 gives an example of a DASH manifest (MPD file) for a given media presentation.

In this MPD, two media are described: the first one is an English audio stream and the second one is a video stream. The English audio stream is introduced through the AdaptationSet tag 500 (while the video stream is introduced through the AdaptationSet tag 503).

Two alternative representations are available for this audio stream: -the first representation 501 is an MP4 encapsulated elementary audio stream with a bit-rate of 64000 bits/s. The codec (or decoder) to be used to handle this elementary stream (after mp4 parsing) is defined in the standard by the attribute codecs having the value: 'mp4a.0x40'. The first representation 501 is accessible by a request at the address formed by the concatenation of the BaseURL elements in the segment hierarchy: <BaseURL>7657412348.mp4</BaseURL> is a relative URI. The <BaseURL> being defined at the top level in the MPD element by 'http://cdn1.example.com/' or by 'http://cdn2.example.com/' (two servers are available for streaming the same content) is the absolute URI. The client can then request the English audio stream (precisely its representation 501) from the request to the address: 'http://cdn1.example.com/7657412348.mp4' or to the address: 'http://cdn2.example.com/7657412348.mp4'.

-the second representation 502 is an MP4 encapsulated elementary audio stream with a bit-rate of 32000 bits/s. The same explanations as for the first representation 501 can be made and the client device can thus request this second representation 502 by a request at either one of the following addresses: 'http://cdn1.example.com/3463646346.mp4' or 'http://cdn2.example.com/3463646346.mp4'.
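The concatenation of a top-level BaseURL with a Representation-level relative BaseURL can be illustrated as follows. This is a minimal sketch using Python's standard urljoin; the function name segment_urls is an assumption made for the illustration.

```python
from urllib.parse import urljoin


def segment_urls(top_level_base_urls, relative_url):
    """Resolve a relative BaseURL against each absolute top-level BaseURL."""
    return [urljoin(base, relative_url) for base in top_level_base_urls]


# The two servers and the relative URI of representation 501 from the example.
urls = segment_urls(
    ["http://cdn1.example.com/", "http://cdn2.example.com/"],
    "7657412348.mp4",
)
```

The client may then send its request to either of the resulting addresses.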

The adaptation set 503 related to the video contains six representations.

These representations contain videos at different spatial resolutions (320x240, 640x480, 1280x720) and at different bit rates (from 256000 to 2048000 bits per second). For each of these representations, a different URL is associated through a BaseURL element. The client device can therefore choose between these alternative representations of the same video according to different criteria such as estimated bandwidth, screen resolution, etc. (Note that, in Figure 3, the decomposition of the Representation into temporal segments is not illustrated, for the sake of clarity.)

Figure 4a shows the standard behaviour of a DASH client. Figure 4b shows a tree representation of an exemplary manifest file (description file or MPD) used in the method shown in Figure 4a.

When starting a streaming session, a DASH client starts by requesting the manifest file (step 600). After waiting for the server's response and receiving the manifest file (step 601), the client analyzes the manifest file (step 602), selects a set AS of AdaptationSets suitable for its environment (step 603), then selects, within each AdaptationSet of the set AS, a Representation in the MPD suitable for example for its bandwidth, decoding and rendering capabilities (step 604).

The DASH client can then build in advance the list of segments to request, starting with initialization information for the media decoders. This initialization segment has to be identified in the MPD (step 605) since it can be common to multiple representations, adaptation sets and periods or specific to each Representation or even contained in the first media segment.

The client then requests the initialization segment (step 606). Once the initialization segment is received (step 607), the decoders get initiated (step 608).

The client then requests first media data on a segment basis (step 610) and buffers a minimum data amount (thanks to the condition at step 609) before actually starting decoding and displaying (step 613). These multiple requests/responses between the MPD download and the first displayed frames introduce a startup delay in the streaming session. After these initial steps, the DASH streaming session continues in a standard way, i.e. the DASH client adapts and requests the media segments one after the other.
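The number of request/response exchanges that precede the first displayed frame in this standard behaviour can be made concrete with a short sketch. It assumes, for illustration only, that all media segments have the same duration and that the minimum buffer is expressed in seconds; the function and parameter names are hypothetical.

```python
import math


def startup_requests(min_buffer_s, segment_duration_s):
    """List the requests a standard DASH client issues before display starts."""
    # Manifest request (step 600) and initialization segment request (step 606).
    requests = ["manifest", "initialization segment"]
    # Media segments requested in the loop of steps 609/610 until the
    # minimum buffer is reached (assumes equal segment durations).
    media_count = math.ceil(min_buffer_s / segment_duration_s)
    requests += [f"media segment {i}" for i in range(1, media_count + 1)]
    return requests
```

For instance, a 4-second minimum buffer with 2-second segments implies four roundtrips before decoding and display can start.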

Figure 5 shows exemplary methods respectively implemented by a server and by a client device in accordance with the teachings of the invention, in order to obtain a DASH fast start.

As in the standard process just described, the first step consists for the client to request the description file, here an MPD file (step 650). The client then waits for the server's response (step 651).

In the meantime, the server analyses the MPD file (step 652), in particular to identify (step 653) initialization data which will help the client to start faster, as explained below. An exemplary embodiment for step 653 is described below with reference to Figure 6.

Once initialization data is identified by the server, it sends a PUSH_PROMISE frame to the client at step 654 to indicate its intention to push initialization data without waiting for a client's request.

Possibly, it signals in addition that it will also push initial media data (step 656) by sending another PUSH_PROMISE frame including header fields that allow the client to identify the concerned resource, i.e. the concerned initial media data, such as :scheme, :host, and :path.

Both in the case of a PUSH_PROMISE frame for initialization data and of a PUSH_PROMISE frame for initial media data, other header fields are also added by the server to indicate how confident the server is in the data it has decided to push: in the present embodiment, a confidence_level parameter is associated with (i.e. included in a header of) the PUSH_PROMISE frame. The determination of the confidence_level parameter is described below with reference to Figure 6. The server can also insert a specific DASH header to unambiguously indicate the segment that it intends to push.

To minimize the risk that a client will make a request for initialization data and first media data that are to be pushed, the PUSH_PROMISE frames should be sent prior to any content in the response, i.e. step 654 and step 656 should occur before a step 655 of sending the MPD file from the server to the client device.

Thus, once the PUSH_PROMISE frames have been sent to the client device, the server sends the MPD file to the client device at step 655.

If the server has not received any CANCEL or ERROR message from the client device in the meantime, it starts pushing initialization data (step 657) and first media data (step 658).
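The ordering constraint described above (PUSH_PROMISE frames first, then the MPD, then the pushed data unless cancelled) can be summarised by a small simulation. This is only a sketch that models frames as tuples; it does not use any real HTTP/2 library, and the function name, the frame labels and the resource paths are assumptions made for the illustration.

```python
def fast_start_frames(init_path, media_paths, cancelled=()):
    """Return the ordered (frame kind, resource) pairs emitted by the server."""
    promised = [init_path] + list(media_paths)
    # Steps 654 and 656: announce every resource before any response content.
    frames = [("PUSH_PROMISE", path) for path in promised]
    # Step 655: the description file itself.
    frames.append(("RESPONSE", "MPD"))
    # Steps 657 and 658: push what the client did not cancel in the meantime.
    frames += [("PUSH_DATA", path) for path in promised if path not in cancelled]
    return frames


frames = fast_start_frames("init.mp4", ["seg1.m4s", "seg2.m4s"], cancelled=("seg2.m4s",))
```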

PUSH_PROMISE frames and pushing of data from the server to the client device are for instance performed in accordance with corresponding features being developed in the frame of HTTP 2.0, as described for instance in the document "Hypertext Transfer Protocol version 2.0, draft-ietf-httpbis-http2-latest", HTTPbis Working Group, Internet-Draft, June 24, 2013 (available for instance at http://http2.github.io/http2-spec/).

Upon receipt at the client device, the initialization data can be used by the client to set up the decoder(s) (step 659) and the first media data are buffered (step 660) until a sufficient amount of data is available for decoding and rendering (e.g. displaying) without display freeze.

When the client has fully received the MPD file, it parses it (step 662) and starts decoding and displaying (step 663) provided enough data are buffered (step 661). If this is not the case, and the client device knows from PUSH_PROMISE frames sent by the server (see step 656) that more segments will be sent, it waits at step 664 for the completion of the push of first media data from the server. During this idle step 664, the client device may prepare the next requests for subsequent segments that will be issued in a standard client-controlled DASH process (step 665), as already explained above. This is possible because the client device has received information on the initial media data to be pushed (or being pushed) in the corresponding PUSH_PROMISE frame (see step 656 above) and can thus prepare requests for the temporal segment immediately following the last temporal segment intended to be pushed by the server. The client device, when it has fully received the MPD, may also use the information on initial media data received at step 656 to check whether this initial media data fills the buffer and, if not, to send a request for the following media data (e.g. media data corresponding to a temporal segment following the temporal segment represented by initial media data) according to the standard client-controlled DASH process prior to step 661 (contrary to what is shown in Figure 5, which shows a case where the pushed initial media data fills the buffer). This enables the client to correct a bad estimation from the server on the quantity of first media data to push.
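The client-side check described above may be sketched as follows. The sketch assumes, for illustration only, that segments are numbered consecutively, that they all have the same duration, and that the client derives the promised segment numbers from the PUSH_PROMISE frames; the function and parameter names are hypothetical.

```python
def plan_after_push(promised_segment_numbers, segment_duration_s, min_buffer_s):
    """Return (next segment number to request, whether to request it before display)."""
    # Segment immediately following the last one the server intends to push.
    next_segment = max(promised_segment_numbers) + 1
    # Total media duration covered by the promised segments.
    pushed_duration_s = len(promised_segment_numbers) * segment_duration_s
    # If the pushed data cannot fill the buffer, request more right away.
    request_before_display = pushed_duration_s < min_buffer_s
    return next_segment, request_before_display
```

When the second value is true, the client corrects an under-estimation by the server by issuing a standard request prior to step 661.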

This process enables a streaming client to start displaying media earlier than in standard manifest-based streaming. Indeed, the startup delay is reduced because the number of HTTP roundtrips on the network is reduced to get initialization data and/or initial media data.

This process nevertheless remains compliant with the current DASH standard, because:
- there is no modification of the MPD file: its transmission remains light and fast;
- the behaviour of standard DASH clients (i.e. not benefiting from the teachings of the invention) may be unchanged: such client devices would ignore unrecognized HTTP headers and, when not accepting the push feature, would simply have to perform more requests/responses and thus spend more time to start the presentation.

Figure 6 describes an exemplary method implemented at the server side following a request for the manifest (or description file) from a client device.

This method seeks to identify the most relevant initial data to push in advance so that the client can rapidly start the display of the media presentation. In step 700, the request for the manifest is received. The server then checks at step 701 whether the client device inserted some preferences in the request. This may be done via a dedicated HTTP header, for example to express a transmission rate for the media presentation and a preferred language for the audio stream:

GET http://myserver.com/presentation/pres1.mpd \r\n
Prefered-MediaRange: bw=2000;lang=FR \r\n\r\n

If the request includes preferences (test 701 true), the server analyses the client's preferences (step 703) and sets its confidence_level parameter to the value "high" (step 704).
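The analysis of the client's preferences (step 703) can be illustrated by a small parser for the header value shown above. This is a minimal sketch, assuming the value is a semicolon-separated list of key=value pairs as in the example; the function name is hypothetical and the units of the bw field are not specified in the text.

```python
def parse_preferences(header_value):
    """Split a 'bw=2000;lang=FR' style header value into a dictionary."""
    preferences = {}
    for item in header_value.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            preferences[key.strip()] = value.strip()
    return preferences


prefs = parse_preferences("bw=2000;lang=FR")
```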

If no indication is provided in the request (test 701 false), the server checks at step 702 whether it has already registered service usage information (logs) for this client (i.e. statistics or usage data based on prior exchanges between the user or client device and the server) or whether information is available from the User-Agent header. Indeed, the User-Agent header is defined as an HTTP header in RFC2616 (see e.g. http://www.ietf.org/rfc/rfc2616.txt) and provides a means for applications to exchange information such as operating system, browser type, application name, etc.

For instance, the DASH server may require clients to authenticate before they are able to use the service; in a variation, a user may log in before getting access to the service. With such means, the server can link media parameters to a connected user or device.

When prior usage information (logs) is available for the concerned client device or user (test 702 true), by parsing the logs at step 705, the server can deduce the most frequent usages for a given client or user. For example, it can deduce that the user or client device always selects an audio stream in French and a video stream in HD (High Definition). Moreover, the server can know whether this is a first request in the open TCP connection or not (client connected to the service and requesting a second media presentation). In this case, the bandwidth estimation can be more accurate and reliable and the TCP congestion window may be bigger than for a first request. This can impact the choice made by the server in terms of suitable Representation.

By registering DASH quality metrics, the server can have in its logs the changes among various representations the user/client usually performs. From this, the server determines whether the usual behaviour is "aggressive" or "constant" depending on the frequency of changes (by changes we mean switches to another Representation, whatever the criterion: bandwidth, resolution, frame rate, etc.). An aggressive client is a DASH client that will automatically switch to a different representation when its context changes. As an example, when monitoring bandwidth or buffer occupancy, an aggressive client will request a Representation with a different bandwidth as soon as a new Representation has characteristics closer to the client's context compared to the current Representation. By contrast, a constant client will try to avoid frequent Representation switches in order to maintain stable quality and display rate. When the user/client device behaviour is rather aggressive in terms of adaptation, the server then knows that whatever it selects as initial representation to start the streaming, the client will try to adapt in the following first seconds or minutes of streaming.
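The distinction between an aggressive and a constant client can be derived from logged Representation choices, for example as sketched below. The text does not specify how the frequency of changes is measured; the switch-ratio criterion, the 0.3 threshold and the function name are assumptions introduced purely for illustration.

```python
def adaptation_behaviour(representation_history, switch_ratio_threshold=0.3):
    """Classify a client from the sequence of Representation ids it requested."""
    if len(representation_history) < 2:
        # Not enough history to observe any switch.
        return "constant"
    pairs = zip(representation_history, representation_history[1:])
    switches = sum(1 for previous, current in pairs if previous != current)
    # Fraction of consecutive segment requests that changed Representation.
    ratio = switches / (len(representation_history) - 1)
    # Hypothetical threshold: above it, the client is deemed aggressive.
    return "aggressive" if ratio > switch_ratio_threshold else "constant"
```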

When preferences are deduced from logs, the server sets its confidence_level parameter to the value "mid" at step 706. Indeed, this information may be a bit less relevant than explicit preferences signalled by the client itself (test 701 true).

When no log information is available (test 702 false), then the server puts its confidence_level parameter to the lowest value: "/oW' at step 707. This indicates that the server is performing a best guess on the information it pushes because it has no a priori information to decide. Further process in this case is described below (see step 711).
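The determination of the confidence_level parameter may be summarised by the following minimal sketch. It assumes that explicit preferences received in the request (test 701 true) lead to the value "high"; the function name and the boolean arguments are hypothetical.

```python
def confidence_level(has_explicit_preferences: bool, has_logs: bool) -> str:
    """Sketch of the confidence_level computation (tests 701 and 702)."""
    if has_explicit_preferences:
        # Test 701 true: assumed here to give the highest confidence.
        return "high"
    if has_logs:
        # Test 702 true: preferences deduced from logs (step 706).
        return "mid"
    # Test 702 false: best guess without a priori information (step 707).
    return "low"
```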

In parallel with this confidence_level parameter computation, the server may parse the manifest at step 708. In cases where the manifest is not liable to change very often (especially for an on-demand service, as opposed to a live service), the parsing of the manifest can be performed offline, once and for all, by registering the description of the various Representations in a lookup table. This lookup table may also be used by the server to link clients' logs to some parts of the media presentation. This enables faster log processing (see step 705 described above) to deduce some of the client's preferences.

The parsing of the manifest (step 708) provides information to the server at the time of selecting (at step 709) a suitable Representation as initial Representation (i.e. initial media data) to start the streaming.

Both steps 703 and 705 (obtaining preferences respectively in the request or based on usage data from prior exchanges) consist in translating preferences or usages from the client device/user into concrete parameters that would match MPD attributes. For example, these can be the bandwidth, the width and height of the video, the kind of codec in use, or the language for subtitles or audio streams. Then, the server compares the values obtained for these parameters with the values in the manifest to identify at step 709 the most suitable Representation to push to the client.
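A minimal sketch of the comparison performed at step 709 is given below. It assumes that each Representation has been reduced to a dictionary of MPD-like attributes and that preferences are expressed as upper bounds and a set of supported codecs; the key names and the selection rule (highest bandwidth among the matching Representations) are assumptions made for this sketch only.

```python
def select_representation(representations, prefs):
    """Return the best matching Representation, or None if none is suitable.

    representations: list of dicts with keys "bandwidth", "width",
                     "height" and "codecs" (hypothetical layout).
    prefs: dict with keys "max_bandwidth", "max_width", "max_height"
           and "codecs" (a set of supported codec strings).
    """
    candidates = [
        r for r in representations
        if r["bandwidth"] <= prefs["max_bandwidth"]
        and r["width"] <= prefs["max_width"]
        and r["height"] <= prefs["max_height"]
        and r["codecs"] in prefs["codecs"]
    ]
    if not candidates:
        # Corresponds to test 710 being false.
        return None
    # Assumed rule: prefer the highest bandwidth that still fits.
    return max(candidates, key=lambda r: r["bandwidth"])
```

Returning `None` when no candidate remains corresponds to the case where no suitable Representation can be deduced.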

It may be noted that this step 709 is typically what the client device performs continuously in a dynamic and adaptive streaming protocol like DASH. Here, the same step is performed by the server at the beginning of the streaming session by MPD parsing means.

In case no suitable Representation can be deduced in step 709, test 710 is false and the server sets its confidence_level parameter to the "low" value (in step 707 previously mentioned).

When the confidence_level parameter has the "low" value (either because no preferences could be determined or because no suitable Representation can be found based on preferences), the server decides at step 711 to select the simplest Representation. For video, for instance, the simplest Representation may be the Representation with the lowest spatial resolution and designed for the lowest bandwidth. According to a possible complementary feature (not represented in Figure 6), when there is no ambiguity on the codec (i.e. all video Representations have the same value for the codec attribute, i.e. the same codec, for example HEVC, has been used to encode all the video Representations), the confidence_level parameter may be raised to the value "mid".
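The fallback of step 711 and the complementary feature relating to the codec may be sketched as follows. The dictionary layout of a Representation and both function names are hypothetical; ordering by pixel count and then by bandwidth is one possible reading of "simplest".

```python
def simplest_representation(representations):
    """Pick the Representation with the lowest resolution, then bandwidth."""
    return min(
        representations,
        key=lambda r: (r["width"] * r["height"], r["bandwidth"]),
    )


def fallback_confidence(representations):
    """Raise the confidence to "mid" when all Representations share one codec."""
    codecs = {r["codecs"] for r in representations}
    return "mid" if len(codecs) == 1 else "low"
```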

The next step after step 711, or when a suitable Representation has been found (test 710 true), consists in identifying the initialization data (step 712). Indeed, in the DASH manifest (or description file), initialization information can be signaled in different ways: it can be explicitly put in an Initialization element of a SegmentBase, SegmentList or SegmentTemplate element that provides a direct URL to the initialization data.

In this case, this URL is put in a header field of the PUSH_PROMISE frame (see step 654 described above with reference to Figure 5) that will allow the client to identify the resource promised to be pushed (by specifying the variables :scheme, :host and :path, and possibly :Range).

When initialization data is not explicitly described, this means that media segments are self-initialized. In such case, the server has to parse the beginning of the segment (e.g. segment index information boxes for segments in mp4 format). Based on this analysis, it can build the corresponding URL with the appropriate byte range that will be put as header in the PUSH_PROMISE frame.
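As an illustration, the header fields identifying the promised resource could be derived from the URL of the initialization data, and from an optional byte range, as sketched below. The function name is hypothetical, the header names follow those mentioned above, and the "bytes=first-last" syntax of the range value is an assumption borrowed from the usual HTTP Range header.

```python
from urllib.parse import urlsplit


def push_promise_headers(url, byte_range=None):
    """Build the header fields designating a resource promised to be pushed.

    byte_range: optional (first, last) tuple of inclusive byte offsets, used
                for self-initializing segments.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {":scheme": parts.scheme, ":host": parts.netloc, ":path": path}
    if byte_range is not None:
        first, last = byte_range
        # Assumed value syntax, modelled on the HTTP Range header.
        headers[":Range"] = f"bytes={first}-{last}"
    return headers
```

For a self-initializing segment, the byte range would be the one obtained by the server when parsing the beginning of the segment.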

Once identified, a PUSH_PROMISE frame for initialization data is immediately sent to the client (step 713, corresponding to step 654 in Figure 5), immediately followed here by the push of the initialization data (step 717a, corresponding to step 657 in Figure 5). When initialization data are received, the client can then initialize its media decoders (step 717b).

Optionally, to improve the segment signaling and later identification by the client device when processing the PUSH_PROMISE frame (see step 806 described below), the server can indicate in step 713: the nature of the pushed data: initialization or media or both (in case of self-initializing segments); the parameters of the URL template or an indication of the segment as a path in the MPD representation tree of Figure 4b (for example: P2AS21R211S1; i.e. a concatenation of element type followed by an identifier). It may be noted that this requires the client device to have received the MPD. Then, the server can decide to add this specific information only in the PUSH_PROMISE messages that it thinks will be processed after MPD reception by the client device. To help the decision at the client device on accepting or not a PUSH_PROMISE before the MPD reception and parsing, the server can indicate, instead of the segment path in the MPD, qualitative information on the pushed segment, such as whether it is a segment from a base layer or an enhancement layer; according to another example, the server can place in a header the attributes of the selected Representation with their values.

According to a possible embodiment (not represented in Figure 6), when parsing the manifest at step 708 determines that initialization data is present in top level elements of the manifest (i.e. whatever the Representations, the initialization data is common to all Representations; for example in case of dependent Representations), the server can immediately (i.e. concurrently with step 708) send the PUSH_PROMISE frame designating initialization data with a confidence_level parameter set to the value "high" since there is no risk of mismatch between pushed data and what the client would have chosen. The benefit of sending the confidence_level parameter with the PUSH_PROMISE frame, for example as an HTTP header, is that it can help the client device in accepting or cancelling the push promise (see the description of Figure 7 below).

Thanks to this feature, the client will receive even earlier the initialization data required to setup its decoders (as the PUSH_PROMISE frame is sent early). This also works when initialization data is unique for a given media type (e.g. one single InitializationSegment per AdaptationSet whatever the number of Representations in this AdaptationSet). This even faster push would come just after the parsing of the manifest (step 708 described above), thus before processing logs or preferences (steps 701, 703 and 705 described above).

Then, if the confidence_level parameter previously determined by the server is greater than or equal to the "mid" value (test 714), the server takes the initiative of pushing the first media data it considers as suitable for the client.

This is done iteratively in two steps: first a PUSH_PROMISE frame is sent (step 715, corresponding to step 656 in Figure 5) and then the push of first media data starts in step 719. This is repeated for each first media data segment that has been selected to be pushed in step 709.

According to a possible embodiment, when consecutive media segments are promised to be pushed (i.e. a plurality of PUSH_PROMISE are sent for respective media segments), the PUSH_PROMISE associated to the current media segment is marked as a child or a follower of a previous PUSH_PROMISE (step 716). This can be put as a new HTTP header in the PUSH_PROMISE frame if the server is stateless or kept in a table if the server is stateful. Keeping this relationship can be useful to perform hierarchical cancel on push promises (as described below with reference to Figure 7).
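For the stateful variant, a minimal sketch of how the table could be used to perform a hierarchical cancel is given below. The table layout (a mapping from each promise identifier to the identifier of its parent) and the function name are assumptions made for this sketch.

```python
def cancel_with_followers(parent_of, cancelled_id):
    """Return the identifiers of a cancelled promise and of all its followers.

    parent_of: dict mapping a promise identifier to the identifier of the
               previous promise it follows, or None for a first promise.
    """
    # Invert the table to find the followers of each promise.
    children = {}
    for promise_id, parent in parent_of.items():
        children.setdefault(parent, []).append(promise_id)

    cancelled, stack = [], [cancelled_id]
    while stack:
        promise_id = stack.pop()
        cancelled.append(promise_id)
        stack.extend(children.get(promise_id, []))
    return cancelled
```

Cancelling a promise in the middle of a chain thus also cancels the promises that follow it, while earlier or unrelated promises are left untouched.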

A possible schedule of the various transmissions of data is as follows: before actually pushing first media data, the server starts pushing the initialization data in step 717a mentioned above; in parallel to sending the PUSH_PROMISE frame relating to first media data and initialization data, the server also sends the MPD file (manifest) at step 718 and keeps the stream open until the pushed data are completely sent.

In another embodiment, test 714 can be omitted so that first media data is pushed whatever the level of confidence. But in case the confidence_level parameter is set to "low", the server may wait for a potential CANCEL from the client before actually pushing the first (or initial) media data.

When pushing the first media data, the server determines the overall quantity of data to push and the speed to use (flow control).

Regarding the first aspect, the server can exploit information from the manifest such as for example the minBufferTime attribute mentioned at the beginning of the manifest. Using this attribute, and considering the Representation selected in step 709 or 711, and given the segment duration attribute also provided in the manifest, the server easily determines the number of segments to push to fulfill the minBufferTime constraint (i.e. the quantity of segments, hence the quantity of data, forming the initial media data to be pushed). Advantageously, when parsing of the manifest (step 708) is performed offline, this number of first media segments can be recorded in a table in a memory of the server.
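One way to compute this number of segments is sketched below, assuming that minBufferTime and the segment duration are both available in seconds and that at least one segment is always pushed; the function name is hypothetical.

```python
import math


def segments_to_push(min_buffer_time_s: float, segment_duration_s: float) -> int:
    """Number of segments needed to cover minBufferTime (at least one)."""
    if segment_duration_s <= 0:
        raise ValueError("segment duration must be positive")
    return max(1, math.ceil(min_buffer_time_s / segment_duration_s))
```

For instance, with an assumed minBufferTime of 10 seconds and segments of 4 seconds, three segments would be pushed.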

Regarding the second aspect, given the duration of the segment and the bandwidth of the chosen Representation, the server may obtain an estimate of the required bitrate. This provides, mainly for video segments, the transmission rate to use. For example, for a compressed video Representation with a bandwidth equal to 1.6 Mbit/s and segments of 5 seconds duration, each segment would represent 1 megabyte of data to send. By default, the flow control in HTTP v2.0 provides a stream window size at most equal to 65535 bytes. Thus, in our example, the client would have to send back to the server an acknowledgement for each block of 65535 pushed bytes, i.e. more than 15 times per segment. Since we aim at reducing network roundtrips and traffic when using the push feature of HTTP 2.0 (under development), there is clearly a need here to modify the default behaviour (actually the default congestion window size) to enable DASH fast start (by reducing network traffic).
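The arithmetic of this example can be checked with the short sketch below; the function names are hypothetical and the figures are those of the example above.

```python
def segment_size_bytes(bandwidth_bits_per_s: int, duration_s: int) -> int:
    """Approximate size of one segment, from its bandwidth and duration."""
    return bandwidth_bits_per_s * duration_s // 8


def window_updates_needed(segment_bytes: int, window_bytes: int = 65535) -> int:
    """Number of window-sized blocks needed to send one segment (rounded up)."""
    return -(-segment_bytes // window_bytes)


size = segment_size_bytes(1_600_000, 5)   # 1.6 Mbit/s during 5 seconds
updates = window_updates_needed(size)     # blocks of 65535 bytes
```

Here `size` is 1,000,000 bytes and `updates` is 16, which is consistent with "more than 15 times per segment".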

In case the client device sends preferences included in its request for the manifest, it can also indicate that a SETTINGS frame is to be sent immediately after the request; this SETTINGS frame specifies for instance an initial window size (SETTINGS_INITIAL_WINDOW_SIZE) in line with its buffering capacities. According to a possible variation, this SETTINGS frame can be sent at connection setup time.

Another possibility is for the client device, when acknowledging the first pushed data, to send a WINDOW_UPDATE with an appropriate size.

Figure 7 describes a possible method implemented by the client device, when exchanging data with the server executing a method for instance as described in Figure 6, in accordance with the teachings of the invention.

According to a possible application of this method, the client device connects to the server in order to benefit from a video on demand service. The connection establishment between the client and the server is conventional. In the present example, both the client device and the server are able to exchange messages using the HTTP/2.0 protocol described for instance in the already mentioned document "Hypertext Transfer Protocol version 2.0, draft-ietf-httpbis-http2-latest".

At a time (for instance when the user at the client device selects a given video), the client device gets information from the server on the address (e.g. the URL) of a manifest describing a media presentation (here the video the user would like to see).

The client device then prepares a request to download the manifest (step 800). In a preferred embodiment, the client adds, through HTTP headers, some preferences on the video resolution, codecs and bandwidth it supports (step 801). The client device then sends its request to the server (step 802).

In the present embodiment, the client device then sends at step 803 an HTTP/2.0 SETTINGS frame to indicate an initial window size (SETTINGS_INITIAL_WINDOW_SIZE) in line with its buffering capacities (see the document "Hypertext Transfer Protocol version 2.0, draft-ietf-httpbis-http2-latest" mentioned above, section 3.8.5).
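A minimal sketch of how the corresponding setting could be encoded is given below. It assumes the parameter layout of the published HTTP/2 specification (RFC 7540), in which each setting is a 16-bit identifier followed by a 32-bit value and SETTINGS_INITIAL_WINDOW_SIZE has the identifier 0x4; the draft version cited above may use a different layout. The function name is hypothetical and only the setting itself is encoded, not the enclosing frame header.

```python
import struct

SETTINGS_INITIAL_WINDOW_SIZE = 0x4   # identifier per RFC 7540 (assumed here)
MAX_WINDOW_SIZE = 2**31 - 1          # largest flow-control window allowed


def initial_window_setting(window_size: int) -> bytes:
    """Encode one SETTINGS parameter: 16-bit identifier, 32-bit value."""
    if not 0 <= window_size <= MAX_WINDOW_SIZE:
        raise ValueError("window size out of range")
    # "!HI": network byte order, unsigned 16-bit then unsigned 32-bit.
    return struct.pack("!HI", SETTINGS_INITIAL_WINDOW_SIZE, window_size)
```

A client with enough buffering capacity could, for instance, announce a window of one megabyte so that a whole segment of the size discussed above can be pushed without intermediate window updates.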

In step 804, the client device starts processing the various server responses: receiving data forming the manifest and parsing it (step 805) but also the PUSH_PROMISE frame(s) sent by the server (step 806).

Before deciding to accept or to cancel the push(es) designated in the PUSH_PROMISE frame(s), the client builds the URL of the resource the server intends to push (step 806) and checks (step 807) the confidence_level parameter that has been included in the PUSH_PROMISE frame by the server.

In parallel and when the manifest (or description file) is fully received, the client device builds (step 808) the list of desired media segments it would like to get (i.e. the list of versions of each segment that best suit its needs) and initializes a current segment_index variable to 0 (step 809). The first step in processing the PUSH_PROMISE consists (step 810a) in checking the confidence_level parameter. Then, depending on (predefined) client settings or user preferences, the client may decide to reject the PUSH_PROMISE under a certain level of confidence, for example the PUSH_PROMISEs for which the PUSH_PROMISE frames include a confidence_level parameter with a "low" value.

If the client can match (step 810b) the URL mentioned in the PUSH_PROMISE frame with the URL of a desired segment (as derived from the manifest in step 808 as just mentioned), it initializes a table for a list of pending segments being transmitted with their transmission status (step 811). If the client cannot identify the segment intended to be pushed by the server in step 810b in the list of desired media segments, it then cancels the push (step 812) by sending an appropriate CANCEL instruction to the server.
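The client-side decision of steps 810a and 810b may be sketched as follows. The representation of a push promise as a dictionary, the ordering of the confidence values and the default treatment of a missing confidence_level as "low" are assumptions made for this sketch.

```python
LEVELS = {"low": 0, "mid": 1, "high": 2}


def process_push_promise(promise, desired_urls, min_confidence="mid"):
    """Return "accept" or "cancel" for one PUSH_PROMISE.

    promise: dict with a "url" key and an optional "confidence_level" key.
    desired_urls: set of URLs of the desired segments (built in step 808).
    """
    # Step 810a: reject promises below the configured level of confidence.
    level = promise.get("confidence_level", "low")
    if LEVELS[level] < LEVELS[min_confidence]:
        return "cancel"
    # Step 810b: the promised URL must match a desired segment.
    if promise["url"] not in desired_urls:
        return "cancel"   # step 812
    return "accept"       # segment added to the pending list (step 811)
```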

To facilitate the segment identification at step 810b, the client can exploit additional header information like for example the index of the pushed segment, as the path in the MPD tree representation (see Figure 4b), or the URL template parameters when the description file (i.e. the MPD file or manifest) relies on SegmentTemplate. The CANCEL message of step 812 is a specific one: using the hierarchical relationship inserted by the server when building the PUSH_PROMISE (see the description of Figure 6 above), the client can send a recursive CANCEL that will result in the cancellation of the current PUSH_PROMISE plus the following ones. According to a possible embodiment, when the client device cannot interpret the push promise, it stops by default all pushes of media data corresponding to the next temporal segments of a media resource.

This new usage of the CANCEL instruction spares the client from repeating CANCEL messages once it is desynchronized with the server in terms of media segment identification. In such a case, the client will fall back to a pull mode.

When the segment to be received by the push from the server corresponds to a desired segment (test 810b true), the client then continues the processing of the PUSH_PROMISE frames (test 813 and loop on step 806).

When all PUSH_PROMISE frames have been processed, the client device expects and begins receiving and buffering (step 814) data corresponding to the accepted PUSH_PROMISE.

When enough media segments are received in the reception buffer of the client (test 815), they are processed by the client (816). The current segment index variable is then updated with the ordering number of the first segment in the list (step 817). It should be noted that not all clients may get access to the client's buffer. For example, web applications in particular do not usually have access to the web browser cache. In such a case, the server may send the list of pushed segments to the web application client directly. This information may be exchanged from the server to the client using a web socket connection, for instance.

When all pushed media segments have been processed, the client can then go back to standard pull-based DASH (step 818), starting requesting data corresponding to the next segment, designated by the variable segment index + 1. In parallel, the pushed segment data are used to start the decoding and the display of the selected video.

With reference to Figure 8, a particular hardware configuration of a device for streaming media data, or for receiving media data, able to implement methods according to the invention is now described by way of example.

A device implementing the invention is for example a microcomputer 50, a workstation, a server, a personal digital assistant, or a mobile telephone connected to various peripherals.

The peripherals connected to the device comprise for example a digital video camera 64, or a scanner or any other image acquisition or storage means, connected to an input/output card (not shown) and supplying video data to the device (in particular to the server shown in Figure 1).

The device 50 comprises a communication bus 51 to which there are connected:
- a central processing unit CPU 52 taking for example the form of a microprocessor;
- a read only memory 53 in which may be contained the programs whose execution enables the methods according to the invention. It may be a flash memory or EEPROM;
- a random access memory 54, which, after powering up of the device 50, contains the executable code of the programs of the invention necessary for the implementation of the invention. As this memory 54 is of random access type (RAM), it provides fast access compared to the read only memory 53. This RAM memory 54 stores in particular the various video data, initialization data and description file used as the processing is carried out;
- a screen 55 for displaying data, in particular video and/or serving as a graphical interface with the user, who may thus interact with the programs according to the invention, using a keyboard 56 or any other means such as a pointing device, for example a mouse 57 or an optical stylus;
- a hard disk 58 or a storage memory, such as a memory of compact flash type, able to contain the programs of the invention as well as data used or produced on implementation of the invention;
- an optional diskette drive 59, or another reader for a removable data carrier, adapted to receive a diskette 63 and to read/write thereon data processed or to be processed in accordance with the invention; and
- a communication interface 60 connected to the telecommunications network 61, the interface 60 being adapted to transmit and receive data.

In the case of audio data, the device 50 is preferably equipped with an input/output card (not shown) which is connected to a microphone 62.

The communication bus 51 permits communication and interoperability between the different elements included in the device 50 or connected to it. The representation of the bus 51 is non-limiting and, in particular, the central processing unit 52 may communicate instructions to any element of the device 50 directly or by means of another element of the device 50.

The diskettes 63 can be replaced by any information carrier such as a compact disc (CD-ROM) rewritable or not, a ZIP disk or a memory card. Generally, an information storage means, which can be read by a micro-computer or microprocessor, integrated or not into the device for processing a video sequence, and which may possibly be removable, is adapted to store one or more programs whose execution permits the implementation of the method according to the invention.

The executable code enabling the coding device to implement the invention may equally well be stored in read only memory 53, on the hard disk 58 or on a removable digital medium such as a diskette 63 as described earlier. According to a variant, the executable code of the programs is received by the intermediary of the telecommunications network 61, via the interface 60, to be stored in one of the storage means of the device 50 (such as the hard disk 58) before being executed.

The central processing unit 52 controls and directs the execution of the instructions or portions of software code of the program or programs of the invention, the instructions or portions of software code being stored in one of the aforementioned storage means. On powering up of the device 50, the program or programs which are stored in a non-volatile memory, for example the hard disk 58 or the read only memory 53, are transferred into the random-access memory 54, which then contains the executable code of the program or programs of the invention, as well as registers for storing the variables and parameters necessary for implementation of the invention. It will also be noted that the device implementing the invention or incorporating it may be implemented in the form of a programmed apparatus. For example, such a device may then contain the code of the computer program(s) in a fixed form in an application specific integrated circuit (ASIC).

The device described here and, particularly, the central processing unit 52, may implement all or part of the processing operations described in relation with Figures 1 to 7, to implement methods according to the present invention and constitute devices according to the present invention.

The above examples are merely embodiments of the invention, which is not limited thereby.

Claims

1. A method for providing data including at least one temporal segment representing media data divided according to periods of time, the method comprising the following steps implemented by a server device: receiving from a client device a request for a description file including a description of address information of a temporal segment; selecting data from among sets of data represented by the description file; sending the description file to the client device in response to receiving the request for the description file from the client device; and sending the selected data which is selected from among the sets of data represented by the description file, to the client device, in response to receiving the request for the description file from the client device.
2. The method for providing data according to claim 1, wherein the address information is an URL.
3. The method for providing data according to claim 1 or 2, wherein the media data is video and/or audio data.
4. The method for providing data according to any of claims 1 to 3, wherein the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.
5. The method for providing data according to any of claims 1 to 4, further comprising receiving a request for a temporal segment from the client device which received the description file; and sending the requested temporal segment to the client device in response to the request for the temporal segment from the client device.
6. The method for providing data according to any of claims 1 to 5, wherein the description file describes addresses for each of a plurality of temporal segments which are based on the same media data, wherein resolutions of the plurality of temporal segments are different.
7. The method for providing data according to any of claims 1 to 6, further comprising sending a push promise frame for indicating an intention of pushing the selected data to the client device, wherein the push promise frame is defined by HTTP/2.
8. The method for providing data according to claim 7, wherein the description file is sent to the client device after sending the push promise frame to the client device.
9. The method for providing data according to any of claims 1 to 8, wherein the selected data is selected from among the sets of data by using preference data received from the client device.
10. The method for providing data according to claim 9, wherein the preference data includes at least one of a transmission rate of the media data and a preferred language.
11. The method for providing data according to any of claims 1 to 8, wherein the selected data is selected from among the sets of data by using registered information of the client device registered prior to receiving the request for the description file from the client device.
12. A method for receiving data including at least one temporal segment representing media data divided according to periods of time, the method comprising the following steps implemented by a client device: sending to a server device a request for a description file including a description of address information of a temporal segment; receiving the description file from the server device as a response of the request for the description file to the server device; and receiving, as a response of the request for the description file to the server device, selected data which is selected by the server device from among sets of data represented by the description file.
13. The method for receiving data according to claim 12, wherein the address information is an URL.
14. The method for receiving data according to claim 12 or 13, wherein the media data is video and/or audio data.
15. The method for receiving data according to any of claims 12 to 14, wherein the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.
16. The method for receiving data according to any of claims 12 to 15, further comprising: sending a request for a temporal segment to the server device which sent the description file; and receiving the requested temporal segment from the server device as a response of the request for the temporal segment to the server device.
17. The method for receiving data according to any of claims 12 to 16, wherein the description file describes addresses for each of a plurality of temporal segments which are based on the same media data, wherein resolutions of the plurality of temporal segments are different.
18. The method for receiving data according to any of claims 12 to 17, further comprising receiving a push promise frame for pushing the selected data from the server device, wherein the push promise frame is defined by HTTP/2.
19. The method for receiving data according to claim 18, wherein the description file is sent from the server device after sending the push promise frame.
20. The method for receiving data according to any of claims 12 to 19, wherein the selected data is selected by the server device from among the sets of data by using preference data.
21. The method for receiving data according to claim 20, wherein the preference data includes at least one of a transmission rate of the media data and a preferred language.
22. The method for receiving data according to any of claims 12 to 19, wherein the selected data is selected by the server device from among the sets of data by using registered information of the client device registered prior to sending the request for the description file to the server device.
23. A device for providing data including at least one temporal segment representing media data divided according to periods of time, comprising: a receiver configured to receive from a client device a request for a description file including a description of address information of a temporal segment; a selection module configured to select data from among sets of data represented by the description file; a sending module configured to send the description file to the client device in response to receiving the request for the description file from the client device; and a push module configured to send the selected data which is selected from among the sets of data represented by the description file, to the client device, in response to receiving the request for the description file from the client device.
24. The device for providing data according to claim 23, wherein the address information is an URL.
25. The device for providing data according to claim 23 or 24, wherein the media data is video and/or audio data.
26. The device for providing data according to any of claims 23 to 25, wherein the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.
27. The device for providing data according to any of claims 23 to 26, wherein the receiver is further configured to receive a request for a temporal segment from the client device which received the description file; and the sending module is further configured to send the requested temporal segment to the client device in response to the request for the temporal segment from the client device.
28. The device for providing data according to any of claims 23 to 27, wherein the description file describes addresses for each of a plurality of temporal segments which are based on the same media data, wherein resolutions of the plurality of temporal segments are different.
29. The device for providing data according to any of claims 23 to 28, further comprising a module configured to send a push promise frame for indicating an intention of pushing the selected data to the client device, wherein the push promise frame is defined by HTTP/2.
30. The device for providing data according to claim 29, wherein the description file is sent to the client device after sending the push promise frame to the client device.
31. The device for providing data according to any of claims 23 to 30, wherein the selected data is selected from among the sets of data by using preference data received from the client device.
32. The device for providing data according to claim 31, wherein the preference data includes at least one of a transmission rate of the media data and a preferred language.
33. The device for providing data according to any of claims 23 to 30, wherein the selected data is selected from among the sets of data by using registered information of the client device registered prior to receiving the request for the description file from the client device.
34. A device for receiving data including at least one temporal segment representing media data divided according to periods of time, comprising: a sending module configured to send to a server device a request for a description file including a description of address information of a temporal segment; a receiver configured to receive the description file from the server device as a response of the request for the description file to the server device; and to receive, as a response of the request for the description file to the server device, selected data which is selected by the server device from among sets of data represented by the description file.
35. The device for receiving data according to claim 34, wherein the address information is an URL.
36. The device for receiving data according to claim 34 or 35, wherein the media data is video and/or audio data.
37. The device for receiving data according to any of claims 34 to 36, wherein the description file further describes at least one of a type of the media data, a bitrate of the media data, an encoding format of the media data, and time duration of the temporal segment.
38. The device for receiving data according to any of claims 34 to 37, wherein the sending module is further configured to send a request for a temporal segment to the server device which sent the description file; and the receiver is further configured to receive the requested temporal segment from the server device as a response of the request for the temporal segment to the server device.
39. The device for receiving data according to any of claims 34 to 38, wherein the description file describes addresses for each of a plurality of temporal segments which are based on the same media data, wherein resolutions of the plurality of temporal segments are different.
40. The device for receiving data according to any of claims 34 to 39, wherein the receiver is further configured to receive a push promise frame for pushing the selected data from the server device, wherein the push promise frame is defined by HTTP/2.
41. The device for receiving data according to claim 40, wherein the description file is sent from the server device after sending the push promise frame.
42. The device for receiving data according to any of claims 34 to 41, wherein the selected data is selected by the server device from among the sets of data by using preference data.
43. The device for receiving data according to claim 42, wherein the preference data includes at least one of a transmission rate of the media data and a preferred language.
44. The device for receiving data according to any of claims 34 to 41, wherein the selected data is selected by the server device from among the sets of data by using registered information of the client device registered prior to sending the request for the description file to the server device.
45. A system for exchanging data comprising a device for providing data according to any of claims 23 to 33 and a device for receiving data according to any of claims 34 to 44.
46. A non-transitory computer-readable medium storing a program which, when executed by a microprocessor or computer system in a device, causes the device to perform the method according to any of claims 1 to 22.