CN113364728A

CN113364728A - Media content receiving method, device, storage medium and computer equipment

Info

Publication number: CN113364728A
Application number: CN202110229332.7A
Authority: CN
Inventors: 索达加伊拉吉
Original assignee: Tencent America LLC
Current assignee: Tencent America LLC
Priority date: 2020-03-03
Filing date: 2021-03-02
Publication date: 2021-09-07
Anticipated expiration: 2041-03-02
Also published as: CN113364728B

Abstract

The application provides a media content receiving method, a device, a storage medium and a computer device. The device comprises a receiving module, a determining module, a parsing module and an output module. The receiving module is configured to receive a Media Presentation Description (MPD) file. The MPD file indicates a validity period during which the parsing of a remote element of the MPD file is valid. Based on a media presentation being within the validity period, the determining module is configured to determine whether the remote element has been parsed within the validity period indicated by the MPD file. When it is determined that the remote element has not been parsed within the validity period, the parsing module is configured to parse the remote element of the MPD file. Each time the remote element of the MPD file is referenced during the validity period, the output module is configured to output media content corresponding to the same parsed remote element.

Description

Media content receiving method, device, storage medium and computer equipment

Incorporation by reference

This application claims priority to U.S. Provisional Application No. 62/984,470, "REMOTE LINK VALIDITY INTERVAL IN MEDIA STREAMING," filed in March 2020, and to U.S. Application No. 17/095,097, "REMOTE LINK VALIDITY INTERVAL IN MEDIA STREAMING," filed in November 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to multimedia technologies, and in particular, to a method and an apparatus for receiving media content, a storage medium, and a computer device.

Background

The background description provided herein is intended to present the general context of the application. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description, is neither expressly nor impliedly admitted as prior art against the present application.

In Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH), standardized by the Moving Picture Experts Group (MPEG), remote elements are supported through at least one Extensible Markup Language (XML) Linking Language (XLink) element (e.g., an XLink link) in a Media Presentation Description (MPD). During media content playback, the at least one XLink element in the MPD needs to be parsed before the timeline for the remote element is built. However, the activation attribute (e.g., onRequest mode) of the at least one XLink element requires that the element be parsed prior to playback of the media content. This requirement is ambiguous and can lead to inconsistent playback of media content across players (e.g., players that randomly access or replay remote elements).

Disclosure of Invention

An aspect of the present application provides an apparatus for receiving media content. The apparatus comprises a receiving module, a determining module, a parsing module and an output module. The receiving module is configured to receive a Media Presentation Description (MPD) file indicating a validity period during which the parsing of a remote element of the MPD file is valid. Based on a media presentation being within the validity period, the determining module is configured to determine whether the remote element has been parsed within the validity period indicated by the MPD file. When it is determined that the remote element has not been parsed within the validity period, the parsing module is configured to parse the remote element of the MPD file. The output module is configured to output, each time the remote element of the MPD file is referenced during the validity period, media content corresponding to the same parsed remote element.

In one embodiment, the MPD file includes a first parameter and a second parameter, wherein the first parameter is used for indicating a start time of the validity period, and the second parameter is used for indicating a duration of the validity period.

In one embodiment, the first parameter is an offset between a start time of the validity period and a start time of the media content, wherein the media content corresponds to the parsed remote element.

In one embodiment, one of a period element, an essential property descriptor, and a supplemental property descriptor included in the MPD file includes the first parameter and the second parameter.

In one embodiment, the determining module is further configured to determine whether the remote element has been parsed within the validity period based on a hyperlink in an MPD file corresponding to the remote element.

In one embodiment, the hyperlink is an Extensible Markup Language (XML) Linking Language (XLink) link.

In one embodiment, the timeline of the validity period is the timeline of the MPD file.

An aspect of the present application provides a method of receiving media content. In the method, a Media Presentation Description (MPD) file is received that indicates a validity period during which the parsing of a remote element of the MPD file is valid. Based on a media presentation being within the validity period, it is determined whether the remote element has been parsed within the validity period indicated by the MPD file. When it is determined that the remote element has not been parsed within the validity period, the remote element of the MPD file is parsed. Each time the remote element of the MPD file is referenced during the validity period, media content corresponding to the same parsed remote element is output.

The present application further provides a computer device comprising at least one processor and at least one memory, the at least one memory having at least one instruction stored therein, the at least one instruction being loaded and executed by the at least one processor to implement the above-described method of receiving media content.

The present application also provides a non-transitory computer readable medium having stored therein instructions, which when executed by a processor, cause the processor to perform the above method of receiving media content.

With this technical solution, the client is given the earliest time at which a remote element may be parsed, which prevents the client from parsing the remote element before that time and obtaining an erroneous result. Embodiments of the application further provide a validity period for remote element parsing, which prevents the client from parsing the same remote element multiple times. Because the client parses the remote element only once, within the expected time, consistent playback can be provided. In addition, when the client randomly accesses or replays the content, the same content is played.

Drawings

Other features, properties, and various advantages of the disclosed subject matter will be further apparent from the following detailed description and the accompanying drawings.

FIG. 1 illustrates an exemplary session-based Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH) system according to an embodiment of the present application;

FIG. 2 illustrates another exemplary session-based DASH system according to an embodiment of the present application;

FIG. 3 illustrates an exemplary DASH client architecture according to an embodiment of the present application;

FIG. 4 illustrates an exemplary streaming architecture according to an embodiment of the present application;

FIG. 5 illustrates an exemplary validity period for a remote element in accordance with an embodiment of the present application;

FIG. 6 shows a flowchart outlining an example of a process according to an embodiment of the present application; and

FIG. 7 shows a schematic diagram of a computer system according to an embodiment of the application.

Detailed Description

I. Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH) and Media Presentation Description (MPD)

Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH) is an adaptive bit rate streaming technology that enables streaming of media content using Hypertext Transfer Protocol (HTTP) infrastructure (e.g., web servers, Content Delivery Networks (CDNs), various proxies, caches, and the like). DASH supports on-demand and live streaming from DASH servers to DASH clients and allows DASH clients to control streaming sessions, so that DASH servers do not need to handle the additional load of stream adaptation management in large-scale deployments. DASH also allows DASH clients to select streaming from various DASH servers, thus enabling further load balancing of the network for the benefit of the DASH clients. DASH provides dynamic switching between different media tracks, e.g., by changing the bit rate to adapt to network conditions.

In DASH, a Media Presentation Description (MPD) file provides the information a DASH client needs to adapt streaming of content by downloading media segments from a DASH server. The MPD file may be fragmented and delivered in parts to reduce session start-up delay. The MPD file may also be updated during the streaming session. In some examples, the MPD file supports the expression of content accessibility features, ratings, and camera views. DASH also supports the delivery of multi-view and scalable coded content.

The MPD file may include a sequence of at least one period, where each period may be defined by a period element in the MPD file. The MPD file may include an availabilityStartTime attribute for the MPD and a start attribute for each period. For a media presentation of dynamic type (e.g., a live service), the sum of the start attribute of the period, the MPD attribute availabilityStartTime, and the duration of the media segment may indicate the availability time of the period in Coordinated Universal Time (UTC) format, in particular that of the first media segment of each representation in the period. For a media presentation of static type (e.g., an on-demand service), the start attribute of the first period may be 0. For any other period, the start attribute may specify the time offset between the start time of that period and the start time of the first period. Each period may extend until the start of the next period or, in the case of the last period, until the end of the media presentation. Period start times may be precise and reflect the actual timing resulting from playing the media of all preceding periods.
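As a minimal sketch of the arithmetic described above for a dynamic presentation, the following Python snippet adds the period start and the segment duration to the MPD availability start time. The function name and the sample values are illustrative assumptions and do not come from the application.

```python
from datetime import datetime, timedelta, timezone


def first_segment_availability(availability_start_time: datetime,
                               period_start_s: float,
                               segment_duration_s: float) -> datetime:
    """Availability time (UTC) of the first media segment of a period.

    Implements the sum described in the text: MPD availability start time
    + period start + duration of the media segment.
    """
    return availability_start_time + timedelta(
        seconds=period_start_s + segment_duration_s)


# Hypothetical values chosen only for illustration.
ast = datetime(2021, 3, 2, 12, 0, 0, tzinfo=timezone.utc)
available_at = first_segment_availability(ast, period_start_s=60.0,
                                          segment_duration_s=2.0)
print(available_at.isoformat())  # 2021-03-02T12:01:02+00:00
```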

Each period may contain at least one adaptation set, and each adaptation set may contain at least one representation of the same media content. A representation may be one of a plurality of alternative encoded versions of audio or video data. Representations may differ by encoding type, e.g., by bit rate, resolution, and/or codec for video data, and by bit rate and/or codec for audio data. The term representation may be used to refer to a portion of encoded audio or video data that corresponds to a particular period of the multimedia content and is encoded in a particular manner.

The adaptation sets of a particular period may be assigned to a group indicated by the group attribute in the MPD file. Adaptation sets in the same group are generally considered alternatives to each other. For example, each adaptation set of video data for a particular period may be assigned to the same group, such that any of the adaptation sets may be selected for decoding to display the video data of the multimedia content for the corresponding period. In some examples, the media content within a period may be represented either by one adaptation set from group 0, if present, or by a combination of at most one adaptation set from each non-zero group. Timing data for each representation of a period may be expressed relative to the start time of the period.

A representation may comprise at least one segment. Each representation may include an initialization segment, or each segment of the representation may be self-initializing. When present, the initialization segment may contain initialization information for accessing the representation. In some cases, the initialization segment does not contain media data. A segment may be uniquely referenced by an identifier, such as a Uniform Resource Locator (URL), a Uniform Resource Name (URN), or a Uniform Resource Identifier (URI). The MPD file may provide an identifier for each segment. In some examples, the MPD file may also provide byte ranges in the form of a range attribute, where the byte ranges may correspond to the data of a segment within a file accessible by the URL, URN, or URI.

Each representation may also include at least one media component, where each media component may correspond to an encoded version of one individual media type, such as audio, video, or timed text (e.g., for closed captioning). Within one representation, media components may be time-continuous across the boundaries of consecutive media segments.

In some embodiments, the DASH client may access and download the MPD file from a DASH server. That is, the DASH client may retrieve the MPD file for use in initiating a live session. Based on the MPD file, and for each selected representation, the DASH client may make several decisions, including determining the latest segment available on the server, determining the segment availability start times of the next and possibly future segments, determining when to start playback of a segment and from which point on its timeline, and determining when to fetch a new MPD file. Once the service is playing, the client may keep track of drift between the live service and its own playback, which needs to be detected and compensated.

II. Session-Based DASH Operation and Session-Based Description (SBD)

Note that the MPD file may be generic for all DASH clients. To make the MPD file specific to a session of a DASH client, the Moving Picture Experts Group (MPEG) provides session-based DASH operations. In session-based DASH operations, a DASH client may receive a side file, such as a session-based description (SBD) file. The side file may provide instructions for the DASH client to customize the MPD file per session, and possibly per client.

Fig. 1 illustrates an exemplary session-based DASH system (100) according to an embodiment of the present application. In a session-based DASH system (100), MPD files are sent from a DASH server (101) (e.g., a content server) to a DASH client (102). The DASH client (102) may receive media segments from the DASH server (101) based on the MPD file. The DASH client (102) may send a request to the DASH server (101) to update the MPD file. Additionally, the DASH client (102) may receive SBD files from the DASH server (101) or a third party (e.g., a session controller).

Note that multiple DASH servers may send MPD files and media segments, so MPD files and media segments may be sent from different DASH servers to a DASH client (102). Additionally, the DASH server receiving the request sent from the DASH client (102) may be different from the DASH server sending the media segments.

According to aspects of the present application, an SBD file may include a plurality of time ranges and corresponding key-value pairs (or name-value pairs) as well as other metadata. The SBD file may be referenced in the MPD file by, for example, a URL. The SBD file may be used to customize MPD files received by the DASH client (102) to be specific to a session of the DASH client (102). For example, the SBD file may allow session-specific elements to be added to the segment URL without generating a unique per-session MPD.

FIG. 2 illustrates another exemplary session-based DASH system (200) according to an embodiment of the present application. In the session-based DASH system (200), a content generation device (201) (e.g., a smartphone) prepares and generates multimedia content. The content generation device (201) may include an audio source (e.g., a microphone) and a video source (e.g., a camera). The content generation device (201) may store the multimedia content or transmit it to a content server (202), which may store various multimedia content. The content server (202) may receive a request for at least one media segment of the multimedia content from a client device, such as a DASH access client (203). The multimedia content is described by an MPD file, which may be stored and updated by the content server (202) and accessed by client devices, including the DASH access client (203), to retrieve media segments.

To retrieve session-specific media segments, the DASH access client (203) may send a request to an SBD client (204) (e.g., a session client) to access SBD files, where the SBD files are received by the SBD client (204) and include a plurality of time ranges and corresponding key-value pairs for the current session. For example, the DASH access client (203) may send the key name and time range to the SBD client (204), and then the SBD client (204) resolves the key name and time range and returns the values corresponding to the key name and time range to the DASH access client (203). The DASH access client (203) may include this value in the query of the segment URL, which may be sent to the content server (202) for requesting the session-specific media segment when the segment request is an HTTP GET or partial GET request.
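The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory layout of the SBD data, the key name `token`, the URL, and both helper functions are hypothetical, and the time range is simplified to a single time point.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical in-memory form of an SBD file: time ranges (in seconds)
# mapped to key-value pairs for the current session.
SBD_ENTRIES = [
    {"start": 0.0, "end": 30.0, "values": {"token": "abc"}},
    {"start": 30.0, "end": 60.0, "values": {"token": "def"}},
]


def sbd_lookup(entries, key, time_s):
    """Return the value of `key` for the time range containing `time_s`."""
    for entry in entries:
        if entry["start"] <= time_s < entry["end"]:
            return entry["values"].get(key)
    return None  # no matching time range


def add_query(segment_url, key, value):
    """Append key=value to the query string of a segment URL."""
    parts = urlsplit(segment_url)
    extra = urlencode({key: value})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment))


value = sbd_lookup(SBD_ENTRIES, "token", 42.0)
url = add_query("https://example.com/video/seg-21.m4s", "token", value)
print(url)  # https://example.com/video/seg-21.m4s?token=def
```

In this sketch `sbd_lookup` plays the role of the SBD client and `add_query` the role of the DASH access client that places the returned value in the query of the segment URL.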

Note that the SBD client (204) may receive multiple SBD files from different session controllers, such as session controller (205) and session controller (206).

According to aspects of the present disclosure, any or all of the features of the content server (202) (e.g., a DASH server) may be implemented on at least one device of a Content Delivery Network (CDN), such as a router, bridge, proxy device, switch, or other device. The content server (202) may include a request processing unit to receive a network request from a client device (e.g., a DASH access client (203)). For example, the request processing unit may be operative to receive an HTTP GET or partial GET request and provide data of the multimedia content in response to the request. The request may specify a segment using the URL of the segment. In some examples, the request may also specify at least one byte range of the segment, thus constituting a partial GET request. The request processing unit may be further configured to service an HTTP HEAD request to provide the header data of a segment.
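Seen from the client side, the three request types handled by the request processing unit (GET, partial GET with a byte range, and HEAD) could be built as in the sketch below. The helper name and the URL are assumptions made for illustration; the request objects are only constructed, not sent.

```python
from urllib.request import Request


def build_segment_request(url, byte_range=None, head_only=False):
    """Build (but do not send) an HTTP request for a media segment.

    byte_range: optional (first, last) inclusive byte offsets, which turns
                the request into a partial GET via the Range header.
    head_only:  if True, build a HEAD request for the header data only.
    """
    headers = {}
    if byte_range is not None:
        first, last = byte_range
        headers["Range"] = f"bytes={first}-{last}"
    method = "HEAD" if head_only else "GET"
    return Request(url, headers=headers, method=method)


req = build_segment_request("https://example.com/video/seg-21.m4s",
                            byte_range=(0, 1023))
print(req.get_method(), req.get_header("Range"))  # GET bytes=0-1023
```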

In some embodiments, the content generation device (201) and the content server (202) may be coupled by a wireless or wired network, or may be directly communicatively coupled.

In some embodiments, the content generation device (201) and the content server (202) may be included in the same device.

In some embodiments, a content server (202) and session controllers (205) - (206) may be included in the same device.

In some embodiments, the content server (202) and DASH access client (203) may be coupled through a wireless network or a wired network.

In some embodiments, SBD client (204) and session controllers (205) - (206) may be coupled by a wireless or wired network, or may be communicatively coupled directly.

In some embodiments, a DASH access client (203) and an SBD client (204) may be included in the same device.

III. DASH Client Architecture

Fig. 3 illustrates an exemplary DASH client architecture according to an embodiment of the present application. A DASH client (or DASH player), such as DASH client (102), may be used to communicate with applications (312) and handle various types of events, including (i) MPD events, (ii) in-band events, and (iii) timing metadata events.

The manifest parser (310) may parse a manifest (e.g., MPD). For example, the manifest may be provided by a DASH server (101). The manifest parser (310) may extract event information about MPD events, in-band events, and timing metadata events embedded in timing metadata tracks. The extracted event information may be provided to DASH logic (311) (e.g., DASH player control, selection, and heuristics logic). Based on the event information, DASH logic (311) may notify the application (312) of the event scheme signaled in the manifest.

The event information may include event scheme information that distinguishes different event streams. The application (312) may use the event scheme information to subscribe to event schemes of interest. The application (312) may further indicate, via at least one subscription application programming interface (API), the scheduling mode required for each subscribed scheme. For example, the application (312) may send the DASH client a subscription request that identifies at least one event scheme of interest and any required corresponding scheduling mode.

If the application (312) subscribes to at least one event scheme that is delivered as part of at least one timed metadata track, an in-band event and "moof" parser (303) may forward the at least one timed metadata track to the timed metadata track parser (304). For example, the in-band event and "moof" parser (303) parses movie fragment boxes ("moof") and then parses the timed metadata tracks based on control information from DASH logic (311).

The timed metadata track parser (304) may extract event messages embedded in the timed metadata track. The extracted event messages may be stored in an event and timing metadata buffer (306). A synchronizer/scheduler module (308) (e.g., an event and timing metadata synchronizer and scheduler) may schedule (or send) subscribed events to an application (312).

MPD events described in the MPD may be parsed by the manifest parser (310) and stored in the event and timing metadata buffer (306). For example, the manifest parser (310) parses each event stream element of the MPD and parses each event described in each event stream element. Event information, such as presentation time and event duration, for each event signaled in the MPD may be stored in the event and timing metadata buffer (306) associated with the event.

An in-band event and "moof" parser (303) can parse the media segments to extract in-band event messages. Any in-band events so identified and associated presentation times and durations may be stored in an event and timing metadata buffer (306).

Accordingly, the event and timing metadata buffer (306) may store MPD events, in-band events, and/or timing metadata events. The event and timing metadata buffer (306) may be, for example, a First-In-First-Out (FIFO) buffer. The event and timing metadata buffer (306) may be managed in correspondence with the media buffer (307). For example, any event or timing metadata corresponding to a media segment may be stored in the event and timing metadata buffer (306) as long as the media segment is present in the media buffer (307).
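One way to picture a FIFO event buffer that is managed in step with the media buffer is the sketch below. The class, its method names, and the segment identifiers are hypothetical; the text does not prescribe this data structure.

```python
from collections import deque


class EventBuffer:
    """FIFO buffer of events, pruned in step with the media buffer."""

    def __init__(self):
        self._events = deque()  # (segment_id, event) pairs in arrival order

    def push(self, segment_id, event):
        """Store an event together with the media segment it belongs to."""
        self._events.append((segment_id, event))

    def prune(self, buffered_segment_ids):
        """Drop events whose media segment has left the media buffer."""
        self._events = deque(
            (sid, ev) for sid, ev in self._events
            if sid in buffered_segment_ids)

    def pop(self):
        """Return the oldest remaining event, or None if the buffer is empty."""
        return self._events.popleft()[1] if self._events else None


buf = EventBuffer()
buf.push("seg-1", "event-a")
buf.push("seg-2", "event-b")
buf.prune({"seg-2"})   # seg-1 is no longer in the media buffer
print(buf.pop())       # event-b
```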

The DASH access API (302) may manage the retrieval and reception of content streams (or data streams) that include media content and various metadata through the HTTP protocol stack (301). The DASH access API (302) may separate the received content stream into different data streams. The data stream provided to the in-band event and "moof" parser (303) may include a media segment, at least one timing metadata track, and in-band event signaling included in the media segment. In an embodiment, the data stream provided to the manifest parser (310) may comprise an MPD.

The DASH access API (302) may forward the manifest to the manifest parser (310). In addition to describing events, the manifest may also provide information about media segments to DASH logic (311), which may communicate with the application (312) and the in-band event and "moof" parser (303). The application (312) may be associated with media content handled by the DASH client. Based on the information about the media segments provided in the manifest, control/synchronization signals exchanged between the application (312), DASH logic (311), manifest parser (310) and DASH access API (302) may control the retrieval of the media segments from the HTTP stack (301).

An in-band event and "moof" parser (303) may parse the media data stream into media segments, wherein the media segments include media content, timing metadata in the timing metadata track, and any signaled in-band events in the media segments. Media segments comprising media content may be parsed by a file format parser (305) and stored in a media buffer (307).

Events stored in the event and timing metadata buffer (306) may allow the synchronizer/scheduler (308) to communicate available events (or events of interest) to the application (312) through the event/metadata API. The application (312) may be configured to process available events (e.g., MPD events, in-band events, or timing metadata events) and to subscribe to specific events or timing metadata by notifying the synchronizer/scheduler (308). Events stored in the event and timing metadata buffer (306) that are not intended for the application (312), but are related to the DASH client itself, may be forwarded by the synchronizer/scheduler (308) to DASH logic (311) for further processing.

If the application (312) subscribes to a particular event, the synchronizer/scheduler (308) may communicate to the application (312) the event instances (or timing metadata samples) corresponding to the event scheme to which the application (312) has subscribed. The event instances may be communicated according to the scheduling mode indicated by the subscription request (e.g., for a particular event scheme) or according to a default scheduling mode. For example, in an on-receive scheduling mode, an event instance may be sent to the application (312) as soon as it is received in the event and timing metadata buffer (306). In an on-start scheduling mode, by contrast, an event instance may be sent to the application (312) at the presentation time associated with the event instance, e.g., in synchronization with a timing signal from the media decoder (309).
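The two scheduling modes described above (delivery on receipt versus delivery at the presentation time) can be contrasted with a small sketch. The class and method names, the mode strings `on_receive` and `on_start`, and the scheme identifiers are assumptions for illustration; delivery to the application is modeled as appending to a list.

```python
from dataclasses import dataclass


@dataclass
class EventInstance:
    scheme: str               # event scheme identifier
    presentation_time: float  # seconds on the media timeline
    payload: str


class Scheduler:
    """Toy synchronizer/scheduler with two scheduling modes."""

    def __init__(self, default_mode="on_start"):
        self.default_mode = default_mode
        self.subscriptions = {}  # scheme -> scheduling mode
        self.pending = []        # on_start events awaiting their time
        self.delivered = []      # stand-in for calls into the application

    def subscribe(self, scheme, mode=None):
        """Subscribe to a scheme, falling back to the default mode."""
        self.subscriptions[scheme] = mode or self.default_mode

    def on_event_received(self, event):
        """Called when an event instance arrives in the buffer."""
        mode = self.subscriptions.get(event.scheme)
        if mode is None:
            return                        # application is not subscribed
        if mode == "on_receive":
            self.delivered.append(event)  # deliver immediately
        else:
            self.pending.append(event)    # hold until presentation time

    def on_media_time(self, now_s):
        """Called with the current media time, e.g. from the decoder clock."""
        due = [e for e in self.pending if e.presentation_time <= now_s]
        self.pending = [e for e in self.pending if e.presentation_time > now_s]
        self.delivered.extend(sorted(due, key=lambda e: e.presentation_time))


sched = Scheduler()
sched.subscribe("urn:example:a", "on_receive")
sched.subscribe("urn:example:b")  # uses the default, on_start
sched.on_event_received(EventInstance("urn:example:a", 10.0, "early"))
sched.on_event_received(EventInstance("urn:example:b", 5.0, "timed"))
sched.on_media_time(5.0)
print([e.payload for e in sched.delivered])  # ['early', 'timed']
```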

IV. Remote Elements in DASH

Targeting-based media content customization is a popular and important feature in media streaming. This feature allows a client (or media player) to play media content selected according to at least one factor, such as the client's preferences, location, age, or interests. Thus, each client may receive different media content. Targeted advertising is an important use case. Client-side targeting-based media content customization may be achieved through remote elements (or remote resources) in media streaming protocols such as DASH. A remote element is an element (e.g., a period element) that is not explicitly defined in the manifest (e.g., MPD) processed by the client. Instead, the manifest provides a link corresponding to the remote element. By following the link, the client may obtain the resolution of the remote element. The client can then replace the remote element with its resolution and play back the media content corresponding to the remote element.

In some related examples, such as DASH, the Extensible Markup Language (XML) Linking Language (XLink) may be used for remote elements. XLink allows elements (e.g., XLink links) to be inserted into an XML document in order to create and describe links between resources. A resource may be any addressable unit of information or service. Examples include files, images, documents, programs, and query results. A resource may be addressed by an Internationalized Resource Identifier (IRI) reference. Using or following an XLink link may be referred to as traversal, and traversal involves a pair of resources: a starting resource and an ending resource. The starting resource is the resource from which the traversal begins, and the ending resource is its destination. By resolving the XLink link, the client obtains the remote element resolution through a traversal from the starting resource to the ending resource.

A remote element using XLink may include two attributes, @xlink:href and @xlink:actuate. The attribute @xlink:href may include a URL that points to the full description of the remote element, and the attribute @xlink:actuate may specify the parsing model (e.g., onLoad or onRequest) of the remote element. Table 1 shows a remote element and the corresponding remote element resolution. The remote element is a period element that can be resolved into two period elements.

TABLE 1

According to aspects of the present application, the activation attribute of the XLink link can be used with the remote element to signal when the remote element needs to be parsed. The activation attribute of an XLink link may indicate an onLoad mode or an onRequest mode.

In onLoad mode, the remote element needs to be parsed before playback of the media content begins. Once the remote element is parsed, the entire media content can be played. That is, an application or client traverses to the ending resource immediately upon loading the starting resource. Consequently, in onLoad mode the remote element resolution may not be the most up-to-date resolution. After the remote element is parsed, the targeting-based media content customization (e.g., a targeted advertisement) is not updated for each reference to the remote element and may be the same as the previous one.

In onRequest mode, a remote element may be parsed a few minutes before the media content corresponding to the remote element needs to be played. This allows further customization of the targeting-based media content according to the time of playback. Thus, the remote element resolution may be the most recent resolution. That is, even if the remote element has been parsed previously, it needs to be parsed again for each reference to the remote element in order to update the targeting-based media content (e.g., a targeted advertisement), which may therefore differ from the previous one. For example, an application or client may traverse from the starting resource to the ending resource only on a post-loading event triggered for the purpose of traversal. An example of such an event is when a user clicks on the presentation of the starting resource, or when a software module finishes a countdown that precedes a redirect.

However, since the manifest may be updated in some applications such as DASH, the validity of a remote element resolution may be unclear in onRequest mode. In some related examples, this unclear validity may result in inconsistent playback of media content, depending on the player implementation and/or the manifest.
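To make the two modes concrete, the sketch below scans a manifest for period elements that carry an XLink link and reports the activation mode of each (the `actuate` attribute of the XLink namespace). The MPD fragment, the URL, and the function name are hypothetical and are not the content of Table 1; treating onRequest as the fallback when the attribute is absent is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
MPD_NS = "urn:mpeg:dash:schema:mpd:2011"

# Hypothetical MPD fragment for illustration only.
MPD_XML = f"""<MPD xmlns="{MPD_NS}" xmlns:xlink="{XLINK_NS}">
  <Period id="P1" start="PT0S"/>
  <Period id="P2" xlink:href="https://ads.example.com/p2.xml"
          xlink:actuate="onRequest"/>
  <Period id="P3"/>
</MPD>"""


def remote_periods(mpd_xml):
    """Return (id, href, actuate) for every period that is a remote element."""
    root = ET.fromstring(mpd_xml)
    found = []
    for period in root.findall(f"{{{MPD_NS}}}Period"):
        href = period.get(f"{{{XLINK_NS}}}href")
        if href is None:
            continue  # explicit period, nothing to resolve
        # Assumption: fall back to onRequest when no mode is signaled.
        actuate = period.get(f"{{{XLINK_NS}}}actuate", "onRequest")
        found.append((period.get("id"), href, actuate))
    return found


for period_id, href, actuate in remote_periods(MPD_XML):
    when = "before playback starts" if actuate == "onLoad" else "when needed"
    print(f"{period_id}: resolve {href} {when}")
```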

V. Validity Period of Remote Element Resolution

The present application introduces a validity period (or validity interval) for remote element parsing. The validity period defines the span of time during which a remote element resolution is valid. Thus, the client knows the earliest time at which the remote element may be parsed and does not attempt to parse it before that time. In addition, after parsing the remote element, the client knows for how long the parsed remote element remains valid. Thus, to avoid inconsistencies in playback, the client does not need to parse the remote element again within the validity period. If the client returns to the same point within the validity period, the client may play the same content.

Fig. 4 illustrates an exemplary streaming architecture according to an embodiment of the application. The streaming architecture includes two content servers (401) - (402) and a client (403). The interfaces between the two content servers (401) - (402) and the client (403) may be represented by IF1 and IF2, respectively. Assume that the content server (401) can provide a manifest to the client (403) for playback of the target-based media content. In one embodiment, the manifest includes three periods P1, P2, and P3. Specifically, P1 and P3 may be explicit periods in which the corresponding media content may reside in the content server (401). P2 may be a remote period in which XLink links (P2 links) pointing to the content server (402) may define corresponding media content.
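As a rough illustration of such a manifest, the following Python sketch builds a three-period MPD in which P2 carries an XLink link and classifies each period as explicit or remote. The URL of the content server (402), the period identifiers, and the function name classify_periods are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical manifest: P1 and P3 are explicit, P2 points to another server.
MANIFEST = f"""<MPD xmlns="{MPD_NS}" xmlns:xlink="{XLINK_NS}">
  <Period id="P1"/>
  <Period id="P2" xlink:href="https://server402.example/p2.xml"
          xlink:actuate="onRequest"/>
  <Period id="P3"/>
</MPD>"""


def classify_periods(manifest_xml: str) -> dict:
    """Map each period id to 'remote' (has an XLink href) or 'explicit'."""
    root = ET.fromstring(manifest_xml)
    result = {}
    for period in root.findall(f"{{{MPD_NS}}}Period"):
        href = period.get(f"{{{XLINK_NS}}}href")
        result[period.get("id")] = "remote" if href else "explicit"
    return result


periods = classify_periods(MANIFEST)
```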

In some related examples, when the client (403) parses the manifest, the client (403) may need to parse a P2 link to the content server (402). But the client (403) may not know the exact time to resolve the P2 link. Additionally, when updating a manifest such as an MPD, it is unclear to the client (403) whether to resolve the P2 link again.

To provide consistent playback of media content, two parameters, namely, a Validity Start Time (VST) and a Validity Duration (VD), may be included in a hyperlink (e.g., an XLink link for a remote element). The VST parameter is the start time in the media presentation timeline at which parsing the remote element can result in a valid explicit element. The VD parameter is the duration of time during which the remote element resolution remains valid. Thus, these two attributes may define the lifetime (or validity period) of the remote element resolution. If the current media presentation time exceeds the lifetime of the remote element resolution, the remote element resolution may be considered expired. Otherwise, the existing remote element resolution may be used.

Fig. 5 illustrates an exemplary remote element validity period according to an embodiment of the present application. In fig. 5, a client (e.g., client (403)) receives a manifest such as an MPD that includes three periods P1, P2, and P3, where P1 and P3 are explicit periods and P2 is a remote period. A hyperlink of P2, such as an XLink link, may include two variables, startOffset and duration. Thus, the VST parameter and the VD parameter may be calculated as VST = PeriodStart + startOffset and VD = duration, respectively, where the variable PeriodStart is the start time of the period (e.g., P2) corresponding to the remote element.
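The calculation may be sketched in Python as follows, reading the relation as VST = PeriodStart + startOffset and VD = duration. The class name ValidityPeriod, the function name validity_from_link, and the numeric values are hypothetical. The negative offset, which places the VST before the period start, is an assumption of this sketch; the sign convention is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityPeriod:
    vst: float  # Validity Start Time on the media presentation timeline (s)
    vd: float   # Validity Duration (s)

    @property
    def end(self) -> float:
        """First instant at which the resolution is no longer valid."""
        return self.vst + self.vd


def validity_from_link(period_start: float, start_offset: float,
                       duration: float) -> ValidityPeriod:
    """VST = PeriodStart + startOffset, VD = duration."""
    return ValidityPeriod(vst=period_start + start_offset, vd=duration)


# Illustrative values only: P2 starts at 600 s, offset -30 s, duration 120 s.
p2 = validity_from_link(period_start=600.0, start_offset=-30.0, duration=120.0)
```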

If the client's current media presentation time reaches the MPD time corresponding to the VST, or any time thereafter, the client may check whether a valid resolution of the remote element exists. If such a resolution exists, it remains valid throughout the validity period, i.e., while VST ≤ current time < VST + VD. If no such resolution exists, the client may parse the remote element (again). Note that the VD parameter may be a number, the duration of the parent element of the remote element, or the duration of the MPD.
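The client-side check may be sketched as a small decision function in Python. The names Action and decide are hypothetical. Resolving again once the validity period has elapsed is an assumption of this sketch; the text above only states that an expired resolution is no longer considered valid.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"        # current time is earlier than VST
    REUSE = "reuse"      # a valid resolution exists within [VST, VST + VD)
    RESOLVE = "resolve"  # no valid resolution exists; parse the remote element


def decide(now: float, vst: float, vd: float, has_resolution: bool) -> Action:
    """Choose what the client does at media presentation time `now`."""
    if now < vst:
        return Action.WAIT
    if vst <= now < vst + vd and has_resolution:
        return Action.REUSE
    # Either nothing has been resolved yet, or the resolution has expired
    # (expiry handling is an assumption of this sketch).
    return Action.RESOLVE
```

For example, with VST = 570 and VD = 120, a client at time 600 that already holds a resolution reuses it, whereas a client at time 560 waits.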

Table 2 shows an exemplary syntax table that supports signaling of the validity period of remote element parsing at the period level. In table 2, the variable remoteLinkStartOffset may be an offset between the start time of the period corresponding to the remote element and the start time of the validity period of the remote element parsing. Thus, the variable remoteLinkStartOffset may indicate the earliest time to parse the remote element. The variable remoteLinkDuration may be the duration of the validity period of the remote element resolution. In one example, the variable remoteLinkDuration may be equal to -1, indicating that the validity period may last until the end of the period. In another example, the default value of the variable remoteLinkDuration may be equal to INF (e.g., a maximum integer), indicating that the validity period may last until the end of the entire media presentation.
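The handling of the two special values of remoteLinkDuration may be sketched in Python as follows. The function name validity_end, the use of sys.maxsize as a stand-in for INF, and the numeric values are assumptions for illustration; the actual syntax is defined by table 2.

```python
import sys

INF = sys.maxsize      # stand-in for "a maximum integer" (assumption)
UNTIL_PERIOD_END = -1  # special value: valid until the end of the period


def validity_end(period_start: float, period_duration: float,
                 presentation_duration: float,
                 remote_link_start_offset: float,
                 remote_link_duration: float = INF) -> float:
    """Return the time at which the remote element resolution expires."""
    if remote_link_duration == UNTIL_PERIOD_END:
        return period_start + period_duration
    if remote_link_duration == INF:
        return presentation_duration
    # Explicit duration, counted from the start of the validity period.
    return period_start + remote_link_start_offset + remote_link_duration


# Illustrative values: period at 600 s lasting 60 s, presentation of 1800 s.
end_at_period = validity_end(600, 60, 1800, 0, UNTIL_PERIOD_END)
end_at_presentation = validity_end(600, 60, 1800, 0)
end_explicit = validity_end(600, 60, 1800, -30, 120)
```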

TABLE 2

In an embodiment, the start offset variable and the duration variable of the validity period of remote element parsing may be signaled by using a basic attribute descriptor or a supplemental attribute descriptor. For example, a descriptor may include three attributes: an identifier, a scheme URI, and a value. The schemeURI attribute may be used to signal a remote element such as an XLink link. The value attribute may carry the start offset variable and the duration variable, with the two values separated by a space. The descriptors may be placed at various levels, such as the MPD level or the period level.

In addition, the same descriptor, with the same schemeURI but without a value, can be used at the MPD level to signal the presence of such validity period descriptors for remote element parsing, so that the client can be aware that the corresponding functionality is needed to play the media content correctly.
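Descriptor-based signaling of this kind may be sketched in Python as follows. The scheme URI urn:example:remote-link-validity and the values "10 120" are hypothetical. The element names EssentialProperty and SupplementalProperty and the attribute names schemeIdUri and value follow DASH descriptor conventions and are assumed here to correspond to the basic and supplemental attribute descriptors described above.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
SCHEME = "urn:example:remote-link-validity"  # hypothetical scheme URI

MANIFEST = f"""<MPD xmlns="{MPD_NS}">
  <Period id="P2">
    <SupplementalProperty schemeIdUri="{SCHEME}" value="10 120"/>
  </Period>
  <EssentialProperty schemeIdUri="{SCHEME}"/>
</MPD>"""


def _descriptors(element: ET.Element) -> List[ET.Element]:
    """Direct child descriptors of `element` that use the validity scheme."""
    found = []
    for tag in ("EssentialProperty", "SupplementalProperty"):
        for desc in element.findall(f"{{{MPD_NS}}}{tag}"):
            if desc.get("schemeIdUri") == SCHEME:
                found.append(desc)
    return found


def read_validity(element: ET.Element) -> Optional[Tuple[float, float]]:
    """Return (start offset, duration) from a space-separated value, if any."""
    for desc in _descriptors(element):
        value = desc.get("value")
        if value is None:
            continue  # presence-only descriptor, carries no parameters
        start_offset, duration = value.split()
        return float(start_offset), float(duration)
    return None


root = ET.fromstring(MANIFEST)
# MPD-level descriptor without a value signals that the feature is in use.
feature_signaled = any(d.get("value") is None for d in _descriptors(root))
period_validity = read_validity(root.find(f"{{{MPD_NS}}}Period"))
```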

The start offset variable of the validity period may provide the client with the earliest time to parse the remote element, so that the client may avoid parsing the remote element too early and obtaining erroneous results. Additionally, the duration variable may provide a lifetime for the remote element resolution, such that the client may avoid parsing the same remote element multiple times. Consistent playback of media content may be provided during the validity period, as the client only needs to parse the remote element once during the validity period. In one use case, the validity period may be set to a longer duration, in which case the client may play the same media content whenever the remote element is randomly accessed or played back. In another use case, the validity period may be set to a shorter duration, in which case the client may retrieve and play the media content anew each time the remote element is randomly accessed or played back.

According to aspects of the present application, the validity period of the remote element resolution may be constructed by including two attributes in the link corresponding to the remote element. These two attributes are the start time and the duration of the validity of the remote element resolution. The timing of the validity period may be defined based on the media presentation timeline. The duration of the validity period may be defined in terms of the duration of the parent element of the remote element. The duration attribute may be defined as valid indefinitely, valid until the end of the duration of the parent element, or valid for an explicit amount of time. The start time attribute may be signaled by using an offset value from the start time of the parent element of the remote element. These attribute values may be signaled as new attributes or descriptors of the parent element, where the descriptors may be basic attribute descriptors or supplemental attribute descriptors. Additionally, the presence of such attributes may be signaled at the MPD level, such that the client may be aware that it needs to support the functionality of obtaining the lifetime of the remote element resolution before parsing the period corresponding to the remote element, in order to consistently play back the media content corresponding to the remote element.

VI. flow chart

Fig. 6 shows a flow chart summarizing a process (600) according to an embodiment of the application. In various embodiments, the process (600) is performed by processing circuitry, such as processing circuitry in a DASH client (102), processing circuitry in a DASH access client (203), processing circuitry in an SBD client (204), and so forth. In some embodiments, the process (600) is implemented in software instructions, such that when the processing circuitry executes the software instructions, the processing circuitry performs the process (600). The process (600) begins at step (S610), where the process (600) receives an MPD file indicating a validity period during which remote element parsing of the MPD file is valid. Then, the process (600) proceeds to step (S620).

At step (S620), if it is determined that the media presentation is within the validity period, the process (600) determines whether the remote element has been parsed within the validity period indicated by the MPD file. If the remote element is not resolved within the validity period, the process (600) proceeds to step (S630). Otherwise, the process (600) proceeds to step (S640).

At step (S630), the process (600) parses the remote element of the MPD file for the validity period. Then, the process (600) proceeds to step (S640).

At step (S640), for each reference to a remote element in the MPD file within a validity period, the process (600) outputs media content, where the media content corresponds to the same parsed remote element within the validity period.

The process (600) then ends.
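The steps of the process (600) may be sketched in Python as follows. The function name run_process, the resolve callback, and the content names are hypothetical. Returning no output for references outside the validity period is an assumption of this sketch, since the process (600) only addresses references within the validity period.

```python
from typing import Callable, List, Optional


def run_process(vst: float, vd: float, reference_times: List[float],
                resolve: Callable[[], str]) -> List[Optional[str]]:
    """Output content for each reference to the remote element.

    `vst` and `vd` stand for the validity period indicated by the
    received MPD file (S610).
    """
    parsed: Optional[str] = None  # resolution obtained within the period
    outputs: List[Optional[str]] = []
    for now in reference_times:
        if not (vst <= now < vst + vd):
            outputs.append(None)  # outside the validity period (not covered)
            continue
        if parsed is None:        # S620: not yet parsed within the period
            parsed = resolve()    # S630: parse the remote element
        outputs.append(parsed)    # S640: output the same parsed content
    return outputs


calls = 0


def resolve_p2() -> str:
    """Hypothetical resolver that returns new content on every call."""
    global calls
    calls += 1
    return f"content-{calls}"


outputs = run_process(10.0, 100.0, [5.0, 10.0, 50.0, 100.0, 200.0], resolve_p2)
```

In this sketch, the remote element is parsed once, and every reference within the validity period receives the same content.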

In an embodiment, the MPD file includes a first parameter and a second parameter, wherein the first parameter indicates a start time of the validity period, and the second parameter indicates a duration of the validity period.

In an embodiment, the first parameter is an offset between a start time of the validity period and a start time of media content corresponding to the remote element.

In an embodiment, the first parameter and the second parameter are included in one of a period element, a basic attribute descriptor, and a supplemental attribute descriptor included in the MPD file.

In an embodiment, the processing circuitry determines whether the remote element has been parsed within the validity period based on a hyperlink included in an MPD file corresponding to the remote element.

In an embodiment, the hyperlinks are extensible markup language linking language (XLink) links.

In an embodiment, the timeline of the validity period is the timeline of the MPD file.

The present application also provides an apparatus for receiving media content, comprising: a receiving module, configured to receive a Media Presentation Description (MPD) file indicating a validity period during which remote element parsing of the MPD file is valid; a determining module, configured to determine, based on determining that the media presentation is within the validity period, whether the remote element has been parsed within the validity period indicated by the MPD file; a parsing module, configured to parse the remote element of the MPD file when it is determined that the remote element is not parsed within the validity period; and an output module, configured to output, each time the remote element of the MPD file is referenced during the validity period, media content corresponding to the same parsed remote element during the validity period.

In one implementation, the MPD file includes a first parameter and a second parameter, where the first parameter indicates a start time of the validity period, and the second parameter indicates a duration of the validity period.

In one implementation, the first parameter is an offset between a start time of the validity period and a start time of the media content, wherein the media content corresponds to the parsed remote element.

In one implementation, one of a period element, a basic attribute descriptor, and a supplemental attribute descriptor included in the MPD file includes the first parameter and the second parameter.

In one implementation, the determining module is further configured to: determining whether the remote element has been parsed within the validity period based on a hyperlink in an MPD file corresponding to the remote element.

In one implementation, the hyperlinks are extensible markup language linking language (XLink) links. The time axis of the validity period may be a time axis of the MPD file.

The present application further provides a computer apparatus comprising at least one processor and at least one memory having at least one instruction stored therein, the at least one instruction being loaded and executed by the at least one processor to implement the method of fig. 6.

The present application also provides a non-transitory computer readable medium having stored therein instructions that, when executed by a processor, cause the processor to perform the method of fig. 6.

The present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from the computer-readable storage medium and executed by the processor to cause the computer device to perform the method provided by fig. 6, described above.

VII. computer System

The techniques described above may be implemented as computer software via computer readable instructions and physically stored in one or more computer readable media. For example, fig. 7 illustrates a computer system (700) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software may be encoded in any suitable machine code or computer language and may be subjected to assembly, compilation, linking, or similar mechanisms to create code that includes instructions executable directly by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc., or through interpretation, microcode execution, and the like.

The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smartphones, gaming devices, internet of things devices, and so forth.

The components illustrated in FIG. 7 for the computer system (700) are exemplary in nature and are not intended to limit the scope of use or functionality of the computer software implementing the embodiments of the application in any way. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (700).

The computer system (700) may include certain human interface input devices. Such human interface input devices may respond to input from one or more human users through tactile input (e.g., keyboard input, swipe, data glove movement), audio input (e.g., sound, applause), visual input (e.g., gestures), and olfactory input (not shown). The human interface devices may also be used to capture media that does not necessarily directly relate to conscious human input, such as audio (e.g., voice, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).

The human interface input device may include one or more of the following (only one of which is depicted): keyboard (701), mouse (702), touch pad (703), touch screen (710), data glove (not shown), joystick (705), microphone (706), scanner (707), camera (708).

The computer system (700) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include haptic output devices (e.g., haptic feedback through a touch screen (710), data glove (not shown), or joystick (705), but there may also be haptic feedback devices that do not act as input devices), audio output devices (e.g., speaker (709), headphones (not shown)), visual output devices (e.g., screens (710) including cathode ray tube screens, liquid crystal screens, plasma screens, organic light emitting diode screens, each with or without touch screen input functionality, each with or without haptic feedback functionality, some of which may output two-dimensional visual output or three-dimensional or higher output by means such as stereoscopic picture output; virtual reality glasses (not shown), holographic displays and smoke boxes (not shown)), and printers (not shown). These visual output devices (e.g., screen (710)) may be connected to the system bus (748) through a graphics adapter (750).

The computer system (700) may also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW drive (720) with CD/DVD or similar media (721), thumb drives (722), removable hard drives or solid state drives (723), conventional magnetic media such as magnetic tapes and floppy disks (not shown), ROM/ASIC/PLD based application specific devices such as security dongles (not shown), and the like.

Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.

The computer system (700) may also include a network interface (754) to at least one communication network (755). For example, the at least one communication network (755) may be wireless, wired, or optical. The at least one communication network (755) may also be a local area network, a wide area network, a metropolitan area network, a vehicular network, an industrial network, a real-time network, a delay tolerant network, and the like. Examples of the at least one communication network (755) also include local area networks such as Ethernet and wireless local area networks, cellular networks (GSM, 3G, 4G, 5G, LTE, etc.), television wired or wireless wide area digital networks (including cable, satellite, and terrestrial broadcast television), vehicular and industrial networks (including CAN bus), and so forth. Some networks typically require external network interface adapters for connecting to some general purpose data ports or peripheral buses (749) (e.g., USB ports of the computer system (700)); others are typically integrated into the core of the computer system (700) by connecting to a system bus as described below (e.g., an Ethernet interface to a PC computer system or a cellular network interface to a smart phone computer system). Using any of these networks, the computer system (700) may communicate with other entities. The communication may be unidirectional, for reception only (e.g., broadcast television), unidirectional for transmission only (e.g., CAN bus to certain CAN bus devices), or bidirectional, for example, to other computer systems over a local or wide area digital network. Each of the networks and network interfaces described above may use certain protocols and protocol stacks.

The human interface device, human accessible storage device, and network interface described above may be connected to the core (740) of the computer system (700).

The core (740) may include one or more Central Processing Units (CPUs) (741), Graphics Processing Units (GPUs) (742), special purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (743), hardware accelerators (744) for specific tasks, and so forth. These devices, as well as Read Only Memory (ROM) (745), Random Access Memory (RAM) (746), internal mass storage (747) (e.g., internal non-user accessible hard drives, solid state drives, etc.), etc., may be connected by a system bus (748). In some computer systems, the system bus (748) may be accessible in the form of one or more physical plugs, so as to be extendable by additional central processing units, graphics processing units, and the like. Peripheral devices may be attached directly to the system bus (748) of the core or connected through a peripheral bus (749). Architectures for the peripheral bus include Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), etc.

The CPU (741), GPU (742), FPGA (743), and accelerator (744) may execute certain instructions that, in combination, may constitute the computer code described above. The computer code may be stored in ROM (745) or RAM (746). Transient data may also be stored in RAM (746), while persistent data may be stored in, for example, internal mass storage (747). Fast storage to and retrieval from any of the memory devices can be achieved through the use of cache memories, which can be closely associated with one or more CPUs (741), GPUs (742), mass storage (747), ROM (745), RAM (746), and the like.

The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present application, or they may be of the kind well known and available to those having skill in the computer software arts.

By way of example, and not limitation, a computer system having the architecture (700), and in particular the core (740), may provide functionality as a result of a processor (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage described above, as well as certain non-volatile storage of the core (740), such as core internal mass storage (747) or ROM (745). Software implementing various embodiments of the present application may be stored in such devices and executed by the core (740). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (740), and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM (746) and modifying such data structures in accordance with software-defined processes. Additionally or alternatively, the computer system may provide functionality that is logically hardwired or otherwise embodied in circuitry (e.g., accelerator (744)) that may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. Where appropriate, reference to software may include logic and vice versa. Where appropriate, reference to a computer-readable medium may include circuitry (e.g., an Integrated Circuit (IC)) storing executable software, circuitry comprising executable logic, or both. The present application includes any suitable combination of hardware and software.

While the application has described several exemplary embodiments, various modifications, arrangements, and equivalents of the embodiments are within the scope of the application. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the application and are thus within its spirit and scope.

Claims

1. A method of receiving media content, the method comprising:

receiving a Media Presentation Description (MPD) file indicating a validity period within which a remote element resolution of the MPD file is valid;

determining whether the remote element has been parsed within a validity period indicated by the MPD file based on the media presentation within the validity period;

when it is determined that the remote element is not parsed within the validity period, parsing the remote element of the MPD file; and

outputting, each time a remote element of the MPD file is referenced during the validity period, media content corresponding to the same parsed remote element during the validity period.

2. The method of claim 1, wherein the MPD file comprises a first parameter and a second parameter, wherein the first parameter is used for indicating a start time of the validity period, and wherein the second parameter is used for indicating a duration of the validity period.

3. The method of claim 2, wherein the first parameter is an offset between a start time of the validity period and a start time of the media content, wherein the media content corresponds to the parsed remote element.

4. The method of claim 2, wherein one of a period element, a basic attribute descriptor, and a supplemental attribute descriptor included in the MPD file comprises the first parameter and the second parameter.

5. The method of any of claims 1-4, wherein the determining further comprises:

determining whether the remote element has been parsed within the validity period based on a hyperlink in an MPD file corresponding to the remote element.

6. The method of claim 5, wherein the hyperlinks are extensible markup language linking language (XLink) links.

7. The method according to any one of claims 1 to 4, wherein a time axis of the validity period is a time axis of the MPD file.

8. An apparatus for receiving media content, comprising:

a receiving module to receive a Media Presentation Description (MPD) file indicating a validity period during which a remote element parsing of the MPD file is valid;

a determining module to determine whether the remote element has been parsed within a validity period indicated by the MPD file based on a media presentation within the validity period;

a parsing module, configured to parse the remote element of the MPD file when it is determined that the remote element is not parsed within the validity period; and

an output module, configured to output, each time the remote element of the MPD file is referenced during the validity period, media content corresponding to the same parsed remote element during the validity period.

9. The apparatus of claim 8, wherein the MPD file comprises a first parameter and a second parameter, wherein the first parameter is configured to indicate a start time of the validity period, and wherein the second parameter is configured to indicate a duration of the validity period.

10. The apparatus of claim 9, wherein the first parameter is an offset between a start time of the validity period and a start time of the media content, wherein the media content corresponds to the parsed remote element.

11. The apparatus of claim 9, wherein one of a period element, a basic attribute descriptor, and a supplemental attribute descriptor included in the MPD file comprises the first parameter and the second parameter.

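12. The apparatus of any of claims 8-11, wherein the determining module is further configured to:

determine whether the remote element has been parsed within the validity period based on a hyperlink in an MPD file corresponding to the remote element.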

13. The apparatus according to any one of claims 8 to 11, wherein a time axis of the validity period is a time axis of the MPD file.

14. A computer device comprising at least one processor and at least one memory having at least one instruction stored therein, the at least one instruction being loaded and executed by the at least one processor to implement the method of any one of claims 1 to 7.

15. A non-transitory computer-readable medium having stored therein instructions, which when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.