WO2017213234A1

WO2017213234A1 - Systems and methods for signaling of information associated with a visual language presentation

Info

Publication number: WO2017213234A1
Application number: PCT/JP2017/021362
Authority: WO
Inventors: Kiran Mukesh MISRA; Sachin G. Deshpande
Original assignee: Sharp Kabushiki Kaisha
Priority date: 2016-06-10
Filing date: 2017-06-08
Publication date: 2017-12-14
Also published as: TW201743622A

Abstract

A device may be configured to signal a syntax element indicating an visual language presentation is associated with a service and signal one or more syntax elements identifying information associated with the visual language presentation.

Description

SYSTEMS AND METHODS FOR SIGNALING OF INFORMATION ASSOCIATED WITH A VISUAL LANGUAGE PRESENTATION

The present disclosure relates to the field of interactive television.

Digital media playback capabilities may be incorporated into a wide range of devices, including digital televisions, including so-called “smart” televisions, set-top boxes, laptop or desktop computers, tablet computers, digital recording devices, digital media players, video gaming devices, cellular phones, including so-called “smart” phones, dedicated video streaming devices, and the like. Digital media content (e.g., video and audio programming) may originate from a plurality of sources including, for example, over-the-air television providers, satellite television providers, cable television providers, online media service providers, including, so-called streaming service providers, and the like. Digital media content may be delivered over packet-switched networks, including bidirectional networks, such as Internet Protocol (IP) networks and unidirectional networks, such as digital broadcast networks.

Digital media content may be transmitted from a source to a receiver device (e.g., a digital television or a smart phone) according to a transmission standard. Examples of transmission standards include Digital Video Broadcasting (DVB) standards, Integrated Services Digital Broadcasting Standards (ISDB) standards, and standards developed by the Advanced Television Systems Committee (ATSC), including, for example, the ATSC 2.0 standard. The ATSC is currently developing the so-called ATSC 3.0 suite of standards. The ATSC 3.0 suite of standards seek to support a wide range of diverse services through diverse delivery mechanisms. For example, the ATSC 3.0 suite of standards seeks to support broadcast multimedia delivery, so-called broadcast streaming/file download multimedia delivery, so-called broadband streaming/file download multimedia delivery, and combinations thereof (i.e., “hybrid services”). An example of a hybrid service contemplated for the ATSC 3.0 suite of standards includes a receiver device receiving an over-the-air video broadcast (e.g., through a unidirectional transport) and receiving a synchronized secondary audio presentation (e.g., a secondary language) from an online media service provider through a packet network (i.e., through a bidirectional transport.

One embodiment of the present invention discloses a method for signaling information associated with a visual language presentation, the method comprising:
receiving a visual language presentation;
signaling a syntax element indicating a type of unit used to indicate an origin for overlaying the visual language presentation with respect to a main video presentation; and
signaling one or more syntax elements providing values for the origin based on the indicated type of unit.

Another embodiment of the present invention discloses a device for rendering a visual language presentation, the device comprising a non-transitory computer readable medium and one or more processors configured to:
receive a main video presentation;
receive a visual language presentation;
parse a syntax element indicating a type of unit used to indicate an origin for overlaying the visual language presentation with respect to the main video presentation;
parse one or more syntax elements providing values for the origin based on the indicated type of unit; and
render a presentation including the visual language presentation overlaid on the main video presentation based on the one or more syntax elements providing values for the origin.

Another embodiment of the present invention discloses a method for rendering a visual language presentation, the method comprising:
receiving a main video asset;
receiving a closed signing video asset;
parsing a syntax element indicating one of a plurality of types of units used to indicate an origin for overlaying the closed signing video asset with respect to a main video asset;
parsing a syntax element indicating the distance of the origin from the left edge of the main video asset;
parsing a syntax element indicating the distance of the origin from the top edge of the main video asset; and
determining the origin location based on the indicated type of units, the indicated distance of the origin from the left edge of the main video asset, and the indicated distance of the origin from the top edge of the main video asset; and
rendering a presentation including the closed signing video asset overlaid on the main video asset based on the determined origin.

FIG. 1 is a conceptual diagram illustrating an example of a content delivery protocol model according to one or more techniques of this disclosure. FIG. 2 is a conceptual diagram illustrating an example of respective delivery mechanisms of a media service according to one or more techniques of this disclosure. FIG. 3 is a block diagram illustrating an example of a system that may implement one or more techniques of this disclosure. FIG. 4 is a block diagram illustrating an example of a service distribution engine that may implement one or more techniques of this disclosure. FIG. 5A is a block diagram illustrating an example of a component encapsulator that may implement one or more techniques of this disclosure. FIG. 5B is a block diagram illustrating an example of a component encapsulator that may implement one or more techniques of this disclosure. FIG. 6 is a block diagram illustrating an example of a receiver device that may implement one or more techniques of this disclosure.

In general, this disclosure describes techniques for signaling (or signalling) information associated with a visual language presentation associated with a service. A visual language presentation associated with a service may include a visual language presentation corresponding to audio content associated with a service. For example, a visual language presentation may include a video or images of an interpreter performing sign language corresponding to a dialogue component of a television program. It should be noted that audio content may be included as part of an audio-visual service (e.g., television programming) or in some examples may be included as a dedicated audio service (e.g., radio programming). It should be noted that although in some examples the techniques of this disclosure are described with respect to ATSC standards, the techniques described herein are generally applicable to any transmission standard. For example, the techniques described herein are generally applicable to any of DVB standards, ISDB standards, ATSC Standards, Digital Terrestrial Multimedia Broadcast (DTMB) standards, Digital Multimedia Broadcast (DMB) standards, Hybrid Broadcast and Broadband Television (HbbTV) standards, World Wide Web Consortium (W3C) standards, Universal Plug and Play (UPnP) standards, and other video encoding standards. Further, it should be noted that incorporation by reference of documents herein is for descriptive purposes and should not be constructed to limit and/or create ambiguity with respect to terms used herein. For example, in the case where one incorporated reference provides a different definition of a term than another incorporated reference and/or as the term is used herein, the term should be interpreted in a manner that broadly includes each respective definition and/or in a manner that includes each of the particular definitions in the alternative.

According to one example of the disclosure, a method for signaling information associated with a visual language presentation comprises receiving a visual language presentation and signaling one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, a device for signaling information associated with a visual language presentation comprises one or more processors configured to receive a visual language presentation and signal one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, an apparatus signaling information associated with a visual language presentation comprises means for receiving a visual language presentation and means for signaling one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, a non-transitory computer-readable storage medium comprises instructions stored thereon that upon execution cause one or more processors of a device to receive a visual language presentation and signal one or more syntax elements indicating overlay information for the visual language presentation.

According to one example of the disclosure, a method for parsing information associated with a visual language presentation comprises parsing one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, a device for parsing information associated with a visual language presentation comprises one or more processors configured to parse one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, an apparatus for parsing information associated with a visual language presentation comprises means for parsing one or more syntax elements indicating overlay information for the visual language presentation.

According to another example of the disclosure, a non-transitory computer-readable storage medium comprises instructions stored thereon that upon execution cause one or more processors of a device to parse one or more syntax elements indicating overlay information for the visual language presentation.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

Computing devices and/or transmission systems may be based on models including one or more abstraction layers, where data at each abstraction layer is represented according to particular structures, e.g., packet structures, modulation schemes, etc. An example of a model including defined abstraction layers is the so-called Open Systems Interconnection (OSI) model illustrated in FIG. 1. The OSI model defines a 7-layer stack model, including an application layer, a presentation layer, a session layer, a transport layer, a network layer, a data link layer, and a physical layer. It should be noted that the use of the terms upper and lower with respect to describing the layers in a stack model may be based on the application layer being the uppermost layer and the physical layer being the lowermost layer. Further, in some cases, the term “Layer 1” or “L1” may be used to refer to a physical layer, the term “Layer 2” or “L2” may be used to refer to a link layer, and the term “Layer 3” or “L3” or “IP layer” may be used to refer to the network layer.

A physical layer may generally refer to a layer at which electrical signals form digital data. For example, a physical layer may refer to a layer that defines how modulated radio frequency (RF) symbols form a frame of digital data. A data link layer, which may also be referred to as a link layer, may refer to an abstraction used prior to physical layer processing at a sending side and after physical layer reception at a receiving side. As used herein, a link layer may refer to an abstraction used to transport data from a network layer to a physical layer at a sending side and used to transport data from a physical layer to a network layer at a receiving side. It should be noted that a sending side and a receiving side are logical roles and a single device may operate as both a sending side in one instance and as a receiving side in another instance. A link layer may abstract various types of data (e.g., video, audio, or application files) encapsulated in particular packet types (e.g., Motion Picture Expert Group - Transport Stream (MPEG-TS) packets, Internet Protocol Version 4 (IPv4) packets, etc.) into a single generic format for processing by a physical layer. A network layer may generally refer to a layer at which logical addressing occurs. That is, a network layer may generally provide addressing information (e.g., Internet Protocol (IP) addresses) such that data packets can be delivered to a particular node (e.g., a computing device) within a network. As used herein, the term network layer may refer to a layer above a link layer and/or a layer having data in a structure such that it may be received for link layer processing. Each of a transport layer, a session layer, a presentation layer, and an application layer may define how data is delivered for use by a user application.

Transmission standards, including transmission standards currently under development, may include a content delivery protocol model specifying supported protocols for each layer and may further define one or more specific layer implementations. Referring again to FIG. 1, an example content delivery protocol model is illustrated. In the example illustrated in FIG. 1, content delivery protocol model 100 is “aligned” with the 7-layer OSI model for illustration purposes. It should be noted that such an illustration should not be construed to limit implementations of the content delivery protocol model 100 or the techniques described herein. Content delivery protocol model 100 may generally correspond to the currently proposed content delivery protocol model for the ATSC 3.0 suite of standards. Further, the techniques described herein may be implemented in a system configured to operate based on content delivery protocol model 100.

The ATSC 3.0 suite of standards includes ATSC Standard A/321, System Discovery and Signaling Doc. A/321:2016, 23 March 2016 (hereinafter “A/321”), which is incorporated by reference herein in its entirety. A/321 describes the initial entry point of a physical layer waveform of an ATSC 3.0 unidirectional physical layer implementation. Further, aspects of the ATSC 3.0 suite of standards currently under development are described in Candidate Standards, revisions thereto, and Working Drafts (WD), each of which may include proposed aspects for inclusion in a published (i.e., “final” or “adopted”) version of an ATSC 3.0 standard. For example, ATSC Candidate Standard: Physical Layer Protocol, Doc. S32-230r46, 6 May 2016, which is incorporated by reference in its entirety, describes specific aspects of an approved ATSC 3.0 unidirectional physical layer implementation. The approved ATSC 3.0 unidirectional physical layer includes a physical layer frame structure including a defined bootstrap, preamble, and data payload structure including one or more physical layer pipes (PLPs). A PLP may generally refer to a logical structure within an RF channel or a portion of an RF channel. That is, a PLP may include a portion of an RF channel having particular modulation and coding parameters. The proposed ATSC 3.0 unidirectional physical layer provides that a single RF channel can contain one or more PLPs and each PLP may carry one or more services. In one example, multiple PLPs may carry a single service.

In the proposed ATSC 3.0 suite of standards, the term service may be used to refer to a collection of media components presented to the user in aggregate (e.g., a video component, an audio component, and a sub-title component), where components may be of multiple media types, where a service can be either continuous or intermittent, where a service can be a real time service (e.g., multimedia presentation corresponding to a live event) or a non-real time service (e.g., a video on demand service, an electronic service guide service, etc.), and where a real time service may include a sequence of television programs. In some cases, it may be desirable for a service provider (e.g., a television broadcaster) to provide a visual language presentation. For example, for hearing impaired individuals in the United States, American Sign Language (ASL) may be their primary language. In some cases, the visual language presentation may correspond to an audio component of a service. Individuals with hearing impairments may prefer a visual language presentation of an interpreter performing ASL (or another type of sign language) as opposed to closed captions (e.g., sub-titles), as the visual language presentation may be more expressive of the audio content and may provide a better viewer experience. It should be noted that a visual language presentation may include and/or be referred to as a closed signing video, which may be a component of a service and/or a separate service. In one example, a receiver device may receive a visual language presentation and enable a user to cause the visual language presentation to be overlaid over a main video presentation (e.g., as a so-called Picture in Picture (PIP) presentation). In another example, a receiver device may receive a visual language presentation and enable a user to cause the visual language presentation to be presented along-side a main video presentation.

In one example, a service provider may receive or generate a video component including a visual language presentation. A telecommunications standard may define one or more authoring requirements for a visual language presentation. For example, with respect to ATSC 3.0, it has been proposed that a visual language presentation include a resolution corresponding to standard definition (SD) or sub-SD quality, and an aspect ratio corresponding to the aspect ratio of the main video presentation. Further, a visual language presentation may include a dynamic range and color gamut corresponding to the main video presentation (e.g., a High Dynamic Range (HDR) and a Wide Color Gamut (WCG)). Further, with respect to the ATSC 3.0 suite of standards, it has been proposed that an author of a visual language presentation provide size and position metadata that indicates the author’s intended region of the main video presentation where the visual language presentation is to be overlaid (this may be referred to as an overlay position and/or intended placement). For example, an author of a visual language presentation may determine an overlay position that provides minimal obstruction with respect to a particular main video presentation (e.g., a bottom left versus a bottom right overlay position).

Further, a telecommunications standard may define signaling information that a service provider may optionally and/or be required to include in a transmission of a visual language presentation to receiver devices. With respect to the ATSC 3.0 suite of standards, it has been proposed that a service provider may distribute a visual language presentation as a video component that may be carried over-the-air in the same or a separate PLP as a main video presentation component, where the visual language presentation may be synchronized with the main video presentation (e.g., signing is synchronized with the dialogue of a television program). It has been proposed that a visual language presentation may be a video component carried over-the-top (e.g., using broadband or cellular networks). Further, with respect to the ATSC 3.0 suite of standards, it has been proposed that a service provider provide signaling to indicate the presence of one or more visual language presentations, the size and position of the overlay position of the visual language presentation, and a language of visual language presentation. In one example, a receiver device receiving a visual language presentation may place a visual language presentation of the size and position as indicated by the author and signaled by the service provider.

Further, it should be noted that in some examples services may include application based features. Application based features may include service components including an application, optional files to be used by the application, and optional notifications directing the application to take particular actions at particular times. In one example, an application may be a collection of documents constituting an enhanced or interactive service. The documents of an application may include Hypertext Markup Language (HTML), Dynamic HTML, eXtensible Markup Language (XML), JavaScript, JavaScript Object Notation (JSON), Cascading Style Sheets (CSS), and/or multimedia files. It should be noted that the proposed ATSC 3.0 suite of standards specifies that new types of services may be defined in future versions. Thus, as used herein the term service may refer to a service described with respect to the proposed ATSC 3.0 suite of standards and/or other types of digital media services. Current the proposed ATSC 3.0 suite of standards do not specify signaling for information associated with a visual language presentation.

Referring to FIG. 1, content delivery protocol model 100 supports streaming and/or file download through the ATSC Broadcast Physical layer using MPEG Media Transport Protocol (MMTP) over User Datagram Protocol (UDP) and Internet Protocol (IP) and Real-time Object delivery over Unidirectional Transport (ROUTE) over UDP and IP. MMTP is described in ISO/IEC: ISO/IEC 23008-1, “Information technology-High efficiency coding and media delivery in heterogeneous environments-Part 1: MPEG media transport (MMT),” which is incorporated by reference herein in its entirety. An overview of ROUTE is provided in ATSC Candidate Standard: Signaling, Delivery, Synchronization, and Error Protection (A/331) Doc. S33-1-500r5, 14 January 2016, Rev. 7, 1 June 2016 (hereinafter “A/331”), which is incorporated by reference in its entirety. It should be noted that although ATSC 3.0 uses the term broadcast to refer to a unidirectional over-the-air transmission physical layer, the so-called ATSC 3.0 broadcast physical layer supports video delivery through streaming or file download. As such, the term broadcast as used herein should not be used to limit the manner in which video and associated data may be transported according to one or more techniques of this disclosure.

In the case where MMTP is used for streaming and/or file download through the ATSC Broadcast Physical layer, service component data (e.g., video data, audio data, closed caption data, closed signing data, etc.) may be encapsulated in a Media Processing Unit (MPU). MMTP defines a MPU as “a media data item that may be processed by an MMT entity and consumed by the presentation engine independently from other MPUs.” A logical grouping of MPUs may form an MMT asset, where MMTP defines an asset as “any multimedia data to be used for building a multimedia presentation. An asset is a logical grouping of MPUs that share the same asset identifier for carrying encoded media data.” For example, for a video component, MPUs may include groups of pictures (GOPs) that are independently decodable and an asset may include several MPUs forming a video sequence. One or more assets may form a MMT package, where a MMT package is a logical collection of media content. For example, an MMT package may include an asset corresponding to a main video component, an asset corresponding to an audio component, and an asset corresponding to a visual language presentation. A/331 provides that a single MMT package can be delivered over one or more MMTP sessions, where each MMTP session can be identified by a destination IP address and a destination UDP port number. Further, A/331 provides that multiple MMT packages can be delivered by a single MMTP session. A/331 provides that each PLP can carry one or more MMTP sessions. In addition, A/331 provides that one MMTP session can be carried by more than one PLP.

In the case where ROUTE is used for streaming and/or file download through the ATSC Broadcast Physical layer, service component data (e.g., video data, audio data, closed caption data, visual language presentation data, etc.) may be encapsulated in a Dynamic Adaptive Streaming over Hypertext Transport Protocol (HTTP) (DASH) Media Presentation (i.e., ROUTE/DASH). Further, service component data may be associated with one or more segments carried over Layer Coding Transport (LCT) channels. For media delivery, an LCT channel may carry as a whole, or in part, a media component and a ROUTE session may be considered as the multiplex of LCT channels that carry constituent media components of one or more media presentations. That is, each ROUTE session may include one or more LCT channels, where LCT channels are subsets of a ROUTE session. Further, A/331 provides that one or more LCT channels may be included in a PLP and as such, a ROUTE session may be carried by one or more PLPs. Further, similar to a MMTP session, A/331 provides that a ROUTE session may be identified by a destination IP address and a destination UDP port number. It should be noted that a ROUTE session may further be identified by a source IP address.

FIG. 2 is a conceptual diagram illustrating respective delivery mechanisms of a service as an MMT package and a service as DASH media presentation. As illustrated in FIG. 2, for each respective delivery mechanism corresponding service layer signaling occurs. In general, service layer (or level) signaling (SLS) may include information that enables a receiver device to discover and/or access user services and their content components. A/331 provides specific data structures that may be included as part of a service layer signaling. That is, A/331 defines a set of message formats to be used to communicate signaling information necessary for the delivery and consumption of services by a receiver device. Referring to FIG. 2, for service layer signaling with respect to a MMTP delivery mechanism, A/331 service layer signaling includes a User Service Bundle Descriptor (USBD) and MMT specific signaling messages. For the sake of brevity, the format of the USBD for MMT is not described herein, however, reference is made to A/331. It should be noted that in one example, receiver devices may be expected to disregard reserved values, and unrecognized or unsupported descriptors, XML attributes and elements. In one example, reserved fields in syntax are reserved for future use and receiving devices conforming to the defined specification are expected to disregard reserved fields.

In addition to including one or more assets, a MMT package includes presentation information (PI) and asset delivery characteristics (ADC). Presentation information includes documents (PI documents). A PI document may be delivered as one or more signalling messages. Asset delivery characteristics describe the quality of service (QoS) requirements and statistics of assets for delivery. PIs and ADCs may be associated with one or more assets and MPUs encapsulated therein. MMT specifies a signaling function that defines a set of message formats for signaling messages. MMT specifies message formats for carrying signaling tables, descriptors or delivery related information. Table 1 provides the syntax of the general format of MMT signaling messages. It should be noted that in Table 1, and other tables included in this description, uimsbf refers to an unsigned integer most significant bit first data type and bslbf refers to a bit string, left bit first data type.

MMT provides the following definitions for syntax elements message_id, version, length, extension, message_payload:

As illustrated in Table 1, a message may be identified using a message identifier value. In MMT, message identifier values of 0x8000 to 0xFFFF may be reserved for private use. A/331 defines a MMT signaling message (e.g., mmt_atsc3_message()), where a MMT signaling message is defined to deliver information specific to ATSC 3.0 services. A MMT signaling message may be identified using a MMT message identifier value reserved for private use (e.g., a value of 0x8000 to 0xFFFF). Table 2 provides example syntax for a MMT signaling message mmt_atsc3_message().

A/331 provides the following definitions for syntax elements message_id, version, length, service_id, atsc3_message_content_type, atsc3_message_content_version, atsc3_message_content_compression, URI_length, URI_byte, atsc3_message_content_length, atsc3_message_content_byte, and reserved:

It should be noted that with respect to Table 2 and Table 3 that A/331 does not currently define a closed signaling descriptor message type. With respect to Table 3, atsc3_message_content_type may be expanded to include a closed signaling descriptor message type, according to one or more of the techniques described herein. For example, any one of reserved values 0x0009~0xFFFF may correspond to an example closed signaling descriptor message described herein.

Referring again to FIG. 2, for service layer signaling with respect to a ROUTE/DASH delivery mechanism, A/331 service layer signaling includes a Service-based Transport Session Instance Description (S-TSID), a User Service Bundle Descriptor (USBD), and a Media Presentation Document (MPD). Each of a S-TSID, a USBD, and a MPD may include fragments that describe service layer properties. A fragment may include a set of XML-encoded metadata fragments. In one example, the metadata fragments may be carried over a dedicated LCT channel. In A/331, the USBD fragment includes service identification, device capabilities information, references to other SLS fragments required to access the service and constituent media components, and metadata to enable the receiver to determine the transport mode (e.g., broadcast and/or broadband) of service components. In A/331, the USBD also includes a reference to an MPD fragment that contains descriptions for content components of the ATSC 3.0 Service delivered over broadcast and/or broadband. In A/331, the USBD also includes a reference to the S-TSID fragment which provides access related parameters to the transport sessions carrying contents of this ATSC 3.0 Service. In A/331, the S-TSID fragment, referenced by the USBD, provides transport session descriptions for the one or more ROUTE sessions in which the media content components of a service are delivered, and descriptions of the delivery objects carried in those LCT channels. For the sake of brevity, details of the format of the S-TSID and the USBD fragments are not described herein, however, reference is made to A/331.

In A/331, the MPD is a SLS metadata fragment that includes a formalized description of a DASH-IF (DASH Interoperability Forum) profile of a DASH Media Presentation. A DASH Media Presentation may correspond to a linear service or part of a linear service of a given duration defined by a service provider (e.g., a single TV program, or the set of contiguous linear TV programs over a period of time). The contents of the MPD provide the resource identifiers for segments and the context for the identified resources within the Media Presentation. In A/331, the data structure and semantics of the MPD fragment are described with respect to the Media Presentation Description as defined by the DASH-IF profile of MPEG DASH ISO/IEC: ISO/IEC 23009-1:2014, “Information technology - Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats,” International Organization for Standardization, 2nd Edition, 5/15/2014 (hereinafter, “ISO/IEC 23009-1:2014”), which is incorporated by reference herein. It should be noted that draft third editions of ISO/IEC 23009-1 are currently being proposed. Thus, a MPD may include a MPD as described in “ISO/IEC 23009-1:2014,” currently proposed MPDs, and/or combinations thereof. A Media Presentation as described in a MPD may include a sequence of one or more Periods, where each Period may include one or more Adaptation Sets. It should be noted that in the case where an Adaptation Set includes multiple media content components, then each media content component may be described individually. Each Adaptation Set may include one or more Representations. The properties of each media content component may be described by an AdaptationSet element and/or elements within an Adaption Set, including for example, a ContentComponent element.

FIG. 3 is a block diagram illustrating an example of a system that may implement one or more techniques described in this disclosure. System 300 may be configured to communicate data in accordance with the techniques described herein. In the example illustrated in FIG. 3, system 300 includes one or more receiver devices 302A-302N, television service network 304, television service provider site 306, wide area network 312, one or more content provider sites 314A-314N, and one or more data provider sites 316A-316N. System 300 may include software modules. Software modules may be stored in a memory and executed by a processor. System 300 may include one or more processors and a plurality of internal and/or external memory devices. Examples of memory devices include file servers, file transfer protocol (FTP) servers, network attached storage (NAS) devices, local disk drives, or any other type of device or storage medium capable of storing data. Storage media may include Blu-ray discs, DVDs, CD-ROMs, magnetic disks, flash memory, or any other suitable digital storage media. When the techniques described herein are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors.

System 300 represents an example of a system that may be configured to allow digital media content, such as, for example, a movie, a live sporting event, etc., and data, applications and media presentations associated therewith (e.g., visual language presentations), to be distributed to and accessed by a plurality of computing devices, such as receiver devices 302A-302N. In the example illustrated in FIG. 3, receiver devices 302A-302N may include any device configured to receive data from television service provider site 306. For example, receiver devices 302A-302N may be equipped for wired and/or wireless communications and may include televisions, including so-called smart televisions, set top boxes, and digital video recorders. Further, receiver devices 302A-302N may include desktop, laptop, or tablet computers, gaming consoles, mobile devices, including, for example, “smart” phones, cellular telephones, and personal gaming devices configured to receive data from television service provider site 306. It should be noted that although system 300 is illustrated as having distinct sites, such an illustration is for descriptive purposes and does not limit system 300 to a particular physical architecture. Functions of system 300 and sites included therein may be realized using any combination of hardware, firmware and/or software implementations.

Television service network 304 is an example of a network configured to enable digital media content, which may include television services, to be distributed. For example, television service network 304 may include public over-the-air television networks, public or subscription-based satellite television service provider networks, and public or subscription-based cable television provider networks and/or over the top or Internet service providers. It should be noted that although in some examples television service network 304 may primarily be used to enable television services to be provided, television service network 304 may also enable other types of data and services to be provided according to any combination of the telecommunication protocols described herein. Further, it should be noted that in some examples, television service network 304 may enable two-way communications between television service provider site 306 and one or more of receiver devices 302A-302N. Television service network 304 may comprise any combination of wireless and/or wired communication media. Television service network 304 may include coaxial cables, fiber optic cables, twisted pair cables, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. Television service network 304 may operate according to a combination of one or more telecommunication protocols. Telecommunications protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunications protocols include DVB standards, ATSC standards, ISDB standards, DTMB standards, DMB standards, Data Over Cable Service Interface Specification (DOCSIS) standards, HbbTV standards, W3C standards, and UPnP standards.

Referring again to FIG. 3, television service provider site 306 may be configured to distribute television service via television service network 304. For example, television service provider site 306 may include one or more broadcast stations, a cable television provider, or a satellite television provider, or an Internet-based television provider. In the example illustrated in FIG. 3, television service provider site 306 includes service distribution engine 308 and database 310. Service distribution engine 308 may be configured to receive data, including, for example, multimedia content, interactive applications, and visual language presentations, and distribute data to receiver devices 302A-302N through television service network 304. For example, service distribution engine 308 may be configured to transmit television services according to aspects of the one or more of the transmission standards described above (e.g., an ATSC standard). In one example, service distribution engine 308 may be configured to receive data through one or more sources. For example, television service provider site 306 may be configured to receive a transmission including television programming through a satellite uplink/downlink. Further, as illustrated in FIG. 3, television service provider site 306 may be in communication with wide area network 312 and may be configured to receive data from content provider sites 314A-314N and further receive data from data provider sites 316A-316N. It should be noted that in some examples, television service provider site 306 may include a television studio and content may originate therefrom.

Database 310 may include storage devices configured to store data including, for example, multimedia content and data associated therewith, including for example, descriptive data and executable interactive applications. For example, a sporting event may be associated with an interactive application that provides statistical updates. Data associated with multimedia content may be formatted according to a defined data format, such as, for example, HTML, Dynamic HTML, XML, and JSON, and may include URIs and Universal Resource Locators (URLs) enabling receiver devices 302A-302N to access data, e.g., from one of data provider sites 316A-316N. In some examples, television service provider site 306 may be configured to provide access to stored multimedia content and distribute multimedia content to one or more of receiver devices 302A-302N through television service network 304. For example, multimedia content (e.g., music, movies, and television (TV) shows) stored in database 310 may be provided to a user via television service network 304 on a so-called on demand basis.

Wide area network 312 may include a packet based network and operate according to a combination of one or more telecommunication protocols. Telecommunications protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunications protocols include Global System Mobile Communications (GSM) standards, code division multiple access (CDMA) standards, 3^rd Generation Partnership Project (3GPP) standards, European Telecommunications Standards Institute (ETSI) standards, European standards (EN), IP standards, Wireless Application Protocol (WAP) standards, and Institute of Electrical and Electronics Engineers (IEEE) standards, such as, for example, one or more of the IEEE 802 standards (e.g., Wi-Fi). Wide area network 312 may comprise any combination of wireless and/or wired communication media. Wide area network 312 may include coaxial cables, fiber optic cables, twisted pair cables, Ethernet cables, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. In one example, wide area network 316 may include the Internet.

Referring again to FIG. 3, content provider sites 314A-314N represent examples of sites that may provide multimedia content to television service provider site 306 and/or receiver devices 302A-302N. For example, a content provider site may include a studio having one or more studio content servers configured to provide multimedia files and/or streams to television service provider site 306. In one example, content provider sites 314A-314N may be configured to provide multimedia content using the IP suite. For example, a content provider site may be configured to provide multimedia content to a receiver device according to Real Time Streaming Protocol (RTSP), HTTP, or the like.

Data provider sites 316A-316N may be configured to provide data, including hypertext based content, and the like, to one or more of receiver devices 302A-302N and/or television service provider site 306 through wide area network 312. A data provider site 316A-316N may include one or more web servers. Data provided by data provider site 316A-316N may be defined according to data formats, such as, for example, HTML, Dynamic HTML, XML, and JSON. An example of a data provider site includes the United States Patent and Trademark Office website. It should be noted that in some examples, data provided by data provider sites 316A-316N may be utilized for so-called second screen applications. For example, companion device(s) in communication with a receiver device may display a website in conjunction with television programming being presented on the receiver device. It should be noted that data provided by data provider sites 316A-316N may include audio and video content.

As described above, service distribution engine 308 may be configured to receive data, including, for example, multimedia content, interactive applications, and visual language presentations, and distribute data to receiver devices 302A-302N through television service network 304. FIG. 4 is a block diagram illustrating an example of a service distribution engine that may implement one or more techniques of this disclosure. Service distribution engine 400 may be configured to receive data and output a signal representing that data for distribution over a communication network, e.g., television service network 304. For example, service distribution engine 400 may be configured to receive one or more data streams and output a signal that may be transmitted using a single radio frequency band (e.g., a 6 MHz channel, an 8 MHz channel, etc.) or a bonded channel (e.g., two separate 6 MHz channels). A data stream may generally refer to data encapsulated in a set of one or more data packets.

As illustrated in FIG. 4, service distribution engine 400 includes component encapsulator 402, transport/network packet generator 404, link layer packet generator 406, frame builder and waveform generator 408, and system memory 410. Each of component encapsulator 402, transport/network packet generator 404, link layer packet generator 406, frame builder and waveform generator 408, and system memory 410 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications and may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. It should be noted that although service distribution engine 400 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit service distribution engine 400 to a particular hardware architecture. Functions of service distribution engine 400 may be realized using any combination of hardware, firmware and/or software implementations.

System memory 410 may be described as a non-transitory or tangible computer-readable storage medium. In some examples, system memory 410 may provide temporary and/or long-term storage. In some examples, system memory 410 or portions thereof may be described as non-volatile memory and in other examples portions of system memory 410 may be described as volatile memory. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), and static random access memories (SRAM). Examples of non-volatile memories include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. System memory 410 may be configured to store information that may be used by service distribution engine 400 during operation. It should be noted that system memory 410 may include individual memory elements included within each of component encapsulator 402, transport/network packet generator 404, link layer packet generator 406, and frame builder and waveform generator 408. For example, system memory 410 may include one or more buffers (e.g., First-in First-out (FIFO) buffers) configured to store data for processing by a component of service distribution engine 400.

Component encapsulator 402 may be configured to receive one or more components of a service and encapsulate the one or more components according to a defined data structure. For example, component encapsulator 402 may be configured to receive one or more media components, including a visual language presentation including a closed signing video component, and generate a package based on MMTP. Further, component encapsulator 402 may be configured to receive one or more media components and generate media presentation based on DASH. It should be noted that in some examples, component encapsulator 402 may be configured to generate service layer signaling data. Transport/network packet generator 404 may be configured to receive a transport package and encapsulate the transport package into corresponding transport layer packets (e.g., UDP, Transport Control Protocol (TCP), etc.) and network layer packets (e.g., Ipv4, Ipv6, compressed IP packets, etc.). Link layer packet generator 406 may be configured to receive network packets and generate packets according to a defined link layer packet structure (e.g., an ATSC 3.0 link layer packet structure).

Frame builder and waveform generator 408 may be configured to receive one or more link layer packets and output symbols (e.g., OFDM symbols) arranged in a frame structure. As described above, a frame may include one or more PLPs may be referred to as a physical layer frame (PHY-Layer frame). In one example, a frame structure may include a bootstrap, a preamble, and a data payload including one or more PLPs. A bootstrap may act as a universal entry point for a waveform. A preamble may include so-called Layer-1 signaling (L1-signaling). L1-signaling may provide the necessary information to configure physical layer parameters. Frame builder and waveform generator 408 may be configured to produce a signal for transmission within one or more of types of RF channels: a single 6 MHz channel, a single 7 MHz channel, single 8 MHz channel, a single 11 MHz channel, and bonded channels including any two or more separate single channels (e.g., a 14 MHz channel including a 6 MHz channel and a 8 MHz channel). Frame builder and waveform generator 408 may be configured to insert pilots and reserved tones for channel estimation and/or synchronization. In one example, pilots and reserved tones may be defined according to an OFDM symbol and sub-carrier frequency map. Frame builder and waveform generator 408 may be configured to generate an OFDM waveform by mapping OFDM symbols to sub-carriers. It should be noted that in some examples, frame builder and waveform generator 408 may be configured to support layer division multiplexing. Layer division multiplexing may refer to super-imposing multiple layers of data on the same RF channel (e.g., a 6 MHz channel). Typically, an upper layer refers to a core (e.g., more robust) layer supporting a primary service and a lower layer refers to a high data rate layer supporting enhanced services. For example, an upper layer could support basic High Definition video content and a lower layer could support enhanced Ultra-High Definition video content.

As described above, component encapsulator 402 may be configured to receive one or more media components, including a closed signing video component, and generate a package based on MMTP. FIG. 5A is a block diagram illustrating an example of a component encapsulator that may implement one or more techniques of this disclosure. Component encapsulator 500 may be configured to generate a package according to the techniques described herein. In the example illustrated in FIG. 5A, functional blocks of component encapsulator 500 correspond to functional blocks for generating a package (e.g., an MMT Package). As illustrated in FIG. 5A, component encapsulator 500 includes presentation information generator 502, asset generator 504, and asset delivery characteristic generator 506. Each of presentation information generator 502, asset generator 504, and asset delivery characteristic generator 506 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications and may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. It should be noted that although component encapsulator 500 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit component encapsulator 500 to a particular hardware architecture. Functions of component encapsulator 500 may be realized using any combination of hardware, firmware and/or software implementations.

Asset generator 504 may be configured to receive media components and generate one or more assets for inclusion in a package. Asset delivery characteristic generator 506 may be configured to receive information regarding assets to be included in a package and provide QoS requirements. Presentation information generator 502 may be configured to generate presentation information documents. As described above, a MMT package includes presentation information (PI) and asset delivery characteristics (ADC) and a PI document may be delivered as one or more signalling messages. Thus, presentation information generator 502 may be configured to generate signalling messages according to the techniques described herein. It should be noted that in some examples, a service distribution engine (e.g., service distribution engine 308 or service distribution engine 400) or specific components thereof may be configured to generate signalling messages according to the techniques described herein. As such, the description of signalling messages with respect to presentation information generator 502 should not be construed to limit the techniques described herein.

As described above, with respect to Table 2 and Table 3, A/331 does not currently define a closed signing descriptor message. Table 4 provides an example of syntax that may be used to signal information associated with a visual language presentation according to one or more techniques of this disclosure.

With respect to the example syntax illustrated in Table 4, syntax elements descriptor_tag, descriptor_length, origin_info_present, language_info_present, asset_id_length, asset_id_byte, main_video_asset_id_length, main_video_asset_id_byte, language_length, language_byte, origin_units, col_origin_percentage, col_origin_percentage_frac, row_origin_percentage, row_origin_percentage_frac, col_origin_px, row_origin_px, col_origin_luma_samples, and row_origin_luma_samples may be based on the following example definitions:

With respect to the example syntax illustrated in Table 4, closed_signing_descriptor() associates a visual language presentation video asset to a main video presentation asset. It should be noted that in some examples, for each of asset_id_length, language_length, and main_video_asset_id_lenghth minus X notation may be used, where a signaled value plus X indicates a length in bytes (e.g., X = 1, 2, etc.). Referring to Table 3 above, A/331 defines a video stream properties descriptor message, video_stream_properties_descriptor(), which may be used to signal properties of one or more video stream assets. Thus, properties of a main video presentation asset may be signaled in an instance of a video stream properties descriptor message. In one example, properties of a visual language presentation video asset may be signaled in the instance of a video stream properties descriptor message including the main presentation properties. In one example, properties of a visual language presentation video asset may be signaled in a separate video stream properties descriptor message.

Further, it should be noted that with respect to a video stream properties descriptor including information for a visual language presentation video asset, one or more syntax elements in video_stream_properties_descriptor() in A/331 may be constrained to a subset of possible values based on properties of the main video presentation. For example, A/331 provides where a video stream properties descriptor includes the following optionally signaled syntax elements pic_width_in_luma_samples and pic_height_in_luma_samples which specify the width and height of a decoded picture of video in luma samples. It should be noted that with respect to luma sample units being used to indicate overlay information, luma sample units may refer to luma samples as defined for a main video presentation. For example, a picture of a main video presentation may be defined as: an array of luma samples in monochrome format or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, and 4:4:4 colour format, where luma may be defined as an adjective specifying that a sample array or single sample is representing the monochrome signal related to the primary colours.

In one example, the values of pic_width_in_luma_samples and pic_height_in_luma_samples for the visual language presentation video asset may be based on the values of pic_width_in_luma_samples and pic_height_in_luma_samples for the main video presentation. For example, it may be a requirement that the resolution of the visual language presentation is less than or equal to the resolution of the main video presentation. In a similar manner, one or more of the following example additional constraints may be imposed:

Further, in one example, the closed_signing_descriptor() in Table 4 may be modified to include (e.g., using optional signaling indicated by one or more flags) one or more syntax elements included in video_stream_properties_descriptor(). For example, the closed_signing_descriptor() in Table 4 may be modified to include syntax element codec_code included video_stream_properties_descriptor() as defined in A/331 or another syntax element indicating the codec of visual language presentation video. In one example, closed_signing_descriptor() may be modified such that properties of a visual language presentation video may be signaled by a service provider without signaling a video_stream_properties_descriptor() for the visual language presentation video. In this example, properties of a main video presentation asset may be signaled in an instance of a video stream properties descriptor message and properties of visual language presentation video asset and the relationship between the main video presentation asset and the visual language presentation video asset may be signaled in an instance of a closed signing descriptor message.

As described above, with respect to the ATSC 3.0 suite of standards, it has been proposed that a service provider be enabled to signal the presence of one or more visual language presentations, the author’s intended size and position of the overlay of the visual language presentation, and a language of visual language presentation. The example syntax illustrated in Table 4 provides an example of syntax that enables a service provider to signal the position of the overlay of a visual language presentation using origin information. In one example, origin information may relate the top-left corner of a visual language presentation to a position in a main video presentation. For example, a visual language presentation may have a resolution of 640x480 luma samples and a main video presentation may have a resolution of 3840x2160 luma samples, origin information may be used to indicate the author’s intended placement of the top-left corner of the 640x480 visual language presentation within the 3840x2160 main video presentation, (i.e., the overlay position). It should be noted that in other examples, the origin may relate other points (e.g., top-right corner, bottom-left corner, center, etc.) of the visual language presentation video component to the main video presentation.

It should be noted that in some examples, a closed signing descriptor message may include a syntax element indicating the size (e.g., in luma samples) of a visual language presentation. In one example, the size may be signaled by signaling integer values (e.g., number of rows and columns). In one example, available sizes (e.g., 640x480, 640x360, 320x240, 384x288, 854x480, 768x576, etc.) may be defined and indicated by values of a size syntax element (e.g., an 8-bit syntax element). In one example, where a visual language presentation is constrained to having a size of one of: 640x480 or 384x288 a 1-bit size syntax element may indicate a size of 640x480 or 384x288.

As illustrated in the example illustrated in Table 4, a service provider may signal origin information using three distinct types of units, i.e., px units, percentage units, or luma sample units. It should be noted that in other examples, a closed signing descriptor message may include fewer or more types of units for signaling origin information. For example, in one example, a closed signing descriptor message may not include px units. It should be noted that in this case, syntax element origin_units may be a 1-bit flag indicating one of percentage units or luma sample units (or two other types of units, in some examples). Further, in the case where a closed signing descriptor message only includes one type of unit, syntax element origin_units may not be included in a closed signing descriptor message. For example, in the case where a visual language presentation has a resolution of 640x480 luma samples and a main video presentation has a resolution of 3840x2160 luma samples and the author’s intended placement of the 640x480 visual language presentation within the 3840x2160 main video presentation is 32 luma samples from right edge of the main video presentation and 32 luma samples from bottom edge of main video presentation (e.g., a bottom-right placement), each of px units, percentage units, or luma sample units may be used to signal the intended placement.

For the example described above, in the case where a service provider signals the intended placement using luma sample units, the signaling may be derived as follows:

It should be noted that in other examples, for columns indexing may start at zero for the rightmost column (i.e., column count is to the left) and/or for rows indexing may start at zero for the bottommost row (i.e., row count is upward).

In the case where a service provider signals the intended placement using px units, px may correspond to a standard length. In one example, a px unit may be equal to 1/96^th of an inch, as provided in CSS Values and Units Module Level, W3C Candidate Recommendation, 11 June 2015, which is incorporated by reference herein. In this example, for the case described above each of col_origin_px and row_origin_px would correspond to the (3168, 1648) luma sample location. In the simplest case, if the main video is 3840px x 2160px for a display device then px location of origin would be equal to luma sample location origin, i.e., the origin would be (3168px, 1648px).

For the example described above, in the case where a service provider signals the intended placement using percentage units, the signaling may be derived as follows:

In this example, a receiver device operating in the luma sample domain, may derive col_origin_luma_samples and row_origin_luma_samples from col_origin_percentage, row_origin_percentage, col_origin_percentage_frac, and row_origin_percentage_frac values as follows:

In this manner, a receiver device may be configured to parse visual language presentation information and generate video presentation including a main video presentation and a visual language presentation overlaid on the main video presentation based on the parsed visual presentation information.

As described above, a service provider may receive or generate a video component including a visual language presentation. In some examples, a service provider may wish to include multiple types of visual language presentations for a main video presentation. For example, a visual language presentation may include an interpreter performing sign language corresponding to a dialogue component of a television program and a visual language presentation may include an interpreter performing sign language corresponding to a commentary component of a television program. In one example, Table 4 may include syntax element role which may be based on the following example definition:

As described above, a service provider may receive or generate a video component including a visual language presentation. In some examples, a service provider may wish to include column extent and row extent of the visual language presentation. For example, a service provider may receive a visual language presentation video having a dimension of 960x720 luma samples and desire that the visual language presentation to be presented with dimensions of 640x480 luma samples. In this case, Table 4 may be modified to include syntax element that enable the service provider to include such signaling in the closed signing descriptor. If the origin information is in luma sample units, then syntax elements col_extent_luma_samples and row_extent_luma_samples with values 640 and 480, respectively may be included in the closed signing descriptor. If the origin information is in px units, then syntax elements col_extent_px and row_extent_px with appropriate corresponding values may be included in the closed signing descriptor. If the origin information is in percentage units, then syntax elements col_extent_percentage, col_extent_percentage_frac, row_extent_percentage and row_extent_percentage_frac with appropriate corresponding values may be included in the closed signing descriptor. In one example, the signaling of extent may be replaced by signaling of location of right-top corner of the intended presentation of the closed signing video. For the example above this corresponds to luma sample position (col_origin_luma_samples + 640 - 1) and (row_origin_luma_samples + 480 - 1) for row and column respectively.

In some examples, a service provider may desire to associate a closed signing video with multiple main video assets. An example is multi-view video where a single closed signing video asset may be used for both views. In such an instance the set of main_video_asset_id_length and corresponding main_video_asset_id_byte syntax elements may be repeated to include asset identifier of each such main video asset. In some examples, when signaling position/extent using luma samples, a service provider may desire to indicate fractional luma sample position/extent. In one example, this may be achieved by including syntax elements that indicate the fractional luma sample position/extent for row and column. In one example, such a syntax elements may include one byte syntax elements. In one example, the fractional position/extent is obtained by dividing the value in the syntax element by a pre-determined value (e.g., 100) and adding it to the integer position/extent.

As described above, component encapsulator 402 may be configured to receive one or more media components and generate media presentation based on DASH. FIG. 5B is a block diagram illustrating an example of a component encapsulator that may implement one or more techniques of this disclosure. Component encapsulator 550 may be configured to generate a media presentation according to the techniques described herein. In the example illustrated in FIG. 5B, functional blocks of component encapsulator 550 correspond to functional blocks for generating a media presentation (e.g., a DASH media presentation). As illustrated in FIG. 5B, component encapsulator 550 includes media presentation description generator 552 and segment generator 554. Each of media presentation description generator 552 and segment generator 554 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications and may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. It should be noted that although component encapsulator 550 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit component encapsulator 550 to a particular hardware architecture. Functions of component encapsulator 550 may be realized using any combination of hardware, firmware and/or software implementations.

Segment generator 554 may be configured to receive media components and generate one or more segments for inclusion in a media presentation. Media presentation description generator 552 may be configured to generate media presentation description fragments. It should be noted that in some examples, a service distribution engine (e.g., service distribution engine 308 or service distribution engine 400) or specific components thereof may be configured to generate signalling messages according to the techniques described herein. As such, description of signalling messages with respect to media presentation description generator 552 should not be construed to limit the techniques described herein. Thus, component encapsulator 402 and/or service distribution engine 400 may be configured to generate MPDs and/or similar signaling data according to one or more of the techniques described herein.

FIG. 6 is a block diagram illustrating an example of a receiver device that may implement one or more techniques of this disclosure. That is, receiver device 600 may be configured to parse a signal based on the semantics described above with respect to one or more of the tables described above. Further, receiver device 600 may be configured to parse visual language presentation information and generate a video presentation including a main video presentation and a visual language presentation based on the parsed visual presentation information. It should be noted that in some examples, that receiver device 600 may be configured to overlay the visual language presentation video at the position indicated by overlay information and/or at a position indicated by a user setting. For example, receiver device 600 may be configured to enable a user to override the author’s intended placement. Further, in some examples, receiver device 600 may be configured to override an intended placement based on additional secondary content being displayed (e.g., closed captioning being displayed at the bottom of a display and/or an electronic service guide being displayed at the top of a display). Further, as illustrated in Table 4, a closed signing language may be associated with a visual language presentation. Receiver device 600 may be configured to enable a user to set a language preference and display visual language presentations corresponding to a language preference.

Receiver device 600 is an example of a computing device that may be configured to receive data from a communications network and allow a user to access multimedia content. In the example illustrated in FIG. 6, receiver device 600 is configured to receive data via a television network, such as, for example, television service network 304 described above. Further, in the example illustrated in FIG. 6, receiver device 600 is configured to send and receive data via a wide area network. It should be noted that in other examples, receiver device 600 may be configured to simply receive data through a television service network 304. The techniques described herein may be utilized by devices configured to communicate using any and all combinations of communications networks.

As illustrated in FIG. 6, receiver device 600 includes central processing unit(s) 602, system memory 604, system interface 610, data extractor 612, audio decoder 614, audio output system 616, video decoder 618, display system 620, I/O device(s) 622, and network interface 624. As illustrated in FIG. 6, system memory 604 includes operating system 606 and applications 608. Each of central processing unit(s) 602, system memory 604, system interface 610, data extractor 612, audio decoder 614, audio output system 616, video decoder 618, display system 620, I/O device(s) 622, and network interface 624 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications and may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. It should be noted that although receiver device 600 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit receiver device 600 to a particular hardware architecture. Functions of receiver device 600 may be realized using any combination of hardware, firmware and/or software implementations.

CPU(s) 602 may be configured to implement functionality and/or process instructions for execution in receiver device 600. CPU(s) 602 may include single and/or multi-core central processing units. CPU(s) 602 may be capable of retrieving and processing instructions, code, and/or data structures for implementing one or more of the techniques described herein. Instructions may be stored on a computer readable medium, such as system memory 604.

System memory 604 may be described as a non-transitory or tangible computer-readable storage medium. In some examples, system memory 604 may provide temporary and/or long-term storage. In some examples, system memory 604 or portions thereof may be described as non-volatile memory and in other examples portions of system memory 604 may be described as volatile memory. System memory 604 may be configured to store information that may be used by receiver device 600 during operation. System memory 604 may be used to store program instructions for execution by CPU(s) 602 and may be used by programs running on receiver device 600 to temporarily store information during program execution. Further, in the example where receiver device 600 is included as part of a digital video recorder, system memory 604 may be configured to store numerous video files.

Applications 608 may include applications implemented within or executed by receiver device 600 and may be implemented or contained within, operable by, executed by, and/or be operatively/communicatively coupled to components of receiver device 600. Applications 608 may include instructions that may cause CPU(s) 602 of receiver device 600 to perform particular functions. Applications 608 may include algorithms which are expressed in computer programming statements, such as, for-loops, while-loops, if-statements, do-loops, etc. Applications 608 may be developed using a specified programming language. Examples of programming languages include, Java^TM, Jini^TM, C, C++, Objective C, Swift, Perl, Python, PhP, UNIX Shell, Visual Basic, and Visual Basic Script. In the example where receiver device 600 includes a smart television, applications may be developed by a television manufacturer or a broadcaster. As illustrated in FIG. 6, applications 608 may execute in conjunction with operating system 606. That is, operating system 606 may be configured to facilitate the interaction of applications 608 with CPUs(s) 602, and other hardware components of receiver device 600. Operating system 606 may be an operating system designed to be installed on set-top boxes, digital video recorders, televisions, and the like. It should be noted that techniques described herein may be utilized by devices configured to operate using any and all combinations of software architectures.

System interface 610 may be configured to enable communications between components of receiver device 600. In one example, system interface 610 comprises structures that enable data to be transferred from one peer device to another peer device or to a storage medium. For example, system interface 610 may include a chipset supporting Accelerated Graphics Port (AGP) based protocols, Peripheral Component Interconnect (PCI) bus based protocols, such as, for example, the PCI Express^TM (PCIe) bus specification, which is maintained by the Peripheral Component Interconnect Special Interest Group, or any other form of structure that may be used to interconnect peer devices (e.g., proprietary bus protocols).

As described above, receiver device 600 is configured to receive and, optionally, send data via a television service network. As described above, a television service network may operate according to a telecommunications standard. A telecommunications standard may define communication properties (e.g., protocol layers), such as, for example, physical signaling, addressing, channel access control, packet properties, and data processing. In the example illustrated in FIG. 6, data extractor 612 may be configured to extract video, audio, and data from a signal. A signal may be defined according to, for example, aspects DVB standards, ATSC standards, ISDB standards, DTMB standards, DMB standards, and DOCSIS standards.

Data extractor 612 may be configured to extract video, audio, and data, from a signal generated by service distribution engine 400 described above. That is, data extractor 612 may operate in a reciprocal manner to service distribution engine 400. Further, data extractor 612 may be configured to parse link layer packets based on any combination of one or more of the structures described above.

Data packets may be processed by CPU(s) 602, audio decoder 614, and video decoder 618. Audio decoder 614 may be configured to receive and process audio packets. For example, audio decoder 614 may include a combination of hardware and software configured to implement aspects of an audio codec. That is, audio decoder 614 may be configured to receive audio packets and provide audio data to audio output system 616 for rendering. Audio data may be coded using multi-channel formats such as those developed by Dolby and Digital Theater Systems. Audio data may be coded using an audio compression format. Examples of audio compression formats include Motion Picture Experts Group (MPEG) formats, Advanced Audio Coding (AAC) formats, DTS-HD formats, and Dolby Digital (AC-3) formats. Audio output system 616 may be configured to render audio data. For example, audio output system 616 may include an audio processor, a digital-to-analog converter, an amplifier, and a speaker system. A speaker system may include any of a variety of speaker systems, such as headphones, an integrated stereo speaker system, a multi-speaker system, or a surround sound system.

Video decoder 618 may be configured to receive and process video packets. For example, video decoder 618 may include a combination of hardware and software used to implement aspects of a video codec. In one example, video decoder 618 may be configured to decode video data encoded according to any number of video compression standards, such as ITU-T H.262 or ISO/IEC MPEG-2 Visual, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 Advanced video Coding (AVC)), and High-Efficiency Video Coding (HEVC). Display system 620 may be configured to retrieve and process video data for display. For example, display system 620 may receive pixel data from video decoder 618 and output data for visual presentation. Further, display system 620 may be configured to output graphics in conjunction with video data, e.g., graphical user interfaces. Display system 620 may comprise one of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device capable of presenting video data to a user. A display device may be configured to display standard definition content, high definition content, or ultra-high definition content.

I/O device(s) 622 may be configured to receive input and provide output during operation of receiver device 600. That is, I/O device(s) 622 may enable a user to select multimedia content to be rendered. Input may be generated from an input device, such as, for example, a push-button remote control, a device including a touch-sensitive screen, a motion-based input device, an audio-based input device, or any other type of device configured to receive user input. I/O device(s) 622 may be operatively coupled to receiver device 600 using a standardized communication protocol, such as for example, Universal Serial Bus protocol (USB), Bluetooth, ZigBee or a proprietary communications protocol, such as, for example, a proprietary infrared communications protocol.

Network interface 624 may be configured to enable receiver device 600 to send and receive data via a local area network and/or a wide area network. Network interface 624 may include a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device configured to send and receive information. Network interface 624 may be configured to perform physical signaling, addressing, and channel access control according to the physical and Media Access Control (MAC) layers utilized in a network. Receiver device 600 may be configured to parse a signal generated according to any of the techniques described above with respect to FIG. 5A. In this manner, receiver device 600 represents an example of a device configured parse one or more syntax element including information associated with a visual language presentation and rendering a visual presentation including the visual language presentation based on the one or more syntax elements.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Moreover, each functional block or various features of the base station device and the terminal device (the video decoder and the video encoder) used in each of the aforementioned embodiments may be implemented or executed by a circuitry, which is typically an integrated circuit or a plurality of integrated circuits. The circuitry designed to execute the functions described in the present specification may comprise a general-purpose processor, a digital signal processor (DSP), an application specific or general application integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, discrete gates or transistor logic, or a discrete hardware component, or a combination thereof. The general-purpose processor may be a microprocessor, or alternatively, the processor may be a conventional processor, a controller, a microcontroller or a state machine. The general-purpose processor or each circuit described above may be configured by a digital circuit or may be configured by an analogue circuit. Further, when a technology of making into an integrated circuit superseding integrated circuits at the present time appears due to advancement of a semiconductor technology, the integrated circuit by this technology is also able to be used.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

A method for signaling information associated with a visual language presentation, the method comprising:
receiving a visual language presentation;
signaling a syntax element indicating a type of unit used to indicate an origin for overlaying the visual language presentation with respect to a main video presentation; and
signaling one or more syntax elements providing values for the origin based on the indicated type of unit.
The method of claim 1, wherein the indicated type of unit includes a percentage unit, and wherein signaling one or more syntax elements providing values for the origin includes signaling one or more syntax elements indicating a percentage of the main video width from the left edge of the main video presentation and signaling one or more syntax elements indicating a percentage of the main video height from the top edge of the main video presentation.
The method of claim 2, wherein signaling one or more syntax elements indicating a percentage of the main video width includes signaling a syntax element indicating an integer part of the percentage and a syntax element indicating a fractional part of the percentage.
The method of claim 3, wherein signaling one or more syntax elements indicating a percentage of the main video height includes signaling a syntax element indicating an integer part of the percentage and a syntax element indicating a fractional part of the percentage.
The method of claim 1, wherein the indicated type of unit includes a standard length unit, and wherein signaling one or more syntax elements providing values for the origin includes signaling a syntax element indicating a number of standard length units from the left edge of the main video presentation and signaling a syntax element indicating a number of standard length units from the top edge of the main video presentation.
The method of claim 5, wherein the standard length unit is equal to 1/96th of an inch.
The method of claim 1, wherein the indicated type of unit includes luma sample units, and wherein signaling one or more syntax elements providing values for the origin includes signaling a syntax element indicating a number of luma samples from the left edge of the main video presentation and signaling a syntax element indicating a number of luma samples from the top edge of the main video presentation.
A device for rendering a visual language presentation, the device comprising a non-transitory computer readable medium and one or more processors configured to:
receive a main video presentation;
receive a visual language presentation;
parse a syntax element indicating a type of unit used to indicate an origin for overlaying the visual language presentation with respect to the main video presentation;
parse one or more syntax elements providing values for the origin based on the indicated type of unit; and
render a presentation including the visual language presentation overlaid on the main video presentation based on the one or more syntax elements providing values for the origin.
The device of claim 8, wherein parsing a syntax element indicating a type of unit includes determining a type of unit is a percentage unit and wherein rendering a presentation including the visual language presentation overlaid on the main video presentation includes determining a percentage of the main video width unit from the left edge of the main video presentation and a percentage of main video height from the top edge of the main video presentation.
The device of claim 9, wherein determining a percentage of the main video width from the left edge of the main video presentation includes determining an integer part of the percentage and determining a fractional part of the percentage.
The device of claim 10, wherein determining a percentage of the main video height from the top edge of the main video presentation includes determining an integer part of the percentage and determining a fractional part of the percentage.
The device of claim 8, wherein parsing a syntax element indicating a type of unit includes determining a type of unit is a standard length unit and wherein rendering a presentation including the visual language presentation overlaid on the main video presentation includes determining a number of standard length units from the left edge of the main video and a number of standard length units from the top edge of the main video.
The device of claim 12, wherein the standard length unit is equal to 1/96th of an inch.
The device of claim 8, wherein parsing a syntax element indicating a type of unit includes determining a type of unit is a luma sample unit and wherein rendering a presentation including the visual language presentation overlaid on the main video presentation includes determining a number of luma samples from the left edge of the main video presentation and a number of luma samples from the top edge of the main video presentation.
The device of claim 8, wherein the device is selected from the group consisting of: a desktop or laptop computer, a mobile device, a smartphone, a cellular telephone, a personal data assistant (PDA), a television, a tablet device, or a personal gaming device.
A method for rendering a visual language presentation, the method comprising:
receiving a main video asset;
receiving a closed signing video asset;
parsing a syntax element indicating one of a plurality of types of units used to indicate an origin for overlaying the closed signing video asset with respect to a main video asset;
parsing a syntax element indicating the distance of the origin from the left edge of the main video asset;
parsing a syntax element indicating the distance of the origin from the top edge of the main video asset; and
determining the origin location based on the indicated type of units, the indicated distance of the origin from the left edge of the main video asset, and the indicated distance of the origin from the top edge of the main video asset; and
rendering a presentation including the closed signing video asset overlaid on the main video asset based on the determined origin.
The method of claim 16, wherein the plurality of types of units includes at least two of: a percentage unit, a standard length unit, and a luma sample unit.
The method of claim 17, wherein a percentage unit indicates the distance of the origin from the left edge of the main video asset as a percentage of the width of the main video asset, and indicates the distance of the origin from the top edge of the main video asset as a percentage of the height of the main video asset.
The method of claim 18, wherein a standard length unit indicates the distance of the origin from the left edge of the main video asset as a number standard length units, and indicates the distance of the origin from the top edge of the main video asset as a number standard length units.
The method of claim 19, wherein a standard length unit includes 1/96th of an inch.