WO2020184645A1 - Systems and methods for signaling viewpoint information in omnidirectional media

Info

Publication number: WO2020184645A1
Application number: PCT/JP2020/010700
Authority: WO
Inventors: Sachin G. Deshpande
Original assignee: Sharp Kabushiki Kaisha
Priority date: 2019-03-14
Filing date: 2020-03-12
Publication date: 2020-09-17

Abstract

To transmit information associated with an omnidirectional video, the following three syntax elements are used. (See paragraph [0056] for details.) (1) A first syntax element specifying a number of object identifiers for which values of a second syntax element and a third syntax element are signaled. (2) A second syntax element specifying an object identifier for an m-th object. (3) A third syntax element specifying a description of an object identified by the second syntax element (for example, an object label).

Description

SYSTEMS AND METHODS FOR SIGNALING VIEWPOINT INFORMATION IN OMNIDIRECTIONAL MEDIA

This disclosure relates to the field of interactive video distribution and more particularly to techniques for signaling viewpoint information in a virtual reality application.

Digital media playback capabilities may be incorporated into a wide range of devices, including digital televisions, including so-called “smart” televisions, set-top boxes, laptop or desktop computers, tablet computers, digital recording devices, digital media players, video gaming devices, cellular phones, including so-called “smart” phones, dedicated video streaming devices, and the like. Digital media content (e.g., video and audio programming) may originate from a plurality of sources including, for example, over-the-air television providers, satellite television providers, cable television providers, online media service providers, including, so-called streaming service providers, and the like. Digital media content may be delivered over packet-switched networks, including bidirectional networks, such as Internet Protocol (IP) networks and unidirectional networks, such as digital broadcast networks.

Digital video included in digital media content may be coded according to a video coding standard. Video coding standards may incorporate video compression techniques. Examples of video coding standards include ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC) and High-Efficiency Video Coding (HEVC). Video compression techniques enable data requirements for storing and transmitting video data to be reduced. Video compression techniques may reduce data requirements by exploiting the inherent redundancies in a video sequence. Video compression techniques may sub-divide a video sequence into successively smaller portions (i.e., groups of frames within a video sequence, a frame within a group of frames, slices within a frame, coding tree units (e.g., macroblocks) within a slice, coding blocks within a coding tree unit, etc.). Prediction coding techniques may be used to generate difference values between a unit of video data to be coded and a reference unit of video data. The difference values may be referred to as residual data. Residual data may be coded as quantized transform coefficients. Syntax elements may relate residual data and a reference coding unit. Residual data and syntax elements may be included in a compliant bitstream. Compliant bitstreams and associated metadata may be formatted according to data structures. Compliant bitstreams and associated metadata may be transmitted from a source to a receiver device (e.g., a digital television or a smart phone) according to a transmission standard. Examples of transmission standards include Digital Video Broadcasting (DVB) standards, Integrated Services Digital Broadcasting (ISDB) standards, and standards developed by the Advanced Television Systems Committee (ATSC), including, for example, the so-called ATSC 3.0 suite of standards.

In general, this disclosure describes various techniques for signaling information associated with a virtual reality application. In particular, this disclosure describes techniques for signaling viewpoint information. It should be noted that although in some examples, the techniques of this disclosure are described with respect to transmission standards, the techniques described herein may be generally applicable. For example, the techniques described herein are generally applicable to any of DVB standards, ISDB standards, ATSC Standards, Digital Terrestrial Multimedia Broadcast (DTMB) standards, Digital Multimedia Broadcast (DMB) standards, Hybrid Broadcast and Broadband Television (HbbTV) standards, World Wide Web Consortium (W3C) standards, and Universal Plug and Play (UPnP) standards. Further, it should be noted that although techniques of this disclosure are described with respect to ITU-T H.264 and ITU-T H.265, the techniques of this disclosure are generally applicable to video coding, including omnidirectional video coding. For example, the coding techniques described herein may be incorporated into video coding systems (including video coding systems based on future video coding standards, including those currently under development, such as Versatile Video Coding (VVC)), including block structures, intra prediction techniques, inter prediction techniques, transform techniques, filtering techniques, and/or entropy coding techniques other than those included in ITU-T H.265. Thus, reference to ITU-T H.264 and ITU-T H.265 is for descriptive purposes and should not be construed to limit the scope of the techniques described herein. Further, it should be noted that incorporation by reference of documents herein should not be construed to limit or create ambiguity with respect to terms used herein. 
For example, in the case where an incorporated reference provides a different definition of a term than another incorporated reference and/or as the term is used herein, the term should be interpreted in a manner that broadly includes each respective definition and/or in a manner that includes each of the particular definitions in the alternative.

In one example, a method of signaling viewpoint information associated with an omnidirectional video comprises for each of a plurality of viewpoints, signaling a unique identifier and a label, wherein the label provides a viewpoint selection mechanism.

In one example, a device comprises one or more processors configured to for each of a plurality of viewpoints, signal a unique identifier and a label, wherein the label provides a viewpoint selection mechanism.

In one example, a non-transitory computer-readable storage medium comprises instructions stored thereon that, when executed, cause one or more processors of a device to for each of a plurality of viewpoints, signal a unique identifier and a label, wherein the label provides a viewpoint selection mechanism.

In one example, an apparatus comprises means for signaling a unique identifier and a label for each of a plurality of viewpoints, wherein the label provides a viewpoint selection mechanism.

In one example, a method of determining viewpoint information associated with an omnidirectional video comprises parsing syntax elements indicating for each of a plurality of viewpoints, a unique identifier and a label and rendering video based on values of the parsed syntax elements.

In one example, a device comprises one or more processors configured to parse syntax elements indicating for each of a plurality of viewpoints, a unique identifier and a label and render video based on values of the parsed syntax elements.

In one example, a non-transitory computer-readable storage medium comprises instructions stored thereon that, when executed, cause one or more processors of a device to parse syntax elements indicating for each of a plurality of viewpoints, a unique identifier and a label and render video based on values of the parsed syntax elements.

In one example, an apparatus comprises means for parsing syntax elements indicating for each of a plurality of viewpoints, a unique identifier and a label and means for rendering video based on values of the parsed syntax elements.
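
The signaling and parsing examples above can be illustrated with a minimal sketch. The byte layout used here (an 8-bit count, then for each viewpoint a 32-bit identifier followed by a null-terminated UTF-8 label) and the function names `write_viewpoint_labels` and `parse_viewpoint_labels` are hypothetical and chosen only for illustration; they are not the syntax defined by this disclosure or by any standard.

```python
import struct


def write_viewpoint_labels(viewpoints):
    """Serialize (viewpoint_id, label) pairs using a hypothetical layout."""
    ids = [vp_id for vp_id, _ in viewpoints]
    if len(set(ids)) != len(ids):
        raise ValueError("viewpoint identifiers must be unique")
    if len(viewpoints) > 255:
        raise ValueError("the assumed 8-bit count allows at most 255 entries")
    out = bytearray(struct.pack(">B", len(viewpoints)))
    for vp_id, label in viewpoints:
        encoded = label.encode("utf-8")
        if b"\x00" in encoded:
            raise ValueError("label must not contain a null byte")
        # 32-bit big-endian identifier, then the null-terminated label.
        out += struct.pack(">I", vp_id) + encoded + b"\x00"
    return bytes(out)


def parse_viewpoint_labels(data):
    """Parse the hypothetical layout back into {viewpoint_id: label}."""
    count = data[0]
    pos = 1
    labels = {}
    for _ in range(count):
        (vp_id,) = struct.unpack_from(">I", data, pos)
        pos += 4
        end = data.index(b"\x00", pos)  # locate the label terminator
        labels[vp_id] = data[pos:end].decode("utf-8")
        pos = end + 1
    return labels


payload = write_viewpoint_labels([(1, "Goal line"), (2, "Center court")])
menu = parse_viewpoint_labels(payload)
```

A receiver could present the values of `menu` to the user as a selection list and switch to the viewpoint whose identifier matches the chosen label.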

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

FIG. 1 is a block diagram illustrating an example of a system that may be configured to transmit coded video data according to one or more techniques of this disclosure.

FIG. 2A is a conceptual diagram illustrating coded video data and corresponding data structures according to one or more techniques of this disclosure.

FIG. 2B is a conceptual diagram illustrating coded video data and corresponding data structures according to one or more techniques of this disclosure.

FIG. 3 is a conceptual diagram illustrating coded video data and corresponding data structures according to one or more techniques of this disclosure.

FIG. 4 is a conceptual diagram illustrating an example of a coordinate system according to one or more techniques of this disclosure.

FIG. 5A is a conceptual diagram illustrating an example of specifying regions on a sphere according to one or more techniques of this disclosure.

FIG. 5B is a conceptual diagram illustrating an example of specifying regions on a sphere according to one or more techniques of this disclosure.

FIG. 6 is a conceptual drawing illustrating an example of components that may be included in an implementation of a system that may be configured to transmit coded video data according to one or more techniques of this disclosure.

FIG. 7 is a block diagram illustrating an example of a receiver device that may implement one or more techniques of this disclosure.

Video content typically includes video sequences comprised of a series of frames. A series of frames may also be referred to as a group of pictures (GOP). Each video frame or picture may include one or more slices, where a slice includes a plurality of video blocks. A video block may be defined as the largest array of pixel values (also referred to as samples) that may be predictively coded. Video blocks may be ordered according to a scan pattern (e.g., a raster scan). A video encoder performs predictive encoding on video blocks and sub-divisions thereof. ITU-T H.264 specifies a macroblock including 16 x 16 luma samples. ITU-T H.265 specifies an analogous Coding Tree Unit (CTU) structure where a picture may be split into CTUs of equal size and each CTU may include Coding Tree Blocks (CTB) having 16 x 16, 32 x 32, or 64 x 64 luma samples. As used herein, the term video block may generally refer to an area of a picture or may more specifically refer to the largest array of pixel values that may be predictively coded, sub-divisions thereof, and/or corresponding structures. Further, according to ITU-T H.265, each video frame or picture may be partitioned to include one or more tiles, where a tile is a sequence of coding tree units corresponding to a rectangular area of a picture.

In ITU-T H.265, the CTBs of a CTU may be partitioned into Coding Blocks (CB) according to a corresponding quadtree block structure. According to ITU-T H.265, one luma CB together with two corresponding chroma CBs and associated syntax elements are referred to as a coding unit (CU). A CU is associated with a prediction unit (PU) structure defining one or more prediction units (PU) for the CU, where a PU is associated with corresponding reference samples. That is, in ITU-T H.265 the decision to code a picture area using intra prediction or inter prediction is made at the CU level and for a CU one or more predictions corresponding to intra prediction or inter prediction may be used to generate reference samples for CBs of the CU. In ITU-T H.265, a PU may include luma and chroma prediction blocks (PBs), where square PBs are supported for intra prediction and rectangular PBs are supported for inter prediction. Intra prediction data (e.g., intra prediction mode syntax elements) or inter prediction data (e.g., motion data syntax elements) may associate PUs with corresponding reference samples. Residual data may include respective arrays of difference values corresponding to each component of video data (e.g., luma (Y) and chroma (Cb and Cr)). Residual data may be in the pixel domain. A transform, such as a discrete cosine transform (DCT), a discrete sine transform (DST), an integer transform, a wavelet transform, or a conceptually similar transform, may be applied to pixel difference values to generate transform coefficients. It should be noted that in ITU-T H.265, CUs may be further sub-divided into Transform Units (TUs). That is, an array of pixel difference values may be sub-divided for purposes of generating transform coefficients (e.g., four 8 x 8 transforms may be applied to a 16 x 16 array of residual values corresponding to a 16 x 16 luma CB); such sub-divisions may be referred to as Transform Blocks (TBs). 
Transform coefficients may be quantized according to a quantization parameter (QP). Quantized transform coefficients (which may be referred to as level values) may be entropy coded according to an entropy encoding technique (e.g., context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), probability interval partitioning entropy coding (PIPE), etc.). Further, syntax elements, such as a syntax element indicating a prediction mode, may also be entropy coded. Entropy encoded quantized transform coefficients and corresponding entropy encoded syntax elements may form a compliant bitstream that can be used to reproduce video data. A binarization process may be performed on syntax elements as part of an entropy coding process. Binarization refers to the process of converting a syntax value into a series of one or more bits. These bits may be referred to as “bins.”
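
As a rough illustration of the quantization and binarization steps just described, the following sketch quantizes a few transform coefficients and converts the resulting level magnitudes into bins. It assumes a simplified step-size model in which the step is 1.0 at QP 4 and doubles every 6 QP values, which approximates the behavior of ITU-T H.264 and ITU-T H.265 but omits their integer scaling tables. It also uses plain unary binarization, which is only one of several binarization schemes. The function names are hypothetical.

```python
def quantization_step(qp):
    """Approximate step size: 1.0 at QP 4, doubling every 6 QP (assumed model)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coefficients, qp):
    """Map transform coefficients to integer level values."""
    step = quantization_step(qp)
    levels = []
    for c in coefficients:
        magnitude = int(abs(c) / step + 0.5)  # round half away from zero
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def unary_bins(value):
    """Unary binarization: `value` ones followed by a terminating zero."""
    if value < 0:
        raise ValueError("unary binarization needs a non-negative value")
    return "1" * value + "0"


# QP 10 gives a step of 2.0 under the assumed model.
levels = quantize([10.0, -7.0, 3.0, 0.4], qp=10)
bins = [unary_bins(abs(level)) for level in levels]
```

A larger QP yields a larger step, so more coefficients collapse to zero and fewer bins need to be entropy coded, at the cost of a larger reconstruction error.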

Virtual Reality (VR) applications may include video content that may be rendered with a head-mounted display, where only the area of the spherical video that corresponds to the orientation of the user’s head is rendered. VR applications may be enabled by omnidirectional video, which is also referred to as 360 degree spherical video or 360 degree video. Omnidirectional video is typically captured by multiple cameras that cover up to 360 degrees of a scene. A distinct feature of omnidirectional video compared to normal video is that typically only a subset of the entire captured video region is displayed, i.e., the area corresponding to the current user’s field of view (FOV) is displayed. A FOV is sometimes also referred to as a viewport. In other cases, a viewport may be described as the part of the spherical video that is currently displayed and viewed by the user. It should be noted that the size of the viewport can be smaller than or equal to the field of view. Further, it should be noted that omnidirectional video may be captured using monoscopic or stereoscopic cameras. Monoscopic cameras may include cameras that capture a single view of an object. Stereoscopic cameras may include cameras that capture multiple views of the same object (e.g., views are captured using two lenses at slightly different angles). As used herein, the term viewpoint, when associated with a camera (e.g., camera viewpoint), may refer to information (i.e., the position of the camera) associated with a camera used to capture a view(s) of an object (e.g., camera parameters). It should be noted that since multiple cameras may be used to capture an omnidirectional video, multiple instances of viewpoint information may be associated with an omnidirectional video (i.e., one for each camera). Also, it should be noted that viewpoint information may be static or dynamic. That is, for example, a camera may be stationary or moving while capturing video. 
Further, it should be noted that in some cases, images for use in omnidirectional video applications may be captured using an ultra wide-angle lens (i.e., a so-called fisheye lens). In any case, the process for creating 360 degree spherical video may be generally described as stitching together input images and projecting the stitched together input images onto a three-dimensional structure (e.g., a sphere or cube), which may result in so-called projected frames. Further, in some cases, regions of projected frames may be transformed, resized, and relocated, which may result in a so-called packed frame.

Transmission systems may be configured to transmit omnidirectional video to one or more computing devices. Computing devices and/or transmission systems may be based on models including one or more abstraction layers, where data at each abstraction layer is represented according to particular structures, e.g., packet structures, modulation schemes, etc. An example of a model including defined abstraction layers is the so-called Open Systems Interconnection (OSI) model. The OSI model defines a 7-layer stack model, including an application layer, a presentation layer, a session layer, a transport layer, a network layer, a data link layer, and a physical layer. It should be noted that the use of the terms upper and lower with respect to describing the layers in a stack model may be based on the application layer being the uppermost layer and the physical layer being the lowermost layer. Further, in some cases, the term “Layer 1” or “L1” may be used to refer to a physical layer, the term “Layer 2” or “L2” may be used to refer to a link layer, and the term “Layer 3” or “L3” or “IP layer” may be used to refer to the network layer.

A physical layer may generally refer to a layer at which electrical signals form digital data. For example, a physical layer may refer to a layer that defines how modulated radio frequency (RF) symbols form a frame of digital data. A data link layer, which may also be referred to as a link layer, may refer to an abstraction used prior to physical layer processing at a sending side and after physical layer reception at a receiving side. As used herein, a link layer may refer to an abstraction used to transport data from a network layer to a physical layer at a sending side and used to transport data from a physical layer to a network layer at a receiving side. It should be noted that a sending side and a receiving side are logical roles and a single device may operate as both a sending side in one instance and as a receiving side in another instance. A link layer may abstract various types of data (e.g., video, audio, or application files) encapsulated in particular packet types (e.g., Moving Picture Experts Group - Transport Stream (MPEG-TS) packets, Internet Protocol Version 4 (IPv4) packets, etc.) into a single generic format for processing by a physical layer. A network layer may generally refer to a layer at which logical addressing occurs. That is, a network layer may generally provide addressing information (e.g., Internet Protocol (IP) addresses) such that data packets can be delivered to a particular node (e.g., a computing device) within a network. As used herein, the term network layer may refer to a layer above a link layer and/or a layer having data in a structure such that it may be received for link layer processing. Each of a transport layer, a session layer, a presentation layer, and an application layer may define how data is delivered for use by a user application.

Wang et al., ISO/IEC JTC1/SC29/WG11 W18227 “WD 4 of ISO/IEC 23090-2 OMAF 2nd edition,” January 2019, Marrakech, Morocco, which is incorporated by reference and herein referred to as Wang, defines a media application format that enables omnidirectional media applications. Wang specifies a coordinate system for omnidirectional video; projection and rectangular region-wise packing methods that may be used for conversion of a spherical video sequence or image into a two-dimensional rectangular video sequence or image, respectively; storage of omnidirectional media and the associated metadata using the ISO Base Media File Format (ISOBMFF); encapsulation, signaling, and streaming of omnidirectional media in a media streaming system; and media profiles and presentation profiles. It should be noted that for the sake of brevity, a complete description of Wang is not provided herein. However, reference is made to relevant sections of Wang.

Wang provides media profiles where video is coded according to ITU-T H.265. ITU-T H.265 is described in High Efficiency Video Coding (HEVC), Rec. ITU-T H.265 December 2016, which is incorporated by reference, and referred to herein as ITU-T H.265. As described above, according to ITU-T H.265, each video frame or picture may be partitioned to include one or more slices and further partitioned to include one or more tiles. FIGS. 2A-2B are conceptual diagrams illustrating an example of a group of pictures including slices and further partitioning pictures into tiles. In the example illustrated in FIG. 2A, Pic₄ is illustrated as including two slices (i.e., Slice₁ and Slice₂) where each slice includes a sequence of CTUs (e.g., in raster scan order). In the example illustrated in FIG. 2B, Pic₄ is illustrated as including six tiles (i.e., Tile₁ to Tile₆), where each tile is rectangular and includes a sequence of CTUs. It should be noted that in ITU-T H.265, a tile may consist of coding tree units contained in more than one slice and a slice may consist of coding tree units contained in more than one tile. However, ITU-T H.265 provides that one or both of the following conditions shall be fulfilled: (1) All coding tree units in a slice belong to the same tile; and (2) All coding tree units in a tile belong to the same slice.
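
The slice and tile constraint can be checked mechanically. The sketch below is illustrative only and rests on two assumptions: each CTU is described by a slice index and a tile index in two parallel lists, and the constraint is applied to every slice and tile pair that shares at least one CTU. The function name `slices_and_tiles_conform` is hypothetical.

```python
def slices_and_tiles_conform(ctu_slice, ctu_tile):
    """Check the slice/tile constraint for one picture.

    ctu_slice[i] and ctu_tile[i] give the slice and tile index of CTU i.
    """
    tiles_in_slice = {}
    slices_in_tile = {}
    for s, t in zip(ctu_slice, ctu_tile):
        tiles_in_slice.setdefault(s, set()).add(t)
        slices_in_tile.setdefault(t, set()).add(s)
    for s, t in set(zip(ctu_slice, ctu_tile)):
        slice_inside_tile = tiles_in_slice[s] == {t}  # condition (1)
        tile_inside_slice = slices_in_tile[t] == {s}  # condition (2)
        if not (slice_inside_tile or tile_inside_slice):
            return False
    return True


# Two slices inside a single tile: condition (1) holds.
ok_slices_in_tile = slices_and_tiles_conform([0, 0, 1, 1], [0, 0, 0, 0])
# One slice covering two complete tiles: condition (2) holds.
ok_tiles_in_slice = slices_and_tiles_conform([0, 0, 0, 0], [0, 0, 1, 1])
# Slice 0 and tile 1 only partially overlap: neither condition holds.
bad_partial_overlap = slices_and_tiles_conform([0, 0, 1, 1], [0, 1, 1, 2])
```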

360 degree spherical video may include regions. Referring to the example illustrated in FIG. 3, the 360 degree spherical video includes Regions A, B, and C and as illustrated in FIG. 3, tiles (i.e., Tile₁ to Tile₆) may form a region of an omnidirectional video. In the example illustrated in FIG. 3, each of the regions is illustrated as including CTUs. As described above, CTUs may form slices of coded video data and/or tiles of video data. Further, as described above, video coding techniques may code areas of a picture according to video blocks, sub-divisions thereof, and/or corresponding structures and it should be noted that video coding techniques enable video coding parameters to be adjusted at various levels of a video coding structure, e.g., adjusted for slices, tiles, video blocks, and/or at sub-divisions. In one example, the 360 degree video illustrated in FIG. 3 may represent a sporting event where Region A and Region C include views of the stands of a stadium and Region B includes a view of the playing field (e.g., the video is captured by a 360 degree camera placed at the 50-yard line).

As described above, a viewport may be part of the spherical video that is currently displayed and viewed by the user. As such, regions of omnidirectional video may be selectively delivered depending on the user’s viewport, i.e., viewport-dependent delivery may be enabled in omnidirectional video streaming. Typically, to enable viewport-dependent delivery, source content is split into sub-picture sequences before encoding, where each sub-picture sequence covers a subset of the spatial area of the omnidirectional video content, and sub-picture sequences are then encoded independently from each other as a single-layer bitstream. For example, referring to FIG. 3, each of Region A, Region B, and Region C, or portions thereof, may correspond to independently coded sub-picture bitstreams. Each sub-picture bitstream may be encapsulated in a file as its own track and tracks may be selectively delivered to a receiver device based on viewport information. It should be noted that in some cases, it is possible that sub-pictures overlap. For example, referring to FIG. 3, Tile₁, Tile₂, Tile₄, and Tile₅ may form a sub-picture and Tile₂, Tile₃, Tile₅, and Tile₆ may form a sub-picture. Thus, a particular sample may be included in multiple sub-pictures. Wang provides that a composition-aligned sample includes one of: a sample in a track that is associated with another track, where the sample has the same composition time as a particular sample in the other track; or, when a sample with the same composition time is not available in the other track, the sample with the closest preceding composition time relative to that of a particular sample in the other track. Further, Wang provides that a constituent picture includes part of a spatially frame-packed stereoscopic picture that corresponds to one view, or a picture itself when frame packing is not in use or the temporal interleaving frame packing arrangement is in use.
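
One reading of the composition-aligned sample definition can be sketched as a lookup over sorted composition times. In this sketch, `track_times` is an assumed ascending list of composition times for the samples of one track, and `reference_time` is the composition time of the particular sample in the associated track; both names and the function name are hypothetical.

```python
from bisect import bisect_right


def composition_aligned_index(track_times, reference_time):
    """Return the index of the composition-aligned sample, or None.

    Picks the sample whose composition time equals reference_time or,
    failing that, the sample with the closest preceding composition time.
    """
    # bisect_right returns the number of samples with time <= reference_time.
    i = bisect_right(track_times, reference_time)
    return i - 1 if i > 0 else None


times = [0, 40, 80, 120]
exact = composition_aligned_index(times, 80)       # same composition time
preceding = composition_aligned_index(times, 100)  # falls back to time 80
missing = composition_aligned_index(times, -5)     # nothing precedes
```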

As described above, Wang specifies a coordinate system for omnidirectional video. In Wang, the coordinate system consists of a unit sphere and three coordinate axes, namely the X (back-to-front) axis, the Y (lateral, side-to-side) axis, and the Z (vertical, up) axis, where the three axes cross at the center of the sphere. The location of a point on the sphere is identified by a pair of sphere coordinates azimuth (φ) and elevation (θ). FIG. 4 illustrates the relation of the sphere coordinates azimuth (φ) and elevation (θ) to the X, Y, and Z coordinate axes as specified in Wang. It should be noted that in Wang the value range of azimuth is -180.0, inclusive, to 180.0, exclusive, degrees and the value range of elevation is -90.0 to 90.0, inclusive, degrees. Wang specifies where a region on a sphere may be specified by four great circles, where a great circle (also referred to as a Riemannian circle) is an intersection of the sphere and a plane that passes through the center point of the sphere, where the center of the sphere and the center of a great circle are co-located. Wang further describes where a region on a sphere may be specified by two azimuth circles and two elevation circles, where an azimuth circle is a circle on the sphere connecting all points with the same azimuth value, and an elevation circle is a circle on the sphere connecting all points with the same elevation value. The sphere region structure in Wang forms the basis for signaling various types of metadata.
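
The relation between sphere coordinates and the coordinate axes can be sketched as follows. Because FIG. 4 is not reproduced here, the mapping is an assumption: azimuth is measured from the X axis toward the Y axis and elevation is measured from the X-Y plane toward the Z axis, giving X = cos(θ)cos(φ), Y = cos(θ)sin(φ), and Z = sin(θ). The function name `sphere_to_xyz` is hypothetical.

```python
import math


def sphere_to_xyz(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to a point on the unit sphere."""
    if not -180.0 <= azimuth_deg < 180.0:
        raise ValueError("azimuth must be in [-180.0, 180.0)")
    if not -90.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must be in [-90.0, 90.0]")
    phi = math.radians(azimuth_deg)
    theta = math.radians(elevation_deg)
    x = math.cos(theta) * math.cos(phi)
    y = math.cos(theta) * math.sin(phi)
    z = math.sin(theta)
    return (x, y, z)


front = sphere_to_xyz(0.0, 0.0)  # points along the X (front) axis
top = sphere_to_xyz(0.0, 90.0)   # points along the Z (up) axis
```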

It should be noted that with respect to the equations used herein, the following arithmetic operators may be used:

It should be noted that with respect to the equations used herein, the following logical operators may be used:

It should be noted that with respect to the equations used herein, the following relational operators may be used:

It should be noted that in the syntax used herein, unsigned int(n) refers to an unsigned integer having n bits. Further, bit(n) refers to a bit value having n bits.

As described above, Wang specifies how to store omnidirectional media and the associated metadata using the International Organization for Standardization (ISO) base media file format (ISOBMFF). Wang specifies a file format that supports metadata specifying the area of the spherical surface covered by the projected frame. In particular, Wang includes a sphere region structure specifying a sphere region having the following definition, syntax, and semantics:

Definition
The sphere region structure (SphereRegionStruct) specifies a sphere region.
When centre_tilt is equal to 0, the sphere region specified by this structure is derived as follows:
-If both azimuth_range and elevation_range are equal to 0, the sphere region specified by this structure is a point on a spherical surface.
-Otherwise, the sphere region is defined using variables centreAzimuth, centreElevation, cAzimuth1, cAzimuth2, cElevation1, and cElevation2 derived as follows:

The sphere region is defined as follows with reference to the shape type value specified in the semantics of the structure containing this instance of SphereRegionStruct:
-When the shape type value is equal to 0, the sphere region is specified by four great circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point defined by centreAzimuth and centreElevation and as shown in FIG. 5A.
-When the shape type value is equal to 1, the sphere region is specified by two azimuth circles and two elevation circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point defined by centreAzimuth and centreElevation and as shown in FIG. 5B.

When centre_tilt is not equal to 0, the sphere region is firstly derived as above and then a tilt rotation is applied along the axis originating from the sphere origin passing through the centre point of the sphere region, where the angle value increases clockwise when looking from the origin towards the positive end of the axis. The final sphere region is the one after applying the tilt rotation.
Shape type value equal to 0 specifies that the sphere region is specified by four great circles as illustrated in FIG. 5A.
Shape type value equal to 1 specifies that the sphere region is specified by two azimuth circles and two elevation circles as illustrated in FIG. 5B.
Shape type values greater than 1 are reserved.
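
For the case where centre_tilt is equal to 0, the variables named in the definition can be computed as in the sketch below. It assumes the symmetric form in which half of each range is taken on either side of the centre, with all inputs in units of 2^-16 degrees and all outputs in degrees. This form is an assumption made for illustration; Wang remains the authoritative source for the derivation. The function name `derive_sphere_region` is hypothetical.

```python
def derive_sphere_region(centre_azimuth, centre_elevation,
                         azimuth_range, elevation_range):
    """Derive region bounds in degrees from 2^-16 degree inputs (assumed form)."""
    scale = 65536.0  # 2^16 units per degree
    return {
        "centreAzimuth": centre_azimuth / scale,
        "centreElevation": centre_elevation / scale,
        "cAzimuth1": (centre_azimuth - azimuth_range / 2) / scale,
        "cAzimuth2": (centre_azimuth + azimuth_range / 2) / scale,
        "cElevation1": (centre_elevation - elevation_range / 2) / scale,
        "cElevation2": (centre_elevation + elevation_range / 2) / scale,
    }


# A region centred at azimuth 10 degrees, spanning 40 by 20 degrees.
region = derive_sphere_region(10 * 65536, 0, 40 * 65536, 20 * 65536)
```

When both ranges are 0, the four bounds collapse onto the centre, which matches the point case in the definition.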

Syntax

Semantics
centre_azimuth and centre_elevation specify the centre of the sphere region. centre_azimuth shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. centre_elevation shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive.

centre_tilt specifies the tilt angle of the sphere region. centre_tilt shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

azimuth_range and elevation_range, when present, specify the azimuth and elevation ranges, respectively, of the sphere region specified by this structure in units of 2^-16 degrees. azimuth_range and elevation_range specify the range through the centre point of the sphere region, as illustrated by FIG. 5A or FIG. 5B. When azimuth_range and elevation_range are not present in this instance of SphereRegionStruct, they are inferred as specified in the semantics of the structure containing this instance of SphereRegionStruct. azimuth_range shall be in the range of 0 to 360 * 2¹⁶, inclusive. elevation_range shall be in the range of 0 to 180 * 2¹⁶, inclusive.
The semantics of interpolate are specified by the semantics of the structure containing this instance of SphereRegionStruct.
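
The value ranges in these semantics lend themselves to a simple conformance check. The sketch below takes already-parsed integer field values, reports which fields fall outside the stated ranges, and converts a stored value to degrees. The function names and the idea of returning a list of offending field names are illustrative choices, not part of Wang.

```python
UNITS_PER_DEGREE = 1 << 16  # fields are expressed in units of 2^-16 degrees


def check_sphere_region(centre_azimuth, centre_elevation, centre_tilt,
                        azimuth_range=None, elevation_range=None):
    """Return the names of fields outside their range (empty list if none)."""
    u = UNITS_PER_DEGREE
    limits = {
        "centre_azimuth": (centre_azimuth, -180 * u, 180 * u - 1),
        "centre_elevation": (centre_elevation, -90 * u, 90 * u),
        "centre_tilt": (centre_tilt, -180 * u, 180 * u - 1),
    }
    # The range fields are optional ("when present").
    if azimuth_range is not None:
        limits["azimuth_range"] = (azimuth_range, 0, 360 * u)
    if elevation_range is not None:
        limits["elevation_range"] = (elevation_range, 0, 180 * u)
    return [name for name, (value, low, high) in limits.items()
            if not low <= value <= high]


def to_degrees(value):
    """Convert a value in units of 2^-16 degrees to degrees."""
    return value / UNITS_PER_DEGREE


valid = check_sphere_region(0, 0, 0)
invalid = check_sphere_region(180 * UNITS_PER_DEGREE, 0, 0,
                              azimuth_range=361 * UNITS_PER_DEGREE)
```

Note that 180 * 2¹⁶ is rejected for centre_azimuth because the upper bound is 180 * 2¹⁶ - 1, which mirrors the exclusive upper limit of the azimuth range in degrees.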

As described above, the term viewpoint may refer to information associated with a camera used to capture a view. Wang further includes viewpoint information structures which provide information of a viewpoint. In particular, Wang describes viewpoint information structures having the following definition, syntax, and semantics:

Definition
The ViewpointPosStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() provide information of a viewpoint, including the position of the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.
viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.
viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.
viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.
viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.
viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees. viewpoint_gcs_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_gcs_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_gcs_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.
vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.
NOTE 1: When two viewpoints have different values of vwpt_group_id, their position coordinates are not comparable, because the viewpoints belong to different coordinate systems.
vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.
NOTE 2: An OMAF player is expected to start with the initial viewpoint timed metadata as defined below. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.
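As a non-normative illustration of the units and ranges given in the semantics above, coded field values may be converted to physical units and range-checked as in the following Python sketch. The function and constant names are hypothetical and are not part of Wang; only the scale factors and bounds are taken from the semantics above.

```python
# Scale factors taken from the semantics above.
POS_UNITS_PER_METRE = 10_000       # viewpoint_pos_*: units of 10^-1 millimeters
GPS_UNITS_PER_DEGREE = 1 << 23     # GPS longitude/latitude: units of 2^-23 degrees
GCS_UNITS_PER_DEGREE = 1 << 16     # yaw/pitch/roll: units of 2^-16 degrees


def pos_to_metres(coded: int) -> float:
    """Convert a coded viewpoint_pos_x/y/z value to metres."""
    return coded / POS_UNITS_PER_METRE


def gps_to_degrees(coded: int) -> float:
    """Convert a coded GPS longitude or latitude value to degrees."""
    return coded / GPS_UNITS_PER_DEGREE


def longitude_in_range(coded: int) -> bool:
    """-180 * 2^23 to 180 * 2^23 - 1, inclusive."""
    return -180 * GPS_UNITS_PER_DEGREE <= coded <= 180 * GPS_UNITS_PER_DEGREE - 1


def latitude_in_range(coded: int) -> bool:
    """-90 * 2^23 to 90 * 2^23 - 1, inclusive."""
    return -90 * GPS_UNITS_PER_DEGREE <= coded <= 90 * GPS_UNITS_PER_DEGREE - 1


def gcs_rotation_in_range(yaw: int, pitch: int, roll: int) -> bool:
    """Check the yaw, pitch, and roll bounds; note that pitch includes +90 degrees."""
    u = GCS_UNITS_PER_DEGREE
    return (
        -180 * u <= yaw <= 180 * u - 1
        and -90 * u <= pitch <= 90 * u
        and -180 * u <= roll <= 180 * u - 1
    )
```

In this sketch, positive longitudes are eastern and positive latitudes are northern, consistent with the sign conventions stated above.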

As described above, viewpoint information may be dynamic. Wang specifies timed metadata tracks conveying viewpoint information. In Wang, timed metadata may be signaled based on a sample entry and a sample format. With respect to viewpoint information, Wang provides a dynamic viewpoint timed metadata track, an initial viewpoint timed metadata track, and an object centre points correspondence between viewpoints timed metadata track. In Wang, the sample entry structure and sample format structure for the dynamic viewpoint timed metadata track is specified as follows:

General
The dynamic viewpoint timed metadata track indicates the viewpoint parameters that are dynamically changing over time.
An OMAF player should use the signalled information as follows when starting playback of one viewpoint after switching from another viewpoint:
- If there is a recommended viewing orientation explicitly signalled, the OMAF player is expected to parse this information and follow the recommended viewing orientation.
- Otherwise, the OMAF player is expected to keep the same viewing orientation as in the switching-from viewpoint just before the switching occurs.

Sample entry
The track sample entry type 'dyvp' shall be used. The sample entry of this sample entry type is specified as follows:

ViewpointPosStruct() is defined as above but indicates the initial viewpoint position.
dynamic_gcs_rotated_flag equal to 0 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system remain unchanged in all samples referring to this sample entry. dynamic_gcs_rotated_flag equal to 1 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system are indicated in the samples.
dynamic_vwpt_group_flag equal to 0 specifies that the vwpt_group_flag and ViewpointGroupStruct() are not present in the samples and the viewpoint group information (vwpt_group_id and vwpt_group_description) signalled in ViewpointGroupStruct() in the ViewpointTrackGroupBox applies to each sample referring to this sample entry. dynamic_vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() for the viewpoint may change compared to that signalled in the ViewpointGroupStruct() in the ViewpointTrackGroupBox and new ViewpointGroupStruct() may be signalled in the samples based on the value of vwpt_group_flag in the samples.
ViewpointGlobalCoordinateSysRotationStruct() is defined in clause above but indicates the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system for each sample referring to this sample entry.
Sample format
The sample syntax of this sample entry type ('dyvp') is specified as follows:

The semantics of ViewpointPosStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() are specified above.
vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() is present. vwpt_group_flag equal to 0 specifies that the ViewpointGroupStruct() is not present. When not present, the value of vwpt_group_flag is inferred to be equal to 0.
When dynamic_vwpt_group_flag is equal to 1, the first sample shall have vwpt_group_flag equal to 1. For subsequent samples when the viewpoint group information does not change, the ViewpointGroupStruct() can be absent. When the ViewpointGroupStruct() is absent in a sample, the following applies:
- If dynamic_vwpt_group_flag is equal to 1, it is inferred to be identical to the ViewpointGroupStruct() of the previous sample, in decoding order.
- Otherwise (dynamic_vwpt_group_flag is equal to 0), it is inferred to be identical to the ViewpointGroupStruct() in the ViewpointTrackGroupBox.
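The inference rules above may be sketched, in a non-normative manner, as follows. The ViewpointGroup class and the resolve_groups function are hypothetical names introduced for illustration. The sketch assumes that samples are supplied in decoding order and that a sample without a ViewpointGroupStruct() is represented by None.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ViewpointGroup:
    vwpt_group_id: int
    vwpt_group_description: str


def resolve_groups(
    track_group_box_group: ViewpointGroup,
    dynamic_vwpt_group_flag: bool,
    sample_groups: Iterable[Optional[ViewpointGroup]],
) -> Iterator[ViewpointGroup]:
    """Yield the effective viewpoint group for each sample, in decoding order."""
    previous: Optional[ViewpointGroup] = None
    for group in sample_groups:
        if not dynamic_vwpt_group_flag:
            # Group information in the ViewpointTrackGroupBox applies to every sample.
            yield track_group_box_group
            continue
        if group is None:
            if previous is None:
                # The first sample shall have vwpt_group_flag equal to 1.
                raise ValueError("first sample carries no ViewpointGroupStruct()")
            # Absent struct: inferred to be identical to that of the previous sample.
            group = previous
        previous = group
        yield group
```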

In Wang, the sample entry structure and sample format structure for the initial viewpoint timed metadata track is specified as follows:
General
This timed metadata track, named the initial viewpoint timed metadata track, indicates the initial viewpoint that should be used.
When viewpoints are not indicated by the viewpoint track grouping (i.e., the track grouping with track_group_type equal to 'vipo') in a file, the initial viewpoint timed metadata track shall not be present in the file.
In the absence of this information when viewpoints are indicated by the viewpoint track grouping, the initial viewpoint should be inferred to be the viewpoint that has the least value of viewpoint identifier among all viewpoints in the file.
The initial viewpoint timed metadata track, when present, shall be indicated as being associated with all viewpoints in the file.
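As a non-normative sketch of the inference described above, an OMAF player might select the initial viewpoint as follows. The function name and its parameters are hypothetical. The sketch assumes that the set of viewpoint identifiers indicated by the viewpoint track grouping is already known and that an absent initial viewpoint timed metadata track is represented by None.

```python
from typing import Iterable, Optional


def initial_viewpoint(
    viewpoint_ids: Iterable[int],
    id_of_initial_viewpoint: Optional[int] = None,
) -> Optional[int]:
    """Return the viewpoint identifier a player should start with, if any."""
    ids = set(viewpoint_ids)
    if not ids:
        # No viewpoint track grouping ('vipo'): the 'invp' track shall not be present.
        if id_of_initial_viewpoint is not None:
            raise ValueError("'invp' track present without viewpoint track grouping")
        return None
    if id_of_initial_viewpoint is not None:
        # Signalled by the initial viewpoint timed metadata track.
        return id_of_initial_viewpoint
    # Otherwise inferred to be the least viewpoint identifier in the file.
    return min(ids)
```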
Sample Entry
The track sample entry type 'invp' shall be used. The sample entry of this sample entry type is specified as follows:

id_of_initial_viewpoint indicates the value of the viewpoint identifier of the initial viewpoint for the first sample to which this sample entry applies.
Sample format
The sample syntax of this sample entry type ('invp') is specified as follows:

id_of_initial_viewpoint indicates the value of the viewpoint identifier of the initial viewpoint for the sample.

In Wang, the sample entry structure and sample format structure for the object centre points correspondence between viewpoints timed metadata track is specified as follows:
General
This timed metadata track, named the object centre points correspondence (OCPC) timed metadata track, indicates information on object centre points correspondence between viewpoints. An OCPC timed metadata track applies to all omnidirectional video tracks in the file.
Sample Entry
The track sample entry type 'ocpc' shall be used. The sample entry of this sample entry type is specified as follows:

num_viewpoint_sets indicates the number of viewpoint sets for which object centre points correspondence is signalled in the samples to which this sample entry applies.

Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

num_viewpoints_in_this_set[i] indicates the number of viewpoints in the i-th viewpoint set.
viewpoint_id[i][j] indicates the viewpoint ID of the j-th viewpoint in the i-th viewpoint set.
num_corresponding_object_centre_points_in_this_set[i] indicates the number of corresponding object centre points signalled in this sample for the i-th viewpoint set.
SphereRegionStruct(0)[i][k][j] specifies the k-th corresponding object centre point of the j-th viewpoint in the i-th viewpoint set.
object_id[i][k][j] specifies the k-th object identifier of the j-th viewpoint in the i-th viewpoint set.
For any particular value of k in the range of 0 to num_corresponding_object_centre_points_in_this_set[i] - 1, inclusive, the sphere points indicated by SphereRegionStruct(0)[i][k][j] for j ranging from 0 to num_viewpoints_in_this_set[i] - 1, inclusive, are object centre points that are corresponding to each other for the viewpoints in the i-th viewpoint set.
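The indexing described above may be illustrated by the following non-normative sketch, in which one viewpoint set of an OCPC sample is held in memory and the mutually corresponding object centre points for a given k are looked up. The class and function names are hypothetical. As a simplifying assumption, each SphereRegionStruct(0) is reduced to a (centre_azimuth, centre_elevation) pair.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: only the centre of each SphereRegionStruct(0) is kept,
# as (centre_azimuth, centre_elevation) coded values.
SpherePoint = Tuple[int, int]


@dataclass
class ViewpointSet:
    """The i-th viewpoint set of one OCPC sample."""
    viewpoint_id: List[int]                 # viewpoint_id[i][j], indexed by j
    centre_points: List[List[SpherePoint]]  # SphereRegionStruct(0)[i][k][j], indexed [k][j]
    object_id: List[List[int]]              # object_id[i][k][j], indexed [k][j]


def corresponding_points(vset: ViewpointSet, k: int) -> Dict[int, SpherePoint]:
    """Map each viewpoint ID in the set to its k-th corresponding object centre point."""
    if not 0 <= k < len(vset.centre_points):
        raise IndexError("k is outside the signalled object centre points")
    row = vset.centre_points[k]
    if len(row) != len(vset.viewpoint_id):
        raise ValueError("one centre point is expected per viewpoint in the set")
    return dict(zip(vset.viewpoint_id, row))
```

For a fixed k, the returned points are the ones that correspond to each other across the viewpoints of the set.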

Further, for the object centre points correspondence between viewpoints timed metadata track, Wang provides the following Information derivation and OMAF player behavior:

Content providers can perform scene or object matching among video streams representing different viewpoints frame by frame, and choose a representative point of the scene or object, e.g., the centre point of an object, as the corresponding object centre point to be indicated by the OCPC timed metadata track.
When a viewpoint switching occurs, the client checks whether the user's field of view in the switch-from viewpoint covers a corresponding object centre point that is indicated by the time-aligned sample of the OCPC timed metadata track. If yes, just after the switching, the client should render to the user the viewport in the switching-to viewpoint for which the corresponding centre point is indicated by the time-aligned sample of the OCPC timed metadata track. When the user’s field of view covers more than one indicated object centre point, one of those that are the closest to the centre of the user's field of view should be chosen.
If both recommended viewport metadata information for the switch-to viewpoint and the OCPC timed metadata track are available, then, since neither type of information imposes mandatory OMAF player behavior, it is up to the OMAF player to choose to follow either one or neither.
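The switching behavior described above may be sketched, in a non-normative manner, as follows. The function names are hypothetical. The sketch assumes that angles are already converted to degrees, that the user's field of view is approximated as a circular region given by a half-angle, and that closeness is measured as the great-circle angle between the centre of the field of view and an object centre point. None of these assumptions is specified by Wang.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (azimuth, elevation) in degrees


def angular_distance_deg(a: Point, b: Point) -> float:
    """Great-circle angle, in degrees, between two points on the unit sphere."""
    az1, el1, az2, el2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    cos_angle = (
        math.sin(el1) * math.sin(el2)
        + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def pick_viewport_on_switch(
    view_centre: Point,
    half_fov_deg: float,
    points_from: List[Point],
    points_to: List[Point],
) -> Optional[Point]:
    """Return the switch-to centre point for the covered object nearest the view centre.

    points_from[k] and points_to[k] are the k-th corresponding object centre
    points in the switch-from and switch-to viewpoints, respectively.
    Returns None when the field of view covers no indicated object centre point.
    """
    best_k: Optional[int] = None
    best_distance = 0.0
    for k, point in enumerate(points_from):
        distance = angular_distance_deg(view_centre, point)
        if distance <= half_fov_deg and (best_k is None or distance < best_distance):
            best_k, best_distance = k, distance
    return None if best_k is None else points_to[best_k]
```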

As described above, a viewport may be part of the spherical video that is currently displayed and viewed by the user, i.e., a region of a sphere. Viewpoint switching may happen due to several reasons. In one example, a user may deliberately switch from one viewpoint to another (for example from one camera to another). In another example, the switching may happen based on a current viewport, e.g., as a user’s head turns, a switch from video captured by a first camera to video captured by a second camera occurs. Thus, there may be a relationship between viewpoint timed metadata and a timed metadata track syntax for sphere regions. With respect to specifying a generic timed metadata track syntax for sphere regions, in Wang, a sample entry and a sample format are specified as follows:

General
This clause specifies a generic timed metadata track syntax for indicating sphere regions. The purpose for the timed metadata track is indicated by the track sample entry type. The sample format of all metadata tracks specified in this clause starts with a common part and may be followed by an extension part that is specific to the sample entry of the metadata track. Each sample specifies a sphere region.
When a sphere region timed metadata track is linked to one or more media tracks with a 'cdsc' track reference, it describes each media track individually.
NOTE: The syntax allows for one sample to specify multiple sphere regions. However, there is a semantic restriction that limits the samples to have only one sphere region.

Sample Entry
Definition
Exactly one SphereRegionConfigBox shall be present in the sample entry. SphereRegionConfigBox specifies the shape of the sphere region specified by the samples. When the azimuth and elevation ranges of the sphere region in the samples do not change, they may be indicated in the sample entry.

Syntax

Semantics
shape_type equal to 0 specifies that the sphere region is specified by four great circles. shape_type equal to 1 specifies that the sphere region is specified by two azimuth circles and two elevation circles. shape_type values greater than 1 are reserved. The value of shape_type is used as the shape type value when applying the clause describing the Sphere region (provided above) to the semantics of the samples of the sphere region metadata track.

dynamic_range_flag equal to 0 specifies that the azimuth and elevation ranges of the sphere region remain unchanged in all samples referring to this sample entry. dynamic_range_flag equal to 1 specifies that the azimuth and elevation ranges of the sphere region are indicated in the sample format.

static_azimuth_range and static_elevation_range specify the azimuth and elevation ranges, respectively, of the sphere region for each sample referring to this sample entry in units of 2^-16 degrees. static_azimuth_range and static_elevation_range specify the ranges through the centre point of the sphere region, as illustrated by FIG. 5A or FIG. 5B. static_azimuth_range shall be in the range of 0 to 360 * 2¹⁶, inclusive. static_elevation_range shall be in the range of 0 to 180 * 2¹⁶, inclusive. When static_azimuth_range and static_elevation_range are present and are both equal to 0, the sphere region for each sample referring to this sample entry is a point on a spherical surface. When static_azimuth_range and static_elevation_range are present, the values of azimuth_range and elevation_range are inferred to be equal to static_azimuth_range and static_elevation_range, respectively, when applying the clause describing the Sphere region (provided above) to the semantics of the samples of the sphere region metadata track.

num_regions specifies the number of sphere regions in the samples referring to this sample entry. num_regions shall be equal to 1. Other values of num_regions are reserved.

Sample Format

Definition
Each sample specifies a sphere region. The SphereRegionSample structure may be extended in derived track formats.

Syntax

Semantics
The sphere region structure clause, provided above, applies to the sample that contains the SphereRegionStruct structure.
Let the target media samples be the media samples in the referenced media tracks with composition times greater than or equal to the composition time of this sample and less than the composition time of the next sample.
interpolate equal to 0 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, azimuth_range (if present), and elevation_range (if present) in this sample apply to the target media samples. interpolate equal to 1 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, azimuth_range (if present), and elevation_range (if present) that apply to the target media samples are linearly interpolated from the values of the corresponding fields in this sample and the previous sample.
The value of interpolate for a sync sample, the first sample of the track, and the first sample of a track fragment shall be equal to 0.

Further, Wang specifies a sample group for recommended viewports of multiple viewpoints as follows:

A timed metadata track having sample entry type 'rcvp' may contain zero or one SampleToGroupBox with grouping_type equal to 'vwpt'. This SampleToGroupBox represents the assignment of samples in this timed metadata (and consequently the corresponding samples in the media tracks) to viewpoints.
When a SampleToGroupBox with grouping_type equal to 'vwpt' is present, an accompanying SampleGroupDescriptionBox with the same grouping type shall be present, and contain the ID of the particular viewpoint this group of samples belong to.
When viewpoints are not indicated by the viewpoint track grouping (i.e., the track grouping with track_group_type equal to 'vipo') in a file, SampleToGroupBox with grouping_type equal to 'vwpt' shall not be present in any timed metadata track in the file.
The sample group entry of grouping_type equal to 'vwpt', named ViewpointEntry, is defined as follows:

viewpoint_id indicates the viewpoint identifier of the viewpoint this group of samples belong to.

The timed metadata tracks conveying object centre points correspondence information in Wang may be less than ideal. In particular, when using the timed metadata tracks conveying object centre points correspondence information in Wang, it may be difficult to include such information in a user application in a useful manner.

FIG. 1 is a block diagram illustrating an example of a system that may be configured to code (i.e., encode and/or decode) video data according to one or more techniques of this disclosure. System 100 represents an example of a system that may encapsulate video data according to one or more techniques of this disclosure. As illustrated in FIG. 1, system 100 includes source device 102, communications medium 110, and destination device 120. In the example illustrated in FIG. 1, source device 102 may include any device configured to encode video data and transmit encoded video data to communications medium 110. Destination device 120 may include any device configured to receive encoded video data via communications medium 110 and to decode encoded video data. Source device 102 and/or destination device 120 may include computing devices equipped for wired and/or wireless communications and may include, for example, set top boxes, digital video recorders, televisions, desktop, laptop or tablet computers, gaming consoles, medical imaging devices, and mobile devices, including, for example, smartphones, cellular telephones, and personal gaming devices.

Communications medium 110 may include any combination of wireless and wired communication media, and/or storage devices. Communications medium 110 may include coaxial cables, fiber optic cables, twisted pair cables, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. Communications medium 110 may include one or more networks. For example, communications medium 110 may include a network configured to enable access to the World Wide Web, for example, the Internet. A network may operate according to a combination of one or more telecommunication protocols. Telecommunications protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunications protocols include Digital Video Broadcasting (DVB) standards, Advanced Television Systems Committee (ATSC) standards, Integrated Services Digital Broadcasting (ISDB) standards, Data Over Cable Service Interface Specification (DOCSIS) standards, Global System Mobile Communications (GSM) standards, code division multiple access (CDMA) standards, 3rd Generation Partnership Project (3GPP) standards, European Telecommunications Standards Institute (ETSI) standards, Internet Protocol (IP) standards, Wireless Application Protocol (WAP) standards, and Institute of Electrical and Electronics Engineers (IEEE) standards.

Storage devices may include any type of device or storage medium capable of storing data. A storage medium may include tangible or non-transitory computer-readable media. A computer readable medium may include optical discs, flash memory, magnetic memory, or any other suitable digital storage media. In some examples, a memory device or portions thereof may be described as non-volatile memory and in other examples portions of memory devices may be described as volatile memory. Examples of volatile memories may include random access memories (RAM), dynamic random access memories (DRAM), and static random access memories (SRAM). Examples of non-volatile memories may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. Storage device(s) may include memory cards (e.g., a Secure Digital (SD) memory card), internal/external hard disk drives, and/or internal/external solid state drives. Data may be stored on a storage device according to a defined file format.

FIG. 6 is a conceptual drawing illustrating an example of components that may be included in an implementation of system 100. In the example implementation illustrated in FIG. 6, system 100 includes one or more computing devices 602A-602N, television service network 604, television service provider site 606, wide area network 608, local area network 610, and one or more content provider sites 612A-612N. The implementation illustrated in FIG. 6 represents an example of a system that may be configured to allow digital media content, such as, for example, a movie, a live sporting event, etc., and data and applications and media presentations associated therewith to be distributed to and accessed by a plurality of computing devices, such as computing devices 602A-602N. In the example illustrated in FIG. 6, computing devices 602A-602N may include any device configured to receive data from one or more of television service network 604, wide area network 608, and/or local area network 610. For example, computing devices 602A-602N may be equipped for wired and/or wireless communications and may be configured to receive services through one or more data channels and may include televisions, including so-called smart televisions, set top boxes, and digital video recorders. Further, computing devices 602A-602N may include desktop, laptop, or tablet computers, gaming consoles, mobile devices, including, for example, “smart” phones, cellular telephones, and personal gaming devices.

Television service network 604 is an example of a network configured to enable digital media content, which may include television services, to be distributed. For example, television service network 604 may include public over-the-air television networks, public or subscription-based satellite television service provider networks, and public or subscription-based cable television provider networks and/or over the top or Internet service providers. It should be noted that although in some examples television service network 604 may primarily be used to enable television services to be provided, television service network 604 may also enable other types of data and services to be provided according to any combination of the telecommunication protocols described herein. Further, it should be noted that in some examples, television service network 604 may enable two-way communications between television service provider site 606 and one or more of computing devices 602A-602N. Television service network 604 may comprise any combination of wireless and/or wired communication media. Television service network 604 may include coaxial cables, fiber optic cables, twisted pair cables, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. Television service network 604 may operate according to a combination of one or more telecommunication protocols. Telecommunications protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunications protocols include DVB standards, ATSC standards, ISDB standards, DTMB standards, DMB standards, Data Over Cable Service Interface Specification (DOCSIS) standards, HbbTV standards, W3C standards, and UPnP standards.

Referring again to FIG. 6, television service provider site 606 may be configured to distribute television service via television service network 604. For example, television service provider site 606 may include one or more broadcast stations, a cable television provider, or a satellite television provider, or an Internet-based television provider. For example, television service provider site 606 may be configured to receive a transmission including television programming through a satellite uplink/downlink. Further, as illustrated in FIG. 6, television service provider site 606 may be in communication with wide area network 608 and may be configured to receive data from content provider sites 612A-612N. It should be noted that in some examples, television service provider site 606 may include a television studio and content may originate therefrom.

Wide area network 608 may include a packet based network and operate according to a combination of one or more telecommunication protocols. Telecommunications protocols may include proprietary aspects and/or may include standardized telecommunication protocols. Examples of standardized telecommunications protocols include Global System Mobile Communications (GSM) standards, code division multiple access (CDMA) standards, 3rd Generation Partnership Project (3GPP) standards, European Telecommunications Standards Institute (ETSI) standards, European standards (EN), IP standards, Wireless Application Protocol (WAP) standards, and Institute of Electrical and Electronics Engineers (IEEE) standards, such as, for example, one or more of the IEEE 802 standards (e.g., Wi-Fi). Wide area network 608 may comprise any combination of wireless and/or wired communication media. Wide area network 608 may include coaxial cables, fiber optic cables, twisted pair cables, Ethernet cables, wireless transmitters and receivers, routers, switches, repeaters, base stations, or any other equipment that may be useful to facilitate communications between various devices and sites. In one example, wide area network 608 may include the Internet. Local area network 610 may include a packet based network and operate according to a combination of one or more telecommunication protocols. Local area network 610 may be distinguished from wide area network 608 based on levels of access and/or physical infrastructure. For example, local area network 610 may include a secure home network.

Referring again to FIG. 6, content provider sites 612A-612N represent examples of sites that may provide multimedia content to television service provider site 606 and/or computing devices 602A-602N. For example, a content provider site may include a studio having one or more studio content servers configured to provide multimedia files and/or streams to television service provider site 606. In one example, content provider sites 612A-612N may be configured to provide multimedia content using the IP suite. For example, a content provider site may be configured to provide multimedia content to a receiver device according to Real Time Streaming Protocol (RTSP), HTTP, or the like. Further, content provider sites 612A-612N may be configured to provide data, including hypertext based content, and the like, to one or more of computing devices 602A-602N and/or television service provider site 606 through wide area network 608. Content provider sites 612A-612N may include one or more web servers. Data provided by content provider sites 612A-612N may be defined according to data formats.

Referring again to FIG. 1, source device 102 includes video source 104, video encoder 106, data encapsulator 107, and interface 108. Video source 104 may include any device configured to capture and/or store video data. For example, video source 104 may include a video camera and a storage device operably coupled thereto. Video encoder 106 may include any device configured to receive video data and generate a compliant bitstream representing the video data. A compliant bitstream may refer to a bitstream that a video decoder can receive and reproduce video data therefrom. Aspects of a compliant bitstream may be defined according to a video coding standard. When generating a compliant bitstream video encoder 106 may compress video data. Compression may be lossy (discernible or indiscernible to a viewer) or lossless.

Referring again to FIG. 1, data encapsulator 107 may receive encoded video data and generate a compliant bitstream, e.g., a sequence of NAL units according to a defined data structure. A device receiving a compliant bitstream can reproduce video data therefrom. It should be noted that the term conforming bitstream may be used in place of the term compliant bitstream. It should be noted that data encapsulator 107 need not necessarily be located in the same physical device as video encoder 106. For example, functions described as being performed by video encoder 106 and data encapsulator 107 may be distributed among devices illustrated in FIG. 6.

As described above, the timed metadata tracks conveying viewpoint information in Wang may be less than ideal. In one example, data encapsulator 107 may be configured to signal an object_label for each object_id value in an object centre points correspondence (OCPC) timed metadata track. Signaling an object_label for each object_id value provides a human understandable label or description indicating what an object_id corresponds to. This is useful for applications and for user interface purposes. For example, when using the object centre points correspondence (OCPC) timed metadata track to cover information during a basketball game, one of the objects of interest could be the basketball having its own object_id. However, simply knowing the object_id will not allow a user to know what object it is. So, in this case an object_label such as “basketball” could be used to associate the object_id with human readable information about the object. Similarly, another object may be a particular player on a team. In one example, according to the techniques described herein, data encapsulator 107 may be configured to signal an object centre points correspondence (OCPC) timed metadata track using the following sample entry structure and sample format structure:

Sample Entry
The track sample entry type 'ocpc' shall be used. The sample entry of this sample entry type is specified as follows:

num_viewpoint_sets indicates the number of viewpoint sets for which object centre points correspondence is signalled in the samples to which this sample entry applies.
num_object_ids indicates the number of object identifiers for which object_id and object_label values are signalled.
It should be noted that in general, the bit-width of some of the fields may be changed. For example, unsigned int(16) num_object_ids could instead be signalled as unsigned int(8) num_object_ids or as unsigned int(12) num_object_ids.
object_id[m] specifies the object identifier for the m-th object. The value of object_id[i][k][j] for each i in the range of 0 to num_viewpoint_sets-1, inclusive, for j in the range of 0 to num_viewpoints_in_this_set[i]-1, inclusive, for k in the range of 0 to num_corresponding_object_centre_points_in_this_set[i]-1, inclusive, shall match one value of object_id[m] for m in the range of 0 to num_object_ids-1, inclusive.
object_label[m] is a null-terminated UTF-8 string which specifies the description of an object identified by object_id[m]. A null string is allowed. In another example, a null string is disallowed.
In one example, num_object_ids may instead be signalled as num_object_ids_minus1, in which case the following may apply:
num_object_ids_minus1 plus 1 indicates the number of object identifiers for which object_id and object_label values are signalled.
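As a non-normative illustration of the sample entry semantics above, a receiver might build a lookup table from object_id[m] to object_label[m] and verify that every object_id[i][k][j] carried in a sample matches one entry of that table. The function names below are hypothetical, and the sketch assumes the fields have already been parsed into Python lists.

```python
from typing import Dict, Iterable, List


def num_object_ids_from_minus1(num_object_ids_minus1: int) -> int:
    """Alternative signalling: num_object_ids_minus1 plus 1 gives the count."""
    return num_object_ids_minus1 + 1


def build_object_labels(object_id: List[int], object_label: List[str]) -> Dict[int, str]:
    """Build the object_id[m] -> object_label[m] table from the sample entry."""
    if len(object_id) != len(object_label):
        raise ValueError("object_id and object_label must each have num_object_ids entries")
    return dict(zip(object_id, object_label))


def unmatched_object_ids(labels: Dict[int, str], sample_object_ids: Iterable[int]) -> List[int]:
    """Return sample object identifiers with no matching object_id[m].

    An empty result means the matching constraint is satisfied for the sample.
    """
    return sorted({oid for oid in sample_object_ids if oid not in labels})
```

For the basketball example, a table such as {1: "basketball", 2: "player 23"} would let an application present readable names instead of bare identifiers; the label "player 23" is purely illustrative.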

Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

In this case, in one example, information derivation and OMAF player behavior may be specified as follows:
Content providers can perform scene or object matching among video streams representing different viewpoints frame by frame, and choose a representative point of the scene or object, e.g., the centre point of an object, as the corresponding object centre point to be indicated by the OCPC timed metadata track. The object centre point may be assigned an object label to indicate what the object represents.
The available object center labels for each viewpoint may be shown to the user via a user interface. The user should be allowed to select an object label corresponding to an object of interest to view. The viewport corresponding to the selected object label’s object center should be shown to the user for the selected viewpoint. When viewpoint switching occurs, the client checks whether the user's field of view in the switch-from viewpoint covers a corresponding object centre point that is indicated by the time-aligned sample of the OCPC timed metadata track. If yes, just after the switching, the client should render to the user the viewport in the switch-to viewpoint for which the corresponding centre point is indicated by the time-aligned sample of the OCPC timed metadata track. When the user's field of view covers more than one indicated object centre point, one of those that are the closest to the centre of the user's field of view should be chosen.
If both recommended viewport metadata information for the switch-to viewpoint and the OCPC timed metadata track are available, then, since neither type of information imposes mandatory OMAF player behavior, it is up to the OMAF player to choose to follow either one or neither.
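
The selection of an object centre point at a viewpoint switch can be sketched as follows. This is a minimal illustration and not a specified player algorithm: it treats the field of view as a circular cone of half-angle half_fov_deg around the viewing direction, which is a simplification, and the function names and the dictionary of centre points are hypothetical.

```python
import math


def angular_distance_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two sphere points."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def pick_object_centre(view_centre, half_fov_deg, centres):
    """Return the object_id whose centre point is nearest the view centre.

    centres maps object_id -> (azimuth, elevation) in degrees for the
    switch-from viewpoint. A point counts as covered when its angular
    distance from the view centre is at most half_fov_deg (an assumed,
    circular field of view). Returns None when no point is covered.
    """
    covered = []
    for object_id, (az, el) in centres.items():
        d = angular_distance_deg(view_centre[0], view_centre[1], az, el)
        if d <= half_fov_deg:
            covered.append((d, object_id))
    return min(covered)[1] if covered else None
```

When an object_id is returned, the player would look up the corresponding centre point of the same object in the switch-to viewpoint and orient the viewport towards it.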

In one example, according to the techniques herein, the above object label based selection mechanism may be mandated (or particularly useful) when 360 degree video content is shown on a traditional 2D display where the user does not have control of the viewing orientation or has released control of the viewing orientation.

In another example, the object_label is included in the OcpcSample() directly. In one example, in this case, the sample format structure may be as follows:

Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

num_viewpoints_in_this_set[i] indicates the number of viewpoints in the i-th viewpoint set.
viewpoint_id[i][j] indicates the viewpoint ID of the j-th viewpoint in the i-th viewpoint set.
num_corresponding_object_centre_points_in_this_set[i] indicates the number of corresponding object centre points signalled in this sample for the i-th viewpoint set.
SphereRegionStruct(0)[i][k][j] specifies the k-th corresponding object centre point of the j-th viewpoint in the i-th viewpoint set.
object_id[i][k][j] specifies the k-th object identifier of the j-th viewpoint in the i-th viewpoint set.
object_label[i][k][j] is a null-terminated UTF-8 string which specifies the description of an object identified by object_id[i][k][j]. A null string is allowed. In another example, a null string is disallowed.
For any particular value of k in the range of 0 to num_corresponding_object_centre_points_in_this_set[i] - 1, inclusive, the sphere points indicated by SphereRegionStruct(0)[i][k][j] for j ranging from 0 to num_viewpoints_in_this_set[i] - 1, inclusive, are object centre points that are corresponding to each other for the viewpoints in the i-th viewpoint set.

In Wang, (2 * num_viewpoint_sets * num_corresponding_object_centre_points_in_this_set[i] * num_viewpoints_in_this_set[i]) total bytes are used to signal object_id[i][k][j] values in each sample. This is potentially a large amount of data. In one example, a loop of object_id values may be signaled and/or a number of bits used for signaling an object index in OcpcSample() may be signaled. This can result in bit savings.

In one example, according to the techniques described herein, data encapsulator 107 may be configured to signal an object centre points correspondence (OCPC) timed metadata track using a loop of object_id values and then to signal in OcpcSample() an index which corresponds to an object_id for each viewpoint set (num_viewpoint_sets), for each corresponding object center point in the set (num_corresponding_object_centre_points_in_this_set[i]), and for each viewpoint in the set (num_viewpoints_in_this_set[i]). In this case, in one example, according to the techniques described herein, data encapsulator 107 may be configured to signal an object centre points correspondence (OCPC) timed metadata track using the following sample entry structure and sample format structure. It should be noted that in general, the bit-width of some of the fields may be changed. For example, with respect to the sample entry structure and sample format structure below, unsigned int(16) object_id[i][k][j] could instead be signalled as unsigned int(8) object_id[i][k][j] or as unsigned int(12) object_id[i][k][j] or using some other bit-width.

Sample entry
The track sample entry type 'ocpc' shall be used. The sample entry of this sample entry type is specified as follows:

num_viewpoint_sets indicates the number of viewpoint sets for which object centre points correspondence is signalled in the samples to which this sample entry applies.
num_object_ids indicates the number of object identifiers for which object_id values are signalled. The variable N is derived as follows:
N=Ceil(Log2(num_object_ids))
where Ceil( x ) is the smallest integer greater than or equal to x and Log2( x ) is the base-2 logarithm of x.
object_id[m] specifies the object identifier for the m-th object.
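
The derivation of N and its effect on the per-sample size can be illustrated with a short calculation. The following Python sketch is offered under stated assumptions: the helper names and the example counts are hypothetical, and the number of per-sample entries is taken as the sum over viewpoint sets of num_corresponding_object_centre_points_in_this_set[i] * num_viewpoints_in_this_set[i], which is one reading of the product given above.

```python
def index_bits(num_object_ids: int) -> int:
    """N = Ceil(Log2(num_object_ids)), computed with integer arithmetic."""
    if num_object_ids < 1:
        raise ValueError("num_object_ids must be at least 1")
    return (num_object_ids - 1).bit_length()


def per_sample_bits(points_per_set, viewpoints_per_set, bits_per_entry):
    """Bits spent per sample on object identification."""
    entries = sum(p * v for p, v in zip(points_per_set, viewpoints_per_set))
    return entries * bits_per_entry


# Hypothetical sample: two viewpoint sets and six distinct objects.
points_per_set = [3, 2]
viewpoints_per_set = [4, 5]
n = index_bits(6)
bits_with_ids = per_sample_bits(points_per_set, viewpoints_per_set, 16)
bits_with_indices = per_sample_bits(points_per_set, viewpoints_per_set, n)
```

In this example the 16-bit object_id values cost 352 bits per sample, whereas 3-bit indices cost 66 bits per sample. The object_id loop itself is carried once in the sample entry rather than in every sample.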
Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

num_viewpoints_in_this_set[i] indicates the number of viewpoints in the i-th viewpoint set.
viewpoint_id[i][j] indicates the viewpoint ID of the j-th viewpoint in the i-th viewpoint set.
num_corresponding_object_centre_points_in_this_set[i] indicates the number of corresponding object centre points signalled in this sample for the i-th viewpoint set.
SphereRegionStruct(0)[i][k][j] specifies the k-th corresponding object centre point of the j-th viewpoint in the i-th viewpoint set.
object_index[i][k][j] specifies the value of index m in the for loop of indices signaled in the OcpcSampleEntry(), which identifies the object identifier of the k-th object in the j-th viewpoint in the i-th viewpoint set. The object_id for the k-th object in the j-th viewpoint in the i-th viewpoint set is equal to object_id[object_index[i][k][j]].
For any particular value of k in the range of 0 to num_corresponding_object_centre_points_in_this_set[i] - 1, inclusive, the sphere points indicated by SphereRegionStruct(0)[i][k][j] for j ranging from 0 to num_viewpoints_in_this_set[i] - 1, inclusive, are object centre points that are corresponding to each other for the viewpoints in the i-th viewpoint set.
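
The indirection from object_index to object_id can be sketched as a simple table lookup. The Python below is illustrative only: the function name and the nested-list representation of object_index[i][k][j] are assumptions of the sketch, not part of the described syntax.

```python
def resolve_object_ids(object_id, object_index):
    """Return a nested list with object_id[object_index[i][k][j]] per entry.

    object_id is the list signalled in the sample entry; object_index is
    a nested list indexed as [i][k][j] taken from one sample.
    """
    resolved = []
    for per_set in object_index:              # i: viewpoint set
        set_out = []
        for per_point in per_set:             # k: object centre point
            row = []
            for m in per_point:               # j: viewpoint
                if not 0 <= m < len(object_id):
                    raise ValueError(f"object_index {m} is out of range")
                row.append(object_id[m])
            set_out.append(row)
        resolved.append(set_out)
    return resolved
```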

In another example, according to the techniques described herein, data encapsulator 107 may be configured to use the following syntax for sample format structure, where reserved bits are added for byte alignment:

Where variable X is derived as follows:
X = 8 - N%8
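
The relationship between N and X can be checked numerically. The following Python sketch evaluates the expression exactly as written above; the function name is hypothetical.

```python
def reserved_bits(n: int) -> int:
    """X = 8 - N % 8, evaluated exactly as written in the text."""
    return 8 - n % 8


# N + X is a whole number of bytes for every N.
table = {n: reserved_bits(n) for n in (3, 8, 12)}
aligned = all((n + x) % 8 == 0 for n, x in table.items())
```

For N equal to 3 and 12 this gives X equal to 5 and 4, respectively. For N equal to 8 the expression as written gives X equal to 8, that is, a full reserved byte; an expression such as (8 - N%8) % 8 would give 0 instead, and the text does not state which behavior is intended.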

In another example, according to the techniques described herein, data encapsulator 107 may be configured to use the following syntax for sample format structure, where the for loops are separated so that the byte alignment is improved for data structure SphereRegionStruct(0)[i][k][j] and for object_index[i][k][j]:

In another example, according to the techniques described herein, data encapsulator 107 may be configured to use the following syntax for sample format structure, where the for loops are separated so that the byte alignment is improved for data structure SphereRegionStruct(0)[i][k][j] and where reserved bits are added:

Where variable Y is derived as follows:

OR

Where variable Y is derived as follows:

In another example, according to the techniques described herein, data encapsulator 107 may be configured to signal an additional syntax element to indicate the number of bits used for the signaling of object_index (i.e., object_index[i][k][j]) or for object_id. It should be noted that this syntax element may be signaled in class OcpcSampleEntry() and/or class OcpcSample(). In this case, in one example, according to the techniques described herein, data encapsulator 107 may be configured to signal an object centre points correspondence (OCPC) timed metadata track using the following sample entry structure:

Sample Entry
The track sample entry type 'ocpc' shall be used. The sample entry of this sample entry type is specified as follows:

num_viewpoint_sets indicates the number of viewpoint sets for which object centre points correspondence is signalled in the samples to which this sample entry applies.
num_object_ids indicates the number of object identifiers for which object_id values are signalled.
object_id[m] specifies the object identifier for the m-th object.
num_bits_for_object_index specifies the number of bits used (N) for signalling object_index[i][k][j] in OcpcSample().
In another example,
num_bits_for_object_index specifies the number of bits used (N) for signalling object index in OcpcSample().

The variable N is derived as follows:
N= num_bits_for_object_index

In one example, num_bits_for_object_index may instead be signalled as num_bits_for_object_index_minus1 having the following semantics:

num_bits_for_object_index_minus1 plus 1 specifies the number of bits used (N) for signalling object_index[i][k][j].
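
Reading N-bit object_index values can be sketched with a small bit reader. The Python below is a minimal illustration which assumes most-significant-bit-first packing and assumes that the index values are packed back to back, as in the variants where the for loops are separated; the class and function names are hypothetical.

```python
class BitReader:
    """Reads most-significant-bit-first fields from a bytes object."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        if self._pos + nbits > len(self._data) * 8:
            raise EOFError("not enough bits left")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos >> 3]
            bit = (byte >> (7 - (self._pos & 7))) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


def read_object_indices(data: bytes, count: int, num_bits_minus1: int):
    """Read `count` indices of N = num_bits_minus1 + 1 bits each."""
    n = num_bits_minus1 + 1
    reader = BitReader(data)
    return [reader.read(n) for _ in range(count)]
```

Signalling the bit count explicitly lets a reader size each index field without computing a logarithm, and the minus1 form excludes a zero-bit field by construction.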

It should be noted that with respect to the sphere region structure specifying a sphere region in Wang, in a case where both azimuth_range and elevation_range are equal to 0, the sphere region specified by this structure is a point on a spherical surface. Further, it should be noted that the azimuth_range and elevation_range syntax elements of SphereRegionStruct are optionally signalled, as controlled by the input argument range_included_flag. However, the last byte of SphereRegionStruct, which includes a bit for indication of interpolate and seven reserved bits, is always signaled. It is observed that, as per the semantics in SphereRegionStruct:
The semantics of the interpolate syntax element are defined by the semantics of the structure which contains the instance of the SphereRegionStruct.
It is observed that in OMAF WD the SphereRegionSample() defines the following semantics for the interpolate syntax element:
interpolate equal to 0 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, azimuth_range (if present), and elevation_range (if present) in this sample apply to the target media samples. interpolate equal to 1 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, azimuth_range (if present), and elevation_range (if present) that apply to the target media samples are linearly interpolated from the values of the corresponding fields in this sample and the previous sample.
InitialViewingOrientationSample(), which extends SphereRegionSample(), requires that interpolate shall be equal to 0.
ContentCoverageStruct() and SphereRegionQualityRankingStruct(), both of which include SphereRegionStruct(1) require that interpolate shall be equal to 0.

As a result, it is asserted that in many typical cases when SphereRegionStruct is used for signaling information about the sphere region, the interpolate syntax element may not be meaningful or useful. Also, it is observed that in many cases the OMAF 1st edition text fails to specify semantics for interpolate and/or has not defined a meaning and a value for it when SphereRegionStruct is included in another structure. As such, in this case, the current version of SphereRegionStruct, which does not allow exclusion of the last byte, may be inefficient as it wastes a byte. In one example, a new sphere region structure, SphereRegionStruct2, may be defined which allows inclusion or exclusion of the last byte. Thus, in one example, according to the techniques herein, the following definition, syntax and semantics may be used for a new sphere region structure specifying a sphere region.

Definition
The sphere region structure (SphereRegionStruct2) specifies a sphere region.
When centre_tilt is equal to 0, the sphere region specified by this structure is derived as follows:
-If both azimuth_range and elevation_range are equal to 0, the sphere region specified by this structure is a point on a spherical surface.
-Otherwise, the sphere region is defined using variables centreAzimuth, centreElevation, cAzimuth1, cAzimuth2, cElevation1, and cElevation2 derived as follows:

The sphere region is defined as follows with reference to the shape type value specified in the semantics of the structure containing this instance of SphereRegionStruct2:
-When the shape type value is equal to 0, the sphere region is specified by four great circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point defined by centreAzimuth and centreElevation and as shown in FIG. 5A.
-When the shape type value is equal to 1, the sphere region is specified by two azimuth circles and two elevation circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point defined by centreAzimuth and centreElevation and as shown in FIG. 5B.

When centre_tilt is not equal to 0, the sphere region is firstly derived as above and then a tilt rotation is applied along the axis originating from the sphere origin passing through the centre point of the sphere region, where the angle value increases clockwise when looking from the origin towards the positive end of the axis. The final sphere region is the one after applying the tilt rotation.
Shape type value equal to 0 specifies that the sphere region is specified by four great circles as illustrated in FIG. 5A.
Shape type value equal to 1 specifies that the sphere region is specified by two azimuth circles and two elevation circles as illustrated in FIG. 5B.
Shape type values greater than 1 are reserved.

Syntax

Semantics
centre_azimuth and centre_elevation specify the centre of the sphere region. centre_azimuth shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. centre_elevation shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive.

centre_tilt specifies the tilt angle of the sphere region. centre_tilt shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

azimuth_range and elevation_range, when present, specify the azimuth and elevation ranges, respectively, of the sphere region specified by this structure in units of 2^-16 degrees. azimuth_range and elevation_range specify the range through the centre point of the sphere region, as illustrated by FIG. 5A or FIG. 5B. When azimuth_range and elevation_range are not present in this instance of SphereRegionStruct2, they are inferred as specified in the semantics of the structure containing this instance of SphereRegionStruct2. azimuth_range shall be in the range of 0 to 360 * 2¹⁶, inclusive. elevation_range shall be in the range of 0 to 180 * 2¹⁶, inclusive.
The semantics of interpolate are specified by the semantics of the structure containing this instance of SphereRegionStruct2. When interpolate is not present in this instance of SphereRegionStruct2, it is inferred as specified in the semantics of the syntax structure containing this instance of SphereRegionStruct2.
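
The value ranges above can be illustrated with a small conversion and validation helper. The Python sketch below assumes that the centre fields use the same unit of 2^-16 degrees as azimuth_range and elevation_range, which the stated value limits suggest; the names UNIT, RANGES, to_fixed, to_degrees and check_field are hypothetical.

```python
UNIT = 1 << 16  # one degree expressed in units of 2^-16 degrees

# Inclusive (minimum, maximum) limits taken from the semantics above.
RANGES = {
    "centre_azimuth": (-180 * UNIT, 180 * UNIT - 1),
    "centre_elevation": (-90 * UNIT, 90 * UNIT),
    "centre_tilt": (-180 * UNIT, 180 * UNIT - 1),
    "azimuth_range": (0, 360 * UNIT),
    "elevation_range": (0, 180 * UNIT),
}


def to_fixed(degrees: float) -> int:
    """Convert degrees to the integer field representation."""
    return round(degrees * UNIT)


def to_degrees(value: int) -> float:
    """Convert an integer field value back to degrees."""
    return value / UNIT


def check_field(name: str, value: int) -> int:
    """Return value unchanged, or raise if it violates the stated range."""
    low, high = RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value
```

Note that the upper limits differ by field: centre_azimuth and centre_tilt exclude +180 degrees, whereas centre_elevation includes +90 degrees.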

With respect to SphereRegionStruct2 above, in some cases it may be useful to allow a content creator to decide whether the interpolate bit is to be used (i.e., whether it is useful for their application) and to not include the last byte of the sphere region structure when the interpolate bit is not used. In one example according to the techniques herein, a last_byte_sent bit may be signaled in OcpcSampleEntry() to signal whether the last byte is sent in the sphere region structure in each OcpcSample(). This saves (num_viewpoint_sets * num_corresponding_object_centre_points_in_this_set[i] * num_viewpoints_in_this_set[i]) total bytes compared to Wang when the last byte is not signalled. In this case, in one example, according to the techniques described herein, data encapsulator 107 may be configured to signal an object centre points correspondence (OCPC) timed metadata track using the following sample entry structure and sample format structure:

General
This timed metadata track, named the object centre points correspondence (OCPC) timed metadata track, indicates information on object centre points correspondence between viewpoints. An OCPC timed metadata track applies to all omnidirectional video tracks in the file.
Sample Entry
The track sample entry type 'ocpc' shall be used. The sample entry of this sample entry type is specified as follows:

num_viewpoint_sets indicates the number of viewpoint sets for which object centre points correspondence is signalled in the samples to which this sample entry applies.
last_byte_sent equal to 1 indicates the last byte of sphere region structure which includes the interpolate bit and 7 reserved bits is signaled in OcpcSample()’s sphere region structure. last_byte_sent equal to 0 indicates the last byte of sphere region structure which includes the interpolate bit and 7 reserved bits is not signaled in OcpcSample()’s sphere region structure.
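
The effect of last_byte_sent on the size of each sphere point can be sketched as follows. The layout used here is an assumption modelled on the description of SphereRegionStruct above, namely three big-endian signed 32-bit centre fields optionally followed by one byte carrying the interpolate bit in its most significant position and seven reserved zero bits; the function name is hypothetical.

```python
import struct


def pack_sphere_point(centre_azimuth, centre_elevation, centre_tilt,
                      last_byte_sent, interpolate=0):
    """Serialize one sphere point without azimuth/elevation ranges.

    Assumed layout: three big-endian signed 32-bit fields, then, only
    when last_byte_sent is 1, one byte holding the interpolate bit
    followed by seven reserved bits set to zero.
    """
    body = struct.pack(">iii", centre_azimuth, centre_elevation, centre_tilt)
    if last_byte_sent:
        body += bytes([(interpolate & 1) << 7])
    return body


with_last_byte = pack_sphere_point(0, 0, 0, last_byte_sent=1, interpolate=1)
without_last_byte = pack_sphere_point(0, 0, 0, last_byte_sent=0)
```

Under this assumed layout each sphere point occupies 13 bytes when the last byte is sent and 12 bytes when it is not, so one byte is saved per signalled object centre point.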

Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

num_viewpoints_in_this_set[i] indicates the number of viewpoints in the i-th viewpoint set.
viewpoint_id[i][j] indicates the viewpoint ID of the j-th viewpoint in the i-th viewpoint set.
num_corresponding_object_centre_points_in_this_set[i] indicates the number of corresponding object centre points signalled in this sample for the i-th viewpoint set.
SphereRegionStruct2(0)[i][k][j] specifies the k-th corresponding object centre point of the j-th viewpoint in the i-th viewpoint set.
object_id[i][k][j] specifies the k-th object identifier of the j-th viewpoint in the i-th viewpoint set.
For any particular value of k in the range of 0 to num_corresponding_object_centre_points_in_this_set[i] - 1, inclusive, the sphere points indicated by SphereRegionStruct2(0)[i][k][j] for j ranging from 0 to num_viewpoints_in_this_set[i] - 1, inclusive, are object centre points that are corresponding to each other for the viewpoints in the i-th viewpoint set.

In this case, in one example, the following semantics may be defined for interpolate in SphereRegionStruct2:

interpolate equal to 0 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, in the OcpcSample specify the object centre at the sample instance. interpolate equal to 1 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, in the OcpcSample can be linearly interpolated from the values of the corresponding fields in this sample and the previous sample for estimating the information about object centres in between those two OcpcSample()s.

When last_byte_sent is equal to 0, interpolate is inferred to be equal to 0.
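
The interpolate semantics above can be sketched as a simple linear blend between two consecutive samples. The Python below is illustrative only. It assumes that, when interpolate is equal to 0, the previous sample is held until the current sample instance, and it does not handle azimuth wrap-around across +/-180 degrees; the function names and the dictionary representation of a sample are hypothetical.

```python
FIELDS = ("centre_azimuth", "centre_elevation", "centre_tilt")


def effective_interpolate(last_byte_sent: int, interpolate_bit: int = 0) -> int:
    """interpolate is inferred to be 0 when the last byte is not sent."""
    return interpolate_bit if last_byte_sent else 0


def centre_between(prev: dict, curr: dict, t: float, interpolate: int) -> dict:
    """Estimate the object centre at fraction t (0..1) between two samples.

    With interpolate equal to 1 the fields are linearly interpolated.
    With interpolate equal to 0 the previous sample is held until the
    current sample is reached (an assumption of this sketch).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    if not interpolate:
        source = curr if t == 1.0 else prev
        return {f: source[f] for f in FIELDS}
    return {f: round(prev[f] + (curr[f] - prev[f]) * t) for f in FIELDS}
```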

In another example, the following might be a requirement:
interpolate shall be equal to 0. Or
interpolate is inferred to be equal to 0.

In a variant the format of OcpcSampleEntry() may be as follows:

In this example, 32 bits are used for num_viewpoint_sets as in Wang and the new information, i.e., last_byte_sent and reserved bits are added in OcpcSampleEntry using its own byte.

In another example, according to the techniques herein, the sphere region structure information in OcpcSample() may always be signaled without including the interpolate bit and 7 reserved bits. This saves (num_viewpoint_sets * num_corresponding_object_centre_points_in_this_set[i] * num_viewpoints_in_this_set[i]) total bytes compared to Wang. In this case, in one example, data encapsulator 107 may be configured to use the following syntax for sample format structure:

Sample Format
The sample syntax of this sample entry type ('ocpc') is specified as follows:

In this case, in one example, the following semantics may be defined for interpolate in SphereRegionStruct2:

interpolate equal to 0 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, in the OcpcSample specify the object centre at the sample instance. interpolate equal to 1 specifies that the values of centre_azimuth, centre_elevation, centre_tilt, in the OcpcSample can be linearly interpolated from the values of the corresponding fields in this sample and the previous sample for estimating the information about object centres in between those two OcpcSample()s.

interpolate is inferred to be equal to 0.

As described above, Wang describes viewpoint information structures. W18227: “WD of ISO/IEC 23090-2 OMAF 3rd edition,” output document of MPEG #125, January 2019, Marrakech, Morocco, which is incorporated by reference and herein referred to as W18227. W18227 and additionally m47789 describe viewpoint information structures having the following definition, syntax and semantics:

Definition
The ViewpointPosStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() provide information of a viewpoint, including the position of the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.
viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.
viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.
viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.
viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.
viewpoint_gpspos_present_flag shall be equal to 1 for at least one viewpoint in a viewpoint group to enable location aware VR content.
viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

NOTE 1: An OMAF player is expected to start with the initial viewpoint timed metadata as defined below. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.

It should be noted that W18227 supports having viewpoints with geolocation coordinates. This enables locating the viewpoints belonging to a viewpoint group, using the ViewpointPosStruct(). However, the current specification is not optimal, for example, for tourism or any other use case which can benefit from aligning content with real-world points of interest. The following is a brief description of a use case which may benefit from aligning content with real-world points of interest.

A user initiates watching a VR scene presenting the points of interest in his vicinity as different viewpoints. When he puts on the Augmented Reality/Virtual Reality (AR/VR) Head Mounted Display (HMD) or uses his mobile device display, the viewpoints in VR content are spatially aligned with the real-world points of interest. This facilitates a coherent experience while moving around the town and subsequently, when putting on the HMD or watching with a mobile device for the viewpoint related to the Point of Interest (POI). Thus, viewpoints aligned with points of interest makes exploring a new location easier. There is a need to align the common reference coordinate system with a reference direction to enable use cases such as the above described ones. Facilitating such alignment will enable an OMAF player to view the VR content aligned with real-world geolocation coordinates. The reference direction should be detectable to the OMAF player. Geomagnetic North is one such reference direction due to the availability of magnetic compass in most mobile devices.

The following describes optionally signaling the offset between the common reference coordinate system for a group of viewpoints and the Geomagnetic North. Subsequently, additional constraints for the content creation and the player requirements to support location aware VR content are presented.

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint information structures having the following definition, syntax and semantics:

Definition
The ViewpointPosStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() provide information of a viewpoint, including the position of the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.
viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.
viewpoint_geomagnetic_info_present_flag equal to 1 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are not present.
viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in a viewpoint group to enable location aware VR content.
viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.
viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.
viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.
viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.
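
The geolocation units above can be illustrated with a conversion helper. The Python sketch below converts degrees and meters into the integer field values and applies the stated range limits; the names GPS_UNIT, encode_gps and decode_gps are hypothetical, and rounding to the nearest unit is an assumption of the sketch.

```python
GPS_UNIT = 1 << 23  # longitude and latitude use units of 2^-23 degrees


def encode_gps(longitude_deg: float, latitude_deg: float, altitude_m: float):
    """Return (longitude, latitude, altitude) as integer field values."""
    lon = round(longitude_deg * GPS_UNIT)
    lat = round(latitude_deg * GPS_UNIT)
    alt = round(altitude_m * 1000)  # altitude is carried in millimeters
    if not -180 * GPS_UNIT <= lon <= 180 * GPS_UNIT - 1:
        raise ValueError("longitude out of range")
    if not -90 * GPS_UNIT <= lat <= 90 * GPS_UNIT - 1:
        raise ValueError("latitude out of range")
    return lon, lat, alt


def decode_gps(lon: int, lat: int, alt: int):
    """Return (longitude in degrees, latitude in degrees, altitude in meters)."""
    return lon / GPS_UNIT, lat / GPS_UNIT, alt / 1000
```

With the limits as stated, a latitude of exactly +90 degrees is not representable, since the upper limit for viewpoint_gpspos_latitude is 90 * 2²³ - 1.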

NOTE 1: An OMAF player is expected to start with the initial viewpoint timed metadata as defined below. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.

NOTE 2: In order to successfully locate the viewpoints, an OMAF player is expected to have access to geolocation tracking and magnetometer. This enables the player to align the common reference coordinate system with the geolocation coordinates and find the player device position with respect to the geolocation coordinates.

In a variant example, the order of the flags may be interchanged as follows, where the flag viewpoint_geomagnetic_info_present_flag is sent before the flag viewpoint_gpspos_present_flag.

Syntax

As described above, Wang, W18227, and additionally, m47789 describe viewpoint information structures. ISO/IEC JTC1/SC29/WG11 N18393 “WD 5 of ISO/IEC 23090-2 OMAF 2nd edition,” April 2019, Geneva, CH, which is incorporated by reference and herein referred to as N18393, is an update to Wang and describes viewpoint information structures having the following definition, syntax and semantics:

Definition
The ViewpointPosStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() provide information of a viewpoint, including the position of the viewpoint, and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

In order to successfully locate the viewpoints, an OMAF player is expected to have access to geolocation tracking and magnetometer. This enables the player to align the common reference coordinate system with the geolocation coordinates and find the player device position with respect to the geolocation coordinates.
Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.

viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.

viewpoint_geomagnetic_info_present_flag equal to 1 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are not present. viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in a viewpoint group to enable location aware VR content.

viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.

viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.

viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.

viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees. viewpoint_gcs_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_gcs_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_gcs_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.
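
As an illustration of the fixed-point units and ranges stated above, the following Python sketch converts between degrees and the signalled integer values. The helper names (to_fixed, from_fixed, encode_longitude, and so on) are hypothetical and are not part of N18393, and rounding to the nearest unit is an assumption, since the semantics do not state a rounding rule.

```python
LON_LAT_FRAC_BITS = 23  # longitude and latitude are in units of 2^-23 degrees
ANGLE_FRAC_BITS = 16    # yaw, pitch, and roll are in units of 2^-16 degrees


def to_fixed(degrees: float, frac_bits: int) -> int:
    """Convert degrees to integer units of 2^-frac_bits degrees (rounding is assumed)."""
    return round(degrees * (1 << frac_bits))


def from_fixed(value: int, frac_bits: int) -> float:
    """Convert a signalled integer value back to degrees."""
    return value / (1 << frac_bits)


def check_range(value: int, low: int, high: int, name: str) -> int:
    """Return value if it lies in [low, high] inclusive, otherwise raise ValueError."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def encode_longitude(degrees: float) -> int:
    # Stated range: -180 * 2^23 to 180 * 2^23 - 1, inclusive.
    return check_range(to_fixed(degrees, LON_LAT_FRAC_BITS),
                       -180 * 2**23, 180 * 2**23 - 1, "viewpoint_gpspos_longitude")


def encode_latitude(degrees: float) -> int:
    # Stated range: -90 * 2^23 to 90 * 2^23 - 1, inclusive.
    return check_range(to_fixed(degrees, LON_LAT_FRAC_BITS),
                       -90 * 2**23, 90 * 2**23 - 1, "viewpoint_gpspos_latitude")


def encode_yaw_or_roll(degrees: float) -> int:
    # Stated range: -180 * 2^16 to 180 * 2^16 - 1, inclusive.
    return check_range(to_fixed(degrees, ANGLE_FRAC_BITS),
                       -180 * 2**16, 180 * 2**16 - 1, "yaw/roll")


def encode_pitch(degrees: float) -> int:
    # Stated range: -90 * 2^16 to 90 * 2^16, inclusive (upper bound included).
    return check_range(to_fixed(degrees, ANGLE_FRAC_BITS),
                       -90 * 2**16, 90 * 2**16, "pitch")
```

Under the stated ranges, a longitude of exactly +180 degrees is rejected by this sketch, whereas a pitch of exactly +90 degrees is accepted.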

vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.

NOTE 1: When two viewpoints have different values of vwpt_group_id, their position coordinates are not comparable, because the viewpoints belong to different coordinate systems.
vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.

NOTE 2: An OMAF player is expected to start with the initial viewpoint timed metadata as defined in the Initial viewpoint clause. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.
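
The fallback in NOTE 2 can be sketched in Python as follows. The function name, the dictionary mapping viewpoint identifiers to vwpt_group_id values, and the behaviour when initial viewpoint information is present (use it if it belongs to the target group) are assumptions made for illustration only.

```python
from typing import Dict, Optional


def select_viewpoint_on_group_switch(
    group_of_viewpoint: Dict[int, int],
    target_group_id: int,
    initial_viewpoint_id: Optional[int] = None,
) -> int:
    """Pick the viewpoint to play after the user switches to target_group_id.

    group_of_viewpoint maps each viewpoint identifier to its vwpt_group_id.
    initial_viewpoint_id is None when no initial viewpoint information is present.
    """
    members = [vp for vp, grp in group_of_viewpoint.items() if grp == target_group_id]
    if not members:
        raise ValueError(f"no viewpoint belongs to group {target_group_id}")
    # Assumed behaviour: honour the initial viewpoint when it is signalled.
    if initial_viewpoint_id is not None and initial_viewpoint_id in members:
        return initial_viewpoint_id
    # NOTE 2 fallback: the least viewpoint identifier in the group.
    return min(members)
```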

As described above, viewpoint information may be dynamic and Wang specifies timed metadata tracks conveying viewpoint information. In Wang, timed metadata may be signaled based on a sample entry and a sample format. N18393 similarly provides a dynamic viewpoint timed metadata track, an initial viewpoint timed metadata track, and an object centre points correspondence between viewpoints timed metadata track as provided in Wang. Thus, the sample entry structure and sample format structure for the dynamic viewpoint timed metadata track as specified above with respect to Wang is the same as the sample entry structure and sample format structure for the dynamic viewpoint timed metadata track specified N18393 and for the sake of brevity is not repeated herein.

N18393 further includes OMAF-specific extensions to the ISOBMFF for viewpoint track grouping which are specified as follows:

Sync samples in timed metadata tracks
For timed metadata tracks specified in this document, a sample in a timed metadata track is defined as a sync sample if and only if at least one of the media samples in the referenced media tracks having the same decoding time is a sync sample.

Viewpoint track grouping
Tracks belonging to the same viewpoint have the same value of track_group_id for track_group_type 'vipo', and the track_group_id of tracks from one viewpoint differs from the track_group_id of tracks from any other viewpoint.
By default, when this track grouping is not indicated for any track in a file, the file is considered containing content for one viewpoint only.

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.
viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.
ViewpointPosStruct(), ViewpointGroupStruct() and ViewpointGlobalCoordinateSysRotationStruct() are defined as specified above.

The ViewpointPosStruct() provided in N18393 is less than ideal. In particular, signaling may be improved if information regarding GPS position and geomagnetic position for a viewpoint are included in separate structures. According to the techniques herein, information regarding GPS position and geomagnetic position for a viewpoint may be included in separate structures, and individual flags may control the presence of this GPS position information and geomagnetic position information, separately from (X, Y, Z) position information. Further, example modifications are provided for viewpoint track group information signaling and dynamic viewpoint information signaling to support the viewpoint information structures described herein. These modifications, in some examples, include controlling presence or absence of the GPS position information and/or geomagnetic position information and/or (X, Y, Z) position information, separately. Further, an example flag is described to conditionally include viewpoint group information in a structure, and when viewpoint group information is not included in a structure, an inference rule is defined for viewpoint group related syntax elements.

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint information structures having the following definition, syntax and semantics:

Definition
The ViewpointPosStruct(), ViewPointGpsPositionStruct(), ViewpointGeoMagenticInfoStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() provide information of a viewpoint, including the (X, Y, Z) position of the viewpoint, GPS position of the viewpoint, geomagnetic position information for the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

In order to successfully locate the viewpoints, an OMAF player is expected to have access to geolocation tracking and magnetometer. This enables the player to align the common reference coordinate system with the geolocation coordinates and find the player device position with respect to the geolocation coordinates.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.

viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.

viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.

viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.

viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees. viewpoint_gcs_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_gcs_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_gcs_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.

NOTE 1: When two viewpoints have different values of vwpt_group_id, their position coordinates are not comparable, because the viewpoints belong to different coordinate systems.
vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.

NOTE 2: An OMAF player is expected to start with the initial viewpoint timed metadata as defined in the Initial viewpoint clause. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint information structures having the following definition, syntax and semantics. It should be noted that in the example below, a flag controls the presence or absence of viewpoint GPS altitude information. In some cases, the GPS altitude information is not available or is not accurate (especially for low-cost GPS receivers and for consumer devices). As a result, it is useful to allow not signaling the GPS altitude information while signaling the GPS latitude and longitude information.

Definition
The ViewpointPosStruct(), ViewPointGpsPositionStruct(), ViewpointGeoMagenticInfoStruct(), ViewpointGlobalCoordinateSysRotationStruct(), ViewpointGroupStruct(), and ViewpointSwitchingStruct() provide information of a viewpoint, including the (X, Y, Z) position of the viewpoint, GPS position of the viewpoint, geomagnetic position information for the viewpoint and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, and viewpoint group information.

In order to successfully locate the viewpoints, an OMAF player is expected to have access to geolocation tracking and magnetometer. This enables the player to align the common reference coordinate system with the geolocation coordinates and find the player device position with respect to the geolocation coordinates.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic or viewpoint switching information is changing over time. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.

viewpoint_gpspos_altitude_present_flag equal to 1 indicates that the viewpoint_gpspos_altitude syntax element is present. viewpoint_gpspos_altitude_present_flag equal to 0 indicates that the viewpoint_gpspos_altitude syntax element is not present. When viewpoint_gpspos_altitude_present_flag is not present it is inferred to be equal to 0.

viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.

viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.

viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database. When viewpoint_gpspos_altitude is not present it is inferred to be equal to 0 or unspecified or unknown.
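
The conditional presence of the altitude can be sketched in Python as follows. The byte layout used here is an assumption made only for illustration (big-endian 32-bit signed longitude and latitude, then one byte whose most significant bit is viewpoint_gpspos_altitude_present_flag with seven reserved bits, then an optional 32-bit signed altitude). The names GpsPosition, pack_gps_position, and parse_gps_position are hypothetical. An absent altitude is modelled as None, corresponding to the "unspecified or unknown" inference.

```python
import struct
from typing import NamedTuple, Optional


class GpsPosition(NamedTuple):
    longitude: int            # units of 2^-23 degrees
    latitude: int             # units of 2^-23 degrees
    altitude: Optional[int]   # millimeters, or None when not signalled


def pack_gps_position(pos: GpsPosition) -> bytes:
    """Serialize a position using the assumed layout described above."""
    flag_byte = 0x80 if pos.altitude is not None else 0x00
    out = struct.pack(">iiB", pos.longitude, pos.latitude, flag_byte)
    if pos.altitude is not None:
        out += struct.pack(">i", pos.altitude)
    return out


def parse_gps_position(data: bytes) -> GpsPosition:
    """Parse the assumed layout; altitude is read only when the flag is set."""
    longitude, latitude, flag_byte = struct.unpack_from(">iiB", data, 0)
    altitude = None
    if flag_byte & 0x80:  # viewpoint_gpspos_altitude_present_flag equal to 1
        (altitude,) = struct.unpack_from(">i", data, 9)
    return GpsPosition(longitude, latitude, altitude)
```

With this layout, omitting the altitude saves four bytes per structure while latitude and longitude are still signalled.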

viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees. viewpoint_gcs_yaw shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive. viewpoint_gcs_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_gcs_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.

vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.

NOTE 1: When two viewpoints have different values of vwpt_group_id, their position coordinates are not comparable, because the viewpoints belong to different coordinate systems.

vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.

NOTE 2: An OMAF player is expected to start with the initial viewpoint timed metadata as defined in the Initial viewpoint clause. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint track grouping according to the following definition, syntax and semantics:

Definition
Tracks belonging to the same viewpoint have the same value of track_group_id for track_group_type 'vipo', and the track_group_id of tracks from one viewpoint differs from the track_group_id of tracks from any other viewpoint.
By default, when this track grouping is not indicated for any track in a file, the file is considered containing content for one viewpoint only.

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.

viewpoint_geomagnetic_info_present_flag equal to 1 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are not present. viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in a viewpoint group to enable location aware VR content.

ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagenticInfoStruct(), ViewpointGroupStruct() and ViewpointGlobalCoordinateSysRotationStruct() are defined as specified above.

In another example, viewpoint_gpspos_present_flag and viewpoint_geomagnetic_info_present_flag may have the following semantics:

viewpoint_gpspos_present_flag equal to 1 indicates that ViewpointGpsPositionStruct() is present. viewpoint_gpspos_present_flag equal to 0 indicates that ViewpointGpsPositionStruct() is not present.

viewpoint_geomagnetic_info_present_flag equal to 1 indicates that ViewpointGeomagenticInfoStruct() is present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that ViewpointGeomagenticInfoStruct() is not present. viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in a viewpoint group to enable location aware VR content.

In another example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint track grouping according to the following definition, syntax and semantics, where TrackGroupTypeBox flags are used for specifying presence or absence of ViewpointGpsPositionStruct(), ViewpointGeomagenticInfoStruct(), which may save bits.

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

Bit 0 of the flags (with bit 0 being the least significant bit) of the TrackGroupTypeBox is used to indicate the presence or absence of ViewpointGpsPositionStruct(). The semantics of this flag is specified as follows:
(flags & 0x000001) equal to 1 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGpsPositionStruct() is present. (flags & 1) equal to 0 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGpsPositionStruct() is not present.

Bit 1 of the flags (with bit 0 being the least significant bit) of the TrackGroupTypeBox is used to indicate the presence or absence of ViewpointGeomagneticInfoStruct(). The semantics of this flag is specified as follows:
(flags & 0x000002) not equal to 0 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGeomagneticInfoStruct() is present. (flags & 0x000002) equal to 0 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGeomagneticInfoStruct() is not present.
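
The two flag bits described above can be tested as in the following Python sketch. The constant and function names are hypothetical. Because masking with 0x000002 yields either 0 or 2, a presence test is most safely written as a comparison against zero. The 24-bit bound reflects the width of the flags field of an ISOBMFF FullBox.

```python
GPS_POSITION_STRUCT_PRESENT = 0x000001      # bit 0 of the TrackGroupTypeBox flags
GEOMAGNETIC_INFO_STRUCT_PRESENT = 0x000002  # bit 1 of the TrackGroupTypeBox flags


def structs_present(flags: int) -> tuple:
    """Return (gps_struct_present, geomagnetic_struct_present) for a flags value."""
    if not 0 <= flags <= 0xFFFFFF:
        raise ValueError("flags must fit in 24 bits")
    gps_present = (flags & GPS_POSITION_STRUCT_PRESENT) != 0
    geomagnetic_present = (flags & GEOMAGNETIC_INFO_STRUCT_PRESENT) != 0
    return gps_present, geomagnetic_present
```

Carrying the presence information in the existing flags field avoids spending additional bits inside the box payload.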

ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagenticInfoStruct(), ViewpointGroupStruct() and ViewpointGlobalCoordinateSysRotationStruct() are defined as specified above.

In another example, instead of two least significant bits some other bits from the flags may be used to conditionally indicate the inclusion of ViewpointGpsPositionStruct() and/or ViewpointGeomagneticInfoStruct().

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal dynamic viewpoint information where the sample entry structure and sample format structure for the dynamic viewpoint timed metadata track is specified as follows:

General
The dynamic viewpoint timed metadata track indicates the viewpoint parameters that are dynamically changing over time.
An OMAF player should use the signalled information as follows when starting playback of one viewpoint after switching from another viewpoint:
- If there is a recommended viewing orientation explicitly signalled, the OMAF player is expected to parse this information and follow the recommended viewing orientation.
- Otherwise, the OMAF player is expected to keep the same viewing orientation as in the switching-from viewpoint just before the switching occurs.

Sample entry
The track sample entry type 'dyvp' shall be used. The sample entry of this sample entry type is specified as follows:

ViewpointPosStruct() is defined as above but indicates the initial viewpoint position.
dynamic_gcs_rotated_flag equal to 0 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system remain unchanged in all samples referring to this sample entry. dynamic_gcs_rotated_flag equal to 1 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system are indicated in the samples.
dynamic_vwpt_group_flag equal to 0 specifies that the vwpt_group_flag and ViewpointGroupStruct() are not present in the samples and the viewpoint group information (vwpt_group_id and vwpt_group_description) signalled in ViewpointGroupStruct() in the ViewpointTrackGroupBox applies to each sample referring to this sample entry. dynamic_vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() for the viewpoint may change compared to that signalled in the ViewpointGroupStruct() in the ViewpointTrackGroupBox and new ViewpointGroupStruct() may be signalled in the samples based on the value of vwpt_group_flag in the samples.
dynamic_vwpt_gps_flag equal to 0 specifies that the GPS position information of the viewpoint remains unchanged or is not updated in all samples referring to this sample entry. dynamic_vwpt_gps_flag equal to 1 specifies that the GPS position information of the viewpoint is indicated in the samples.
vwpt_geomagnetic_info_present_flag equal to 1 specifies that ViewpointGeomagenticInfoStruct() is present in the sample entry. vwpt_geomagnetic_info_present_flag equal to 0 specifies that ViewpointGeomagenticInfoStruct() is not present in the sample entry. In one example, vwpt_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in a viewpoint group to enable location aware VR content.
ViewpointGlobalCoordinateSysRotationStruct() is defined in clause above but indicates the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system for each sample referring to this sample entry.
ViewpointGpsPositionStruct() is defined above but indicates the GPS position information of the viewpoint for each sample referring to this sample entry.
ViewpointGeomagenticInfoStruct() is defined above but indicates the geomagnetic position information of the viewpoint for each sample referring to this sample entry.
Sample format
The sample syntax of this sample entry type ('dyvp') is specified as follows:

The semantics of ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() are specified above.
vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() is present. vwpt_group_flag equal to 0 specifies that the ViewpointGroupStruct() is not present. When not present, the value of vwpt_group_flag is inferred to be equal to 0.
When dynamic_vwpt_group_flag is equal to 1, the first sample shall have vwpt_group_flag equal to 1. For subsequent samples when the viewpoint group information does not change, the ViewpointGroupStruct() can be absent. When the ViewpointGroupStruct() is absent in a sample, the following applies:
- If dynamic_vwpt_group_flag is equal to 1, it is inferred to be identical to the ViewpointGroupStruct() of the previous sample, in decoding order.
- Otherwise (dynamic_vwpt_group_flag is equal to 0), it is inferred to be identical to the ViewpointGroupStruct() in the ViewpointTrackGroupBox.
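
The inference rules above can be sketched in Python as follows. The function name and the representation of a ViewpointGroupStruct() as an (vwpt_group_id, vwpt_group_description) tuple, with None standing for a sample in which the structure is absent, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Group = Tuple[int, str]  # (vwpt_group_id, vwpt_group_description)


def effective_groups(
    dynamic_vwpt_group_flag: int,
    track_group_box_group: Group,
    sample_groups: List[Optional[Group]],
) -> List[Group]:
    """Resolve the viewpoint group that applies to each sample, in decoding order."""
    if dynamic_vwpt_group_flag == 0:
        # The group signalled in the ViewpointTrackGroupBox applies to every sample.
        return [track_group_box_group for _ in sample_groups]
    if sample_groups and sample_groups[0] is None:
        raise ValueError("the first sample shall have vwpt_group_flag equal to 1")
    resolved: List[Group] = []
    for group in sample_groups:
        if group is None:
            # Absent structure: identical to that of the previous sample.
            group = resolved[-1]
        resolved.append(group)
    return resolved
```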

It should be noted that in N18393 each viewpoint track group box requires an inclusion of a ViewpointGroupStruct(). It may be useful to have the inclusion of ViewpointGroupStruct() based on a flag. This would enable applications including only one viewpoint group to optionally omit this information. In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint track grouping information according to the following syntax and semantics:

Viewpoint track grouping
Tracks belonging to the same viewpoint have the same value of track_group_id for track_group_type 'vipo', and the track_group_id of tracks from one viewpoint differs from the track_group_id of tracks from any other viewpoint.
By default, when this track grouping is not indicated for any track in a file, the file is considered containing content for one viewpoint only.

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

Bit 0 of the flags (with bit 0 being the least significant bit) of the TrackGroupTypeBox is used to indicate the presence or absence of ViewpointGroupStruct(). The semantics of the flag is specified as follows:
(flags & 0x000001) equal to 1 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGroupStruct() is present. (flags & 1) equal to 0 in a TrackGroupTypeBox of a particular track_group_type specifies that ViewpointGroupStruct() is not present. When ViewpointGroupStruct() is not present:
- vwpt_group_id is inferred to be equal to 0.
- vwpt_group_description is inferred to be equal to null.
In this case all the viewpoints belong to a common “default” viewpoint group.
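
The default-group inference can be sketched in Python as follows. The function name is hypothetical, and representing the null vwpt_group_description as an empty string is an assumption.

```python
from typing import Optional, Tuple

VIEWPOINT_GROUP_STRUCT_PRESENT = 0x000001  # bit 0 of the TrackGroupTypeBox flags
DEFAULT_GROUP = (0, "")  # inferred vwpt_group_id and (null) vwpt_group_description


def viewpoint_group(flags: int, parsed_group: Optional[Tuple[int, str]] = None) -> Tuple[int, str]:
    """Return the signalled group, or the inferred default group when it is absent."""
    if flags & VIEWPOINT_GROUP_STRUCT_PRESENT:
        if parsed_group is None:
            raise ValueError("flags indicate ViewpointGroupStruct() but none was parsed")
        return parsed_group
    return DEFAULT_GROUP
```

An application with a single viewpoint group can therefore leave bit 0 cleared for every viewpoint, and all viewpoints resolve to the same group identifier 0.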

ViewpointPosStruct(), ViewpointGroupStruct() and ViewpointGlobalCoordinateSysRotationStruct() are defined as specified above.

In another example, instead of the least significant bit some other bit from the flags may be used to conditionally indicate the inclusion of ViewpointGroupStruct().

In a variant example, instead of using the flags bit to control presence or absence of ViewpointGroupStruct(), a flag may be included inside the ViewpointTrackGroupBox as follows:

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

Viewpoint_group_presence_flag equal to 1 specifies that ViewpointGroupStruct() is present. Viewpoint_group_presence_flag equal to 0 specifies that ViewpointGroupStruct() is not present. When ViewpointGroupStruct() is not present:
- vwpt_group_id is inferred to be equal to 0.
- vwpt_group_description is inferred to be equal to null.

ViewpointPosStruct(), ViewpointGroupStruct() and ViewpointGlobalCoordinateSysRotationStruct() are defined as specified above.

As described above, Wang and N18393 specify encapsulation, signaling, and streaming of omnidirectional media in a media streaming system. In particular, Wang and N18393 specify how to encapsulate, signal, and stream omnidirectional media using dynamic adaptive streaming over Hypertext Transfer Protocol (HTTP) (DASH). DASH is described in ISO/IEC: ISO/IEC 23009-1:2014, “Information technology - Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats,” International Organization for Standardization, 2nd Edition, 5/15/2014 (hereinafter, “ISO/IEC 23009-1:2014”), which is incorporated by reference herein. A DASH media presentation may include data segments, video segments, and audio segments. In some examples, a DASH Media Presentation may correspond to a linear service or part of a linear service of a given duration defined by a service provider (e.g., a single TV program, or the set of contiguous linear TV programs over a period of time). According to DASH, a Media Presentation Description (MPD) is a document that includes metadata required by a DASH Client to construct appropriate HTTP-URLs to access segments and to provide the streaming service to the user. A MPD document fragment may include a set of eXtensible Markup Language (XML)-encoded metadata fragments. The contents of the MPD provide the resource identifiers for segments and the context for the identified resources within the Media Presentation. The data structure and semantics of the MPD fragment are described with respect to ISO/IEC 23009-1:2014. Further, it should be noted that draft editions of ISO/IEC 23009-1 are currently being proposed. Thus, as used herein, a MPD may include a MPD as described in ISO/IEC 23009-1:2014, currently proposed MPDs, and/or combinations thereof. In ISO/IEC 23009-1:2014, a media presentation as described in a MPD may include a sequence of one or more Periods, where each Period may include one or more Adaptation Sets. 
It should be noted that in the case where an Adaptation Set includes multiple media content components, each media content component may be described individually. Each Adaptation Set may include one or more Representations. In ISO/IEC 23009-1:2014 each Representation is provided: (1) as a single Segment, where Subsegments are aligned across Representations within an Adaptation Set; and (2) as a sequence of Segments where each Segment is addressable by a template-generated Uniform Resource Locator (URL). The properties of each media content component may be described by an AdaptationSet element and/or elements within an Adaptation Set, including for example, a ContentComponent element. It should be noted that the sphere region structure forms the basis of DASH descriptor signaling for various descriptors.

Wang and N18393 provide definitions for a number of XML elements and attributes. These XML elements are defined in a separate namespace "urn:mpeg:mpegI:omaf:2018". These are defined in normative schema documents in each clause of Wang and N18393 where a new MPD descriptor(s), element(s) or attribute(s) are specified. The namespace designator "xs:" shall correspond to namespace http://www.w3.org/2001/XMLSchema as defined in XML Schema Part 1 [XMLS]. Items in the "Data type" column of Tables use datatypes defined in XML Schema Part 2 [XMLD] and shall have the meaning as defined in [XMLD].

N18393 further provides the following with respect to signaling of viewpoints:

In DASH MPD, a Viewpoint element with a @schemeIdUri attribute equal to "urn:mpeg:mpegI:omaf:2018:vwpt" is referred to as a viewpoint information (VWPT) descriptor.

At most one VWPT descriptor may be present at adaptation set level and no VWPT descriptor shall be present at any other level. When no Adaptation Set in the Media Presentation contains a VWPT descriptor, the Media Presentation is inferred to contain only one viewpoint.

The VWPT descriptor indicates the viewpoint the Adaptation Set belongs to.
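As a non-limiting illustration, a DASH client might locate VWPT descriptors as sketched below in Python. The sample MPD, the placeholder @value strings, and the function names are hypothetical; in particular, the sketch treats @value as an opaque string and makes no assumption about its format. The sketch applies the two rules stated above: at most one VWPT descriptor per Adaptation Set, and a single viewpoint is inferred when no Adaptation Set carries a VWPT descriptor.

```python
import xml.etree.ElementTree as ET

DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
VWPT_SCHEME = "urn:mpeg:mpegI:omaf:2018:vwpt"

# Hypothetical MPD fragment; the @value strings are placeholders.
SAMPLE_VWPT_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <Viewpoint schemeIdUri="urn:mpeg:mpegI:omaf:2018:vwpt" value="vp1"/>
    </AdaptationSet>
    <AdaptationSet id="2">
      <Viewpoint schemeIdUri="urn:mpeg:mpegI:omaf:2018:vwpt" value="vp2"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def vwpt_by_adaptation_set(mpd_text):
    """Map each Adaptation Set id to the @value of its VWPT descriptor."""
    root = ET.fromstring(mpd_text)
    found = {}
    for aset in root.findall(".//mpd:AdaptationSet", DASH_NS):
        descriptors = [
            v for v in aset.findall("mpd:Viewpoint", DASH_NS)
            if v.get("schemeIdUri") == VWPT_SCHEME
        ]
        if len(descriptors) > 1:
            raise ValueError("more than one VWPT descriptor in an Adaptation Set")
        if descriptors:
            found[aset.get("id")] = descriptors[0].get("value")
    return found


def single_viewpoint_inferred(mpd_text):
    """True when no Adaptation Set carries a VWPT descriptor."""
    return not vwpt_by_adaptation_set(mpd_text)


print(vwpt_by_adaptation_set(SAMPLE_VWPT_MPD))
```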

Table 1 illustrates the Semantics of elements and attributes of the VWPT descriptor.

If the viewpoint is associated with a timed metadata Representation carrying a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata Representation carrying a timed metadata track with sample entry type 'dyvp'.
The data types for various elements and attributes shall be as defined in the XML schema. An XML schema for this shall be as shown below. The schema shall be represented in an XML schema that has namespace urn:mpeg:mpegI:omaf:2018 and is specified as follows:

The viewpoint information (VWPT) descriptor provided in N18393 may be less than ideal. In particular, the viewpoint information (VWPT) descriptor provided in N18393 does not signal information about viewpoint’s common reference coordinate system relative to geomagnetic North direction in DASH MPD. This information is useful to enable the location aware VR use case. According to the techniques herein, data encapsulator 107 may be configured to signal information about viewpoint’s common reference coordinate system relative to geomagnetic North direction. Further, data encapsulator 107 may be configured to signal default values that are defined for an inference when the parent syntax element (ViewPointInfo.GeomagneticInfo) is signaled and the attributes are not signaled. In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint information based on the following definition for a VWPT descriptor:

In DASH MPD, a Viewpoint element with a @schemeIdUri attribute equal to "urn:mpeg:mpegI:omaf:2018:vwpt" is referred to as a viewpoint information (VWPT) descriptor.

At most one VWPT descriptor may be present at adaptation set level and no VWPT descriptor shall be present at any other level. When no Adaptation Set in the Media Presentation contains a VWPT descriptor, the Media Presentation is inferred to contain only one viewpoint.

The VWPT descriptor indicates the viewpoint the Adaptation Set belongs to.

Table 2 illustrates the Semantics of elements and attributes of the VWPT descriptor.

If the viewpoint is associated with a timed metadata Representation carrying a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata Representation carrying a timed metadata track with sample entry type 'dyvp'.
The data types for various elements and attributes shall be as defined in the XML schema. An XML schema for this shall be as shown below. The schema shall be represented in an XML schema that has namespace urn:mpeg:mpegI:omaf:2018 and is specified as follows:

Further, in one example, according to the techniques herein, attributes of ViewPointInfo.GeomagneticInfo@yaw, ViewPointInfo.GeomagneticInfo@pitch, ViewPointInfo.GeomagneticInfo@roll provided in Table 2 may be mandatory to be signaled when element ViewPointInfo.GeomagneticInfo is signaled. Such a modification to Table 2 is illustrated below in Table 3.

With respect to Table 3 the XML schema may be as follows:

Where omaf:Range1 and omaf:Range2 may be defined as:

In one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint information based on the following definition for a VWPT descriptor. In the example below, the DASH viewpoint descriptor’s ViewPointInfo.Position@altitude attribute is optional and, when not present, the ViewPointInfo.Position@altitude value may be inferred. Further, new data types with value ranges are defined for signalling GPS latitude (ViewPointInfo.Position@latitude) and longitude (ViewPointInfo.Position@longitude) information in the DASH Viewpoint descriptor.

In DASH MPD, a Viewpoint element with a @schemeIdUri attribute equal to "urn:mpeg:mpegI:omaf:2018:vwpt" is referred to as a viewpoint information (VWPT) descriptor.

At most one VWPT descriptor may be present at adaptation set level and no VWPT descriptor shall be present at any other level. When no Adaptation Set in the Media Presentation contains a VWPT descriptor, the Media Presentation is inferred to contain only one viewpoint.

The VWPT descriptor indicates the viewpoint the Adaptation Set belongs to.

Table 4 illustrates the Semantics of elements and attributes of the VWPT descriptor.

In another example the viewpoint GPS altitude related attribute’s XML schema part may be as defined below (without a predefined default value):
<xs:attribute name="altitude" type="xs:int" use="optional" />

It should be noted that in one example, ViewPointInfo.GeomagneticInfo@yaw, ViewPointInfo.GeomagneticInfo@pitch, ViewPointInfo.GeomagneticInfo@roll may be renamed as ViewPointInfo.GeomagneticInfo@azimuth, ViewPointInfo.GeomagneticInfo@elevation, ViewPointInfo.GeomagneticInfo@tilt respectively.

As described above, Wang, W18227, m47789, and N18393 describe viewpoint information structures. ISO/IEC JTC1/SC29/WG11 N18587 “WD 6 of ISO/IEC 23090-2 OMAF 2nd edition,” July 2019, Gothenburg, SE, which is incorporated by reference and herein referred to as N18587, provides viewpoint information structures having the following definition, syntax and semantics:

Definition
Tracks belonging to the same viewpoint have the same value of track_group_id for track_group_type 'vipo', and the track_group_id of tracks from one viewpoint differs from the track_group_id of tracks from any other viewpoint.
By default, when this track grouping is not indicated for any track in a file, the file is considered containing content for one viewpoint only.
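As a non-limiting illustration of the definition above, the following Python sketch groups tracks into viewpoints by their track_group_id for track_group_type 'vipo'. The input representation (a list of (track_id, groups) pairs, where groups maps a track_group_type to a track_group_id) and the function name are hypothetical. Using None as the key for the single inferred viewpoint, and leaving out tracks that carry no 'vipo' grouping when other tracks do, are assumptions of this sketch.

```python
from collections import defaultdict


def group_tracks_by_viewpoint(tracks):
    """Group track ids by the 'vipo' track_group_id they carry.

    tracks: list of (track_id, groups) pairs, where groups maps a
    track_group_type such as 'vipo' to a track_group_id.
    """
    tracks = list(tracks)
    viewpoints = defaultdict(list)
    for track_id, groups in tracks:
        if "vipo" in groups:
            viewpoints[groups["vipo"]].append(track_id)
    if not viewpoints:
        # No 'vipo' grouping on any track: the file is considered to
        # contain content for one viewpoint only (keyed by None here).
        return {None: [track_id for track_id, _ in tracks]}
    return dict(viewpoints)


print(group_tracks_by_viewpoint([(1, {"vipo": 10}), (2, {"vipo": 10}), (3, {"vipo": 20})]))
```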

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.

viewpoint_geomagnetic_info_present_flag equal to 1 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are not present. viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in viewpoint group to enable location aware VR content.

ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagneticInfoStruct(), ViewpointGroupStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointSwitchingStruct() are defined as follows:

Definition
The ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagneticInfoStruct(), ViewpointGlobalCoordinateSysRotationStruct(), ViewpointGroupStruct(), and ViewpointSwitchingStruct() provide information of a viewpoint, including the (X, Y, Z) position of the viewpoint, GPS position of the viewpoint, geomagnetic position information for the viewpoint, and the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system, viewpoint group information, and viewpoint switching information.
In order to successfully locate the viewpoints, an OMAF player is expected to have access to geolocation tracking and magnetometer. This enables the player to align the common reference coordinate system with the geolocation coordinates and find the player device position with respect to the geolocation coordinates.

Syntax

Semantics
viewpoint_pos_x, viewpoint_pos_y, and viewpoint_pos_z specify the position of the viewpoint (when the position of the viewpoint is static) or the initial position of viewpoint (when the position of the viewpoint is dynamic), in units of 10^-1 millimeters, in 3D space, relative to the common reference coordinate system. If a viewpoint is associated with a timed metadata track with sample entry type 'dyvp', the position of the viewpoint is dynamic or viewpoint switching information is changing over time. Otherwise, the position of the viewpoint is static. In the former case, the dynamic position of the viewpoint is signalled in the associated timed metadata track with sample entry type 'dyvp'.
viewpoint_gpspos_longitude indicates the longitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_longitude shall be in range of -180 * 2²³ to 180 * 2²³ - 1, inclusive. Positive values represent eastern longitude and negative values represent western longitude.
viewpoint_gpspos_latitude indicates the latitude of the geolocation of the viewpoint in units of 2^-23 degrees. viewpoint_gpspos_latitude shall be in range of -90 * 2²³ to 90 * 2²³ - 1, inclusive. Positive value represents northern latitude and negative value represents southern latitude.
viewpoint_gpspos_altitude indicates the altitude of the geolocation of the viewpoint in units of millimeters above the WGS 84 reference ellipsoid as specified in the EPSG:4326 database.
viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the common reference coordinate system relative to the geomagnetic North direction, in units of 2^-16 degrees. viewpoint_geomagnetic_yaw shall be in the range of -180 * 2¹⁶ to 180 *2¹⁶ - 1, inclusive. viewpoint_geomagnetic_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_geomagnetic_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.
viewpoint_gcs_yaw, viewpoint_gcs_pitch, and viewpoint_gcs_roll specify the yaw, pitch, and roll angles, respectively, of the rotation angles of X, Y, Z axes of the global coordinate system of the viewpoint relative to the common reference coordinate system, in units of 2^-16 degrees. viewpoint_gcs_yaw shall be in the range of -180 * 2¹⁶ to 180 *2¹⁶ - 1, inclusive. viewpoint_gcs_pitch shall be in the range of -90 * 2¹⁶ to 90 * 2¹⁶, inclusive. viewpoint_gcs_roll shall be in the range of -180 * 2¹⁶ to 180 * 2¹⁶ - 1, inclusive.
vwpt_group_id indicates the identifier of a viewpoint group. All viewpoints in a viewpoint group share a common reference coordinate system.
NOTE : When two viewpoints have different values of vwpt_group_id, their position coordinates are not comparable, because the viewpoints belong to different coordinate systems.
vwpt_group_description is a null-terminated UTF-8 string which indicates the description of a viewpoint group. A null string is allowed.
NOTE: An OMAF player is expected to start with the initial viewpoint timed metadata as defined above. Subsequently, if the user wishes to switch to a viewpoint group and the initial viewpoint information is not present, the OMAF player is expected to switch to the viewpoint with the least value of the viewpoint identifier in the viewpoint group.
num_viewpoint_switching indicates the number of switching transitions possible from the viewpoint to which the ViewpointSwitchingStruct is associated.
destination_viewpoint_id indicates the viewpoint_id of the destination viewpoint of a viewpoint switching.
destination_viewport_flag equal to 1 specifies that a SphereRegionStruct() specifying the destination viewport is present.
transition_effect_flag equal to 1 specifies that a transition effect description is present.
timeline_switching_offset_flag equal to 1 specifies that a time offset is present.
content_type indicates the type of content being delivered. 0=On Demand; 1=Live.
absolute_relative_t_offset_flag equal to 0 indicates that the time offset is absolute with respect to the beginning of the stream. When absolute_relative_t_offset_flag is equal to 1 it indicates that the time offset is relative with respect to the current time.
absolute_t_offset specifies the absolute time offset to be used when switching to destination_viewpoint_id. When content_type is equal to 1, absolute_t_offset shall be less than or equal to the current time.
relative_t_offset specifies the relative time offset to be used when switching to destination_viewpoint_id. When content_type is equal to 1, relative_t_offset shall be negative or zero.
destination_viewport_flag equal to 1 specifies that a specific viewport shall be used after transitioning to the new viewpoint. If equal to 0, the default OMAF viewport switching process shall be used (recommended viewport if present, or default viewport otherwise).
min_time_flag equal to 1 specifies that a minimum time with regards to the current viewport media timeline is present. If present, the transition can only be activated after this minimum playout time.
max_time_flag equal to 1 specifies that a maximum time with regards to the current viewport media timeline is present. If present, the transition can only be activated before this maximum playout time.
transition_effect_type indicates the type of transition effects, as listed in Table 5 when switching to this viewpoint.

transition_video_track_id indicates the track_id of the video to be played when rendering the transition.
transition_video_URL indicates the URL of the video to be played when rendering the transition.
t_offset specifies the time offset to be used when switching.
t_min specifies the minimum playout time of the current viewport media timeline after which the switching can be activated.
t_max specifies the maximum playout time of the current viewport media timeline before which the switching can be activated.
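As a non-limiting illustration of the fixed-point units in the semantics above, the following Python sketch converts raw GPS values (units of 2^-23 degrees) and raw rotation angles (units of 2^-16 degrees) to degrees after checking the stated ranges. The function names are hypothetical, and raising ValueError for an out-of-range value is an assumption of this sketch rather than behavior specified above.

```python
GPS_UNITS_PER_DEGREE = 1 << 23    # longitude/latitude in units of 2^-23 degrees
ANGLE_UNITS_PER_DEGREE = 1 << 16  # yaw/pitch/roll in units of 2^-16 degrees


def _check_range(name, raw, low, high):
    # Rejecting out-of-range values is an assumption of this sketch.
    if not low <= raw <= high:
        raise ValueError(f"{name}={raw} is outside [{low}, {high}]")


def gps_longitude_degrees(raw):
    _check_range("viewpoint_gpspos_longitude", raw,
                 -180 * GPS_UNITS_PER_DEGREE, 180 * GPS_UNITS_PER_DEGREE - 1)
    return raw / GPS_UNITS_PER_DEGREE


def gps_latitude_degrees(raw):
    _check_range("viewpoint_gpspos_latitude", raw,
                 -90 * GPS_UNITS_PER_DEGREE, 90 * GPS_UNITS_PER_DEGREE - 1)
    return raw / GPS_UNITS_PER_DEGREE


def yaw_or_roll_degrees(raw):
    # Yaw and roll share the range -180 * 2^16 to 180 * 2^16 - 1, inclusive.
    _check_range("yaw/roll", raw,
                 -180 * ANGLE_UNITS_PER_DEGREE, 180 * ANGLE_UNITS_PER_DEGREE - 1)
    return raw / ANGLE_UNITS_PER_DEGREE


def pitch_degrees(raw):
    # Pitch uses the range -90 * 2^16 to 90 * 2^16, inclusive.
    _check_range("pitch", raw,
                 -90 * ANGLE_UNITS_PER_DEGREE, 90 * ANGLE_UNITS_PER_DEGREE)
    return raw / ANGLE_UNITS_PER_DEGREE


print(gps_longitude_degrees(45 * GPS_UNITS_PER_DEGREE))
print(pitch_degrees(90 * ANGLE_UNITS_PER_DEGREE))
```

Because both unit sizes are powers of two, whole-degree values convert exactly; the asymmetric upper bounds (for example, 180 * 2^16 - 1 for yaw) keep +180 degrees out of range so that it is not signaled in addition to -180 degrees.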

In one example, 32 bits may be used for the viewpoint ID. Thus, the data type of destination_viewpoint_id is changed from unsigned int(16) to unsigned int(32).

In one example, according to the techniques herein, a flag may be used to control the presence or absence of viewpoint switching information (i.e., ViewpointSwitchingStruct()) in the ViewpointTrackGroupBox. It is asserted that for various scenarios the content provider may decide to include or not include viewpoint switching information. For example, in a scenario where there is only one viewpoint, or if there is no recommended behavior when switching between viewpoints, then this information may not be signaled. That is, in one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint track grouping information according to the following syntax and semantics:

Syntax

Semantics
Tracks that have the same value of track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ belong to the same viewpoint. The track_group_id within TrackGroupTypeBox with track_group_type equal to ‘vipo’ is therefore used as the identifier of the viewpoint.

viewpoint_label is a null-terminated UTF-8 string that provides a human readable text label for the viewpoint.

viewpoint_gpspos_present_flag equal to 1 indicates that viewpoint_gpspos_longitude and viewpoint_gpspos_latitude are present and viewpoint_gpspos_altitude may be present. viewpoint_gpspos_present_flag equal to 0 indicates that viewpoint_gpspos_longitude, viewpoint_gpspos_latitude, and viewpoint_gpspos_altitude are not present.

viewpoint_geomagnetic_info_present_flag equal to 1 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are present. viewpoint_geomagnetic_info_present_flag equal to 0 indicates that viewpoint_geomagnetic_yaw, viewpoint_geomagnetic_pitch, and viewpoint_geomagnetic_roll are not present. viewpoint_geomagnetic_info_present_flag shall be equal to 1 for at most one viewpoint in viewpoint group to enable location aware VR content.

viewpoint_switching_info_present_flag equal to 1 indicates that ViewpointSwitchingStruct() is present in this ViewpointTrackGroupBox. viewpoint_switching_info_present_flag equal to 0 indicates that ViewpointSwitchingStruct() is not present in this ViewpointTrackGroupBox.
In one example when ViewpointSwitchingStruct() is not present it means that there is no recommended behavior when switching between viewpoints and an OMAF player can freely switch between this viewpoint and any other viewpoint as desired.
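As a non-limiting illustration, a player might combine the absence rule above with the t_min and t_max semantics as sketched below in Python. The SwitchingEntry class, the may_switch function, and the use of None to model an absent ViewpointSwitchingStruct() or an absent t_min/t_max are hypothetical. Treating the t_min and t_max boundaries as inclusive, and disallowing a destination that is not listed when switching information is present, are assumptions of this sketch and are not specified above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SwitchingEntry:
    """Hypothetical in-memory form of one switching transition."""
    destination_viewpoint_id: int
    t_min: Optional[int] = None  # None when min_time_flag is 0
    t_max: Optional[int] = None  # None when max_time_flag is 0


def may_switch(switching: Optional[Sequence[SwitchingEntry]],
               destination_id: int, playout_time: int) -> bool:
    """Decide whether a switch to destination_id may be activated now."""
    if switching is None:
        # ViewpointSwitchingStruct() absent: no recommended behavior,
        # so the player can switch freely.
        return True
    for entry in switching:
        if entry.destination_viewpoint_id != destination_id:
            continue
        if entry.t_min is not None and playout_time < entry.t_min:
            return False  # before the minimum playout time
        if entry.t_max is not None and playout_time > entry.t_max:
            return False  # after the maximum playout time
        return True
    # Assumption of this sketch: an unlisted destination is not allowed.
    return False


entries = [SwitchingEntry(destination_viewpoint_id=2, t_min=10, t_max=60)]
print(may_switch(entries, 2, 30), may_switch(None, 7, 0))
```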

ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagneticInfoStruct(), ViewpointGroupStruct(), and ViewpointGlobalCoordinateSysRotationStruct() are defined above. ViewpointSwitchingStruct() may be defined as provided above or below.

In one example, according to the techniques herein, when destination_viewport_flag is equal to 1 in ViewpointSwitchingStruct(), the sphere region structure is used to signal the destination viewport information. In this case, the azimuth_range and elevation_range in the signalled sphere region structure are included to be able to define the viewport correctly. For this, SphereRegionStruct(1,0) may be signaled when destination_viewport_flag is equal to 1, instead of SphereRegionStruct(0,0). That is, in one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint switching structure according to the following syntax:

In another example, according to the techniques herein, when the sphere region structure is used to signal the destination viewport information, the shape type of the sphere region may be used to be able to completely specify the viewport region. That is, in one example, according to the techniques herein, data encapsulator 107 may be configured to signal viewpoint switching structure according to the following syntax:

Where, in one example, sw_shape_type has the following semantics:
sw_shape_type equal to 0 specifies that the sphere region is specified by four great circles. sw_shape_type equal to 1 specifies that the sphere region is specified by two azimuth circles and two elevation circles. sw_shape_type values greater than 1 are reserved.
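As a non-limiting illustration, a parser might interpret sw_shape_type as sketched below in Python. The function name and the returned strings are hypothetical, and raising ValueError for a reserved value is an assumption of this sketch; a receiver could instead ignore the structure.

```python
def describe_sw_shape_type(sw_shape_type: int) -> str:
    """Map sw_shape_type to the sphere region shape it specifies."""
    if sw_shape_type == 0:
        return "four great circles"
    if sw_shape_type == 1:
        return "two azimuth circles and two elevation circles"
    # Values greater than 1 are reserved; rejecting them (and negative
    # inputs) is an assumption of this sketch.
    raise ValueError(f"reserved or invalid sw_shape_type: {sw_shape_type}")


print(describe_sw_shape_type(1))
```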
Further, in one example, the semantics may be as follows:
sw_shape_type has identical semantics to shape_type of SphereRegionConfigBox.
destination_viewport_flag equal to 1 specifies that a SphereRegionStruct() specifying the destination viewport is present. It is inferred that the sphere region by the signalled SphereRegionStruct() is specified by four great circles. In another example: It is inferred that the sphere region by the signalled SphereRegionStruct() is specified by two azimuth circles and two elevation circles.

In N18587, dynamic viewpoint timed metadata tracks have the following sample entry and sample format:
The dynamic viewpoint timed metadata track indicates the viewpoint parameters that are dynamically changing over time.
An OMAF player should use the signalled information as follows when starting playing back of one viewpoint after switching from another viewpoint:
- If there is a recommended viewing orientation explicitly signalled, the OMAF player is expected to parse this information and follow the recommended viewing orientation.
- Otherwise, the OMAF player is expected to keep the same viewing orientation as in the switching-from viewpoint just before the switching occurs.
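As a non-limiting illustration, the two rules above can be sketched in Python as follows. The function name and the representation of a viewing orientation as a tuple are hypothetical; None models the case where no recommended viewing orientation is explicitly signalled.

```python
def orientation_after_switch(recommended, previous):
    """Pick the viewing orientation to use right after a viewpoint switch.

    recommended: explicitly signalled recommended viewing orientation,
                 or None when none is signalled.
    previous:    viewing orientation in the switching-from viewpoint just
                 before the switch.
    """
    if recommended is not None:
        return recommended  # follow the recommended viewing orientation
    return previous         # otherwise keep the same viewing orientation


print(orientation_after_switch(None, (30.0, 0.0, 0.0)))
```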

Sample Entry
The track sample entry type 'dyvp' shall be used. The sample entry of this sample entry type is specified as follows:

vwpt_pos_flag equal to 0 specifies that ViewpointPosStruct() is not present in the sample entry. vwpt_pos_flag equal to 1 specifies that ViewpointPosStruct() is present in the sample entry.

dynamic_gcs_rotated_flag equal to 0 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system remain unchanged in all samples referring to this sample entry. dynamic_gcs_rotated_flag equal to 1 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system are indicated in the samples.

dynamic_vwpt_group_flag equal to 0 specifies that the vwpt_group_flag and ViewpointGroupStruct() are not present in the samples and the viewpoint group information (vwpt_group_id and vwpt_group_description) signalled in ViewpointGroupStruct() in the ViewpointTrackGroupBox applies to each sample referring to this sample entry. dynamic_vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() for the viewpoint may change compared to that signalled in the ViewpointGroupStruct() in the ViewpointTrackGroupBox and new ViewpointGroupStruct() may be signalled in the samples based on the value of vwpt_group_flag in the samples.

dynamic_vwpt_gps_flag equal to 0 specifies that the GPS position information of the viewpoint remains unchanged or is not updated in all samples referring to this sample entry. dynamic_vwpt_gps_flag equal to 1 specifies that the GPS position information of the viewpoint is indicated in the samples.

dynamic_vwpt_geomagnetic_info_flag equal to 0 specifies that ViewpointGeomagneticInfoStruct() is present in the sample entry. dynamic_vwpt_geomagnetic_info_flag equal to 1 specifies that ViewpointGeomagneticInfoStruct() is not present in the sample entry and is indicated in the samples.

vwpt_switch_flag equal to 0 specifies that ViewpointSwitchingStruct() is not present in the sample entry. vwpt_switch_flag equal to 1 specifies that ViewpointSwitchingStruct() is present in the sample entry.

ViewpointPosStruct() is defined above but indicates the initial viewpoint position.

ViewpointGlobalCoordinateSysRotationStruct() is defined above but indicates the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system for each sample referring to this sample entry.

ViewpointGpsPositionStruct() is defined above but indicates the GPS position information of the viewpoint for each sample referring to this sample entry.

ViewpointGeomagneticInfoStruct() is defined above but indicates the geomagnetic position information of the viewpoint for each sample referring to this sample entry.

ViewpointSwitchingStruct() is defined above but indicates the viewpoint switching information for each sample referring to this sample entry.

Sample Format
The sample syntax of this sample entry type ('dyvp') is specified as follows:

The semantics of ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagneticInfoStruct(), ViewpointGlobalCoordinateSysRotationStruct(), and ViewpointGroupStruct() are specified above.

vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() is present. vwpt_group_flag equal to 0 specifies that the ViewpointGroupStruct() is not present. When not present, the value of vwpt_group_flag is inferred to be equal to 0.

When dynamic_vwpt_group_flag is equal to 1, the first sample shall have vwpt_group_flag equal to 1. For subsequent samples when the viewpoint group information does not change, the ViewpointGroupStruct() can be absent. When the ViewpointGroupStruct() is absent in a sample, the following applies:
- If dynamic_vwpt_group_flag is equal to 1, it is inferred to be identical to the ViewpointGroupStruct() of the previous sample, in decoding order.
- Otherwise (dynamic_vwpt_group_flag is equal to 0), it is inferred to be identical to the ViewpointGroupStruct() in the ViewpointTrackGroupBox.
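As a non-limiting illustration, the inference rules above can be sketched in Python as follows. The function name, the representation of a ViewpointGroupStruct() as a (vwpt_group_id, vwpt_group_description) tuple, and the use of None for a sample in which the structure is absent are hypothetical. Raising ValueError when the first sample lacks the structure is an assumption of this sketch about how a parser might react to that constraint being violated.

```python
def resolve_viewpoint_groups(dynamic_vwpt_group_flag, box_group, sample_groups):
    """Return the effective viewpoint group for every sample, in decoding order.

    box_group:     group signalled in the ViewpointTrackGroupBox.
    sample_groups: one entry per sample; None means ViewpointGroupStruct()
                   is absent in that sample (vwpt_group_flag equal to 0).
    """
    if dynamic_vwpt_group_flag == 0:
        # The group from the ViewpointTrackGroupBox applies to each sample.
        return [box_group for _ in sample_groups]

    resolved = []
    previous = None
    for index, group in enumerate(sample_groups):
        if group is None:
            if index == 0:
                raise ValueError("first sample shall have vwpt_group_flag equal to 1")
            group = previous  # identical to the previous sample's structure
        resolved.append(group)
        previous = group
    return resolved


print(resolve_viewpoint_groups(1, (1, "stage"), [(2, "field"), None, (3, "stands"), None]))
```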

In another example, according to the techniques herein, ViewpointSwitchingStruct() may be included in the dynamic viewpoint samples, controlled by a flag, to allow specifying dynamic viewpoint switching information. That is, in one example, according to the techniques herein, data encapsulator 107 may be configured to signal dynamic viewpoint timed metadata tracks having the following sample entry and sample format:

Sample Entry
The track sample entry type 'dyvp' shall be used. The sample entry of this sample entry type is specified as follows:

vwpt_pos_flag equal to 0 specifies that ViewpointPosStruct() is not present in the sample entry. vwpt_pos_flag equal to 1 specifies that ViewpointPosStruct() is present in the sample entry.

dynamic_gcs_rotated_flag equal to 0 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system remain unchanged in all samples referring to this sample entry. dynamic_gcs_rotated_flag equal to 1 specifies that the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system are indicated in the samples.

dynamic_vwpt_group_flag equal to 0 specifies that the vwpt_group_flag and ViewpointGroupStruct() are not present in the samples and the viewpoint group information (vwpt_group_id and vwpt_group_description) signalled in ViewpointGroupStruct() in the ViewpointTrackGroupBox applies to each sample referring to this sample entry. dynamic_vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() for the viewpoint may change compared to that signalled in the ViewpointGroupStruct() in the ViewpointTrackGroupBox and new ViewpointGroupStruct() may be signalled in the samples based on the value of vwpt_group_flag in the samples.

dynamic_vwpt_gps_flag equal to 0 specifies that the GPS position information of the viewpoint remains unchanged or is not updated in all samples referring to this sample entry. dynamic_vwpt_gps_flag equal to 1 specifies that the GPS position information of the viewpoint is indicated in the samples.

dynamic_vwpt_geomagnetic_info_flag equal to 0 specifies that ViewpointGeomagneticInfoStruct() is present in the sample entry. dynamic_vwpt_geomagnetic_info_flag equal to 1 specifies that ViewpointGeomagneticInfoStruct() is not present in the sample entry and is indicated in the samples.

dynamic_vwpt_switch_flag equal to 0 specifies that ViewpointSwitchingStruct() is present in the sample entry and is not present in the samples. dynamic_vwpt_switch_flag equal to 1 specifies that ViewpointSwitchingStruct() is not present in the sample entry and is present in the samples.

ViewpointPosStruct() is defined above but indicates the initial viewpoint position.

ViewpointGlobalCoordinateSysRotationStruct() is defined above but indicates the yaw, pitch, and roll rotation angles of X, Y, and Z axes, respectively, of the global coordinate system of the viewpoint relative to the common reference coordinate system for each sample referring to this sample entry.

ViewpointGpsPositionStruct() is defined above but indicates the GPS position information of the viewpoint for each sample referring to this sample entry.

ViewpointGeomagneticInfoStruct() is defined above but indicates the geomagnetic position information of the viewpoint for each sample referring to this sample entry.

ViewpointSwitchingStruct() is defined above but indicates the viewpoint switching information for each sample referring to this sample entry.

Sample Format
The sample syntax of this sample entry type ('dyvp') is specified as follows:

The semantics of ViewpointPosStruct(), ViewpointGpsPositionStruct(), ViewpointGeomagneticInfoStruct(), ViewpointGlobalCoordinateSysRotationStruct(), ViewpointSwitchingStruct(), and ViewpointGroupStruct() are specified above.

vwpt_group_flag equal to 1 specifies that the ViewpointGroupStruct() is present. vwpt_group_flag equal to 0 specifies that the ViewpointGroupStruct() is not present. When not present, the value of vwpt_group_flag is inferred to be equal to 0.

When dynamic_vwpt_group_flag is equal to 1, the first sample shall have vwpt_group_flag equal to 1. For subsequent samples when the viewpoint group information does not change, the ViewpointGroupStruct() can be absent. When the ViewpointGroupStruct() is absent in a sample, the following applies:
- If dynamic_vwpt_group_flag is equal to 1, it is inferred to be identical to the ViewpointGroupStruct() of the previous sample, in decoding order.
- Otherwise (dynamic_vwpt_group_flag is equal to 0), it is inferred to be identical to the ViewpointGroupStruct() in the ViewpointTrackGroupBox.

In this manner, data encapsulator 107 represents an example of a device configured to signal a unique identifier and a label for each of a plurality of viewpoints, wherein the label provides a viewpoint selection mechanism.

Referring again to FIG. 1, interface 108 may include any device configured to receive data generated by data encapsulator 107 and transmit and/or store the data to a communications medium. Interface 108 may include a network interface card, such as an Ethernet card, and may include an optical transceiver, a radio frequency transceiver, or any other type of device that can send and/or receive information. Further, interface 108 may include a computer system interface that may enable a file to be stored on a storage device. For example, interface 108 may include a chipset supporting Peripheral Component Interconnect (PCI) and Peripheral Component Interconnect Express (PCIe) bus protocols, proprietary bus protocols, Universal Serial Bus (USB) protocols, I²C, or any other logical and physical structure that may be used to interconnect peer devices.

Referring again to FIG. 1, destination device 120 includes interface 122, data decapsulator 123, video decoder 124, and display 126. Interface 122 may include any device configured to receive data from a communications medium. Interface 122 may include a network interface card, such as an Ethernet card, and may include an optical transceiver, a radio frequency transceiver, or any other type of device that can receive and/or send information. Further, interface 122 may include a computer system interface enabling a compliant video bitstream to be retrieved from a storage device. For example, interface 122 may include a chipset supporting PCI and PCIe bus protocols, proprietary bus protocols, USB protocols, I²C, or any other logical and physical structure that may be used to interconnect peer devices. Data decapsulator 123 may be configured to receive a bitstream generated by data encapsulator 107 and perform sub-bitstream extraction according to one or more of the techniques described herein.

Video decoder 124 may include any device configured to receive a bitstream and/or acceptable variations thereof and reproduce video data therefrom. Display 126 may include any device configured to display video data. Display 126 may comprise one of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display. Display 126 may include a High Definition display or an Ultra High Definition display. Display 126 may include a stereoscopic display. Display 126 may include a head-mounted display. It should be noted that although in the example illustrated in FIG. 1, video decoder 124 is described as outputting data to display 126, video decoder 124 may be configured to output video data to various types of devices and/or sub-components thereof. For example, video decoder 124 may be configured to output video data to any communication medium, as described herein. Destination device 120 may include a receiver device.

FIG. 7 is a block diagram illustrating an example of a receiver device that may implement one or more techniques of this disclosure. That is, receiver device 700 may be configured to parse a signal based on the semantics described above. Further, receiver device 700 may be configured to operate according to expected player behavior described herein. Further, receiver device 700 may be configured to perform translation techniques described herein. Receiver device 700 is an example of a computing device that may be configured to receive data from a communications network and allow a user to access multimedia content, including a virtual reality application. In the example illustrated in FIG. 7, receiver device 700 is configured to receive data via a television network, such as, for example, television service network 704 described above. Further, in the example illustrated in FIG. 7, receiver device 700 is configured to send and receive data via a wide area network. It should be noted that in other examples, receiver device 700 may be configured to simply receive data through a television service network 704. The techniques described herein may be utilized by devices configured to communicate using any and all combinations of communications networks.

As illustrated in FIG. 7, receiver device 700 includes central processing unit(s) 702, system memory 704, system interface 710, data extractor 712, audio decoder 714, audio output system 716, video decoder 718, display system 720, I/O device(s) 722, and network interface 724. As illustrated in FIG. 7, system memory 704 includes operating system 706 and applications 708. Each of central processing unit(s) 702, system memory 704, system interface 710, data extractor 712, audio decoder 714, audio output system 716, video decoder 718, display system 720, I/O device(s) 722, and network interface 724 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications and may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. It should be noted that although receiver device 700 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit receiver device 700 to a particular hardware architecture. Functions of receiver device 700 may be realized using any combination of hardware, firmware and/or software implementations.

CPU(s) 702 may be configured to implement functionality and/or process instructions for execution in receiver device 700. CPU(s) 702 may include single and/or multi-core central processing units. CPU(s) 702 may be capable of retrieving and processing instructions, code, and/or data structures for implementing one or more of the techniques described herein. Instructions may be stored on a computer readable medium, such as system memory 704.

System memory 704 may be described as a non-transitory or tangible computer-readable storage medium. In some examples, system memory 704 may provide temporary and/or long-term storage. In some examples, system memory 704 or portions thereof may be described as non-volatile memory and in other examples portions of system memory 704 may be described as volatile memory. System memory 704 may be configured to store information that may be used by receiver device 700 during operation. System memory 704 may be used to store program instructions for execution by CPU(s) 702 and may be used by programs running on receiver device 700 to temporarily store information during program execution. Further, in the example where receiver device 700 is included as part of a digital video recorder, system memory 704 may be configured to store numerous video files.

Applications 708 may include applications implemented within or executed by receiver device 700 and may be implemented or contained within, operable by, executed by, and/or be operatively/communicatively coupled to components of receiver device 700. Applications 708 may include instructions that may cause CPU(s) 702 of receiver device 700 to perform particular functions. Applications 708 may include algorithms which are expressed in computer programming statements, such as for-loops, while-loops, if-statements, do-loops, etc. Applications 708 may be developed using a specified programming language. Examples of programming languages include Java™, Jini™, C, C++, Objective-C, Swift, Perl, Python, PHP, UNIX Shell, Visual Basic, and Visual Basic Script. In the example where receiver device 700 includes a smart television, applications may be developed by a television manufacturer or a broadcaster. As illustrated in FIG. 7, applications 708 may execute in conjunction with operating system 706. That is, operating system 706 may be configured to facilitate the interaction of applications 708 with CPU(s) 702 and other hardware components of receiver device 700. Operating system 706 may be an operating system designed to be installed on set-top boxes, digital video recorders, televisions, and the like. It should be noted that techniques described herein may be utilized by devices configured to operate using any and all combinations of software architectures.

System interface 710 may be configured to enable communications between components of receiver device 700. In one example, system interface 710 comprises structures that enable data to be transferred from one peer device to another peer device or to a storage medium. For example, system interface 710 may include a chipset supporting Accelerated Graphics Port (AGP) based protocols, Peripheral Component Interconnect (PCI) bus based protocols, such as, for example, the PCI Express™ (PCIe) bus specification, which is maintained by the Peripheral Component Interconnect Special Interest Group, or any other form of structure that may be used to interconnect peer devices (e.g., proprietary bus protocols).

As described above, receiver device 700 is configured to receive and, optionally, send data via a television service network. As described above, a television service network may operate according to a telecommunications standard. A telecommunications standard may define communication properties (e.g., protocol layers), such as, for example, physical signaling, addressing, channel access control, packet properties, and data processing. In the example illustrated in FIG. 7, data extractor 712 may be configured to extract video, audio, and data from a signal. A signal may be defined according to, for example, aspects of DVB standards, ATSC standards, ISDB standards, DTMB standards, DMB standards, and DOCSIS standards.

Data extractor 712 may be configured to extract video, audio, and data, from a signal. That is, data extractor 712 may operate in a reciprocal manner to a service distribution engine. Further, data extractor 712 may be configured to parse link layer packets based on any combination of one or more of the structures described above.

Data packets may be processed by CPU(s) 702, audio decoder 714, and video decoder 718. Audio decoder 714 may be configured to receive and process audio packets. For example, audio decoder 714 may include a combination of hardware and software configured to implement aspects of an audio codec. That is, audio decoder 714 may be configured to receive audio packets and provide audio data to audio output system 716 for rendering. Audio data may be coded using multi-channel formats such as those developed by Dolby and Digital Theater Systems. Audio data may be coded using an audio compression format. Examples of audio compression formats include Motion Picture Experts Group (MPEG) formats, Advanced Audio Coding (AAC) formats, DTS-HD formats, and Dolby Digital (AC-3) formats. Audio output system 716 may be configured to render audio data. For example, audio output system 716 may include an audio processor, a digital-to-analog converter, an amplifier, and a speaker system. A speaker system may include any of a variety of speaker systems, such as headphones, an integrated stereo speaker system, a multi-speaker system, or a surround sound system.

Video decoder 718 may be configured to receive and process video packets. For example, video decoder 718 may include a combination of hardware and software used to implement aspects of a video codec. In one example, video decoder 718 may be configured to decode video data encoded according to any number of video compression standards, such as ITU-T H.262 or ISO/IEC MPEG-2 Visual, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 Advanced Video Coding (AVC)), and High-Efficiency Video Coding (HEVC). Display system 720 may be configured to retrieve and process video data for display. For example, display system 720 may receive pixel data from video decoder 718 and output data for visual presentation. Further, display system 720 may be configured to output graphics in conjunction with video data, e.g., graphical user interfaces. Display system 720 may comprise one of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device capable of presenting video data to a user. Display system 720 may include a stereoscopic display. Display system 720 may include a head-mounted display. A display device may be configured to display standard definition content, high definition content, or ultra-high definition content.

I/O device(s) 722 may be configured to receive input and provide output during operation of receiver device 700. That is, I/O device(s) 722 may enable a user to select multimedia content to be rendered. Input may be generated from an input device, such as, for example, a push-button remote control, a device including a touch-sensitive screen, a motion-based input device, an audio-based input device, or any other type of device configured to receive user input. I/O device(s) 722 may be operatively coupled to receiver device 700 using a standardized communication protocol, such as for example, Universal Serial Bus protocol (USB), Bluetooth, ZigBee or a proprietary communications protocol, such as, for example, a proprietary infrared communications protocol.

Network interface 724 may be configured to enable receiver device 700 to send and receive data via a local area network and/or a wide area network. Network interface 724 may include a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device configured to send and receive information. Network interface 724 may be configured to perform physical signaling, addressing, and channel access control according to the physical and Media Access Control (MAC) layers utilized in a network. In this manner, receiver device 700 represents an example of a device configured to parse syntax elements indicating, for each of a plurality of viewpoints, a unique identifier and a label, and to render video based on values of the parsed syntax elements.
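The parsing behavior attributed to receiver device 700 can be sketched as follows. This is a minimal illustration, not the parser defined by this disclosure: the byte layout (an 8-bit entry count, then a 32-bit big-endian identifier and a null-terminated UTF-8 label per entry) and the names `parse_viewpoint_labels` and `select_viewpoint` are hypothetical and assumed only for the example.

```python
import struct


def parse_viewpoint_labels(data):
    """Parse a byte string into a dict mapping viewpoint identifier to label.

    Hypothetical layout, assumed for illustration only:
      uint8 entry count, then per entry a uint32 big-endian identifier
      followed by a null-terminated UTF-8 label.
    """
    if not data:
        raise ValueError("empty payload")
    count = data[0]
    pos = 1
    labels = {}
    for _ in range(count):
        if pos + 4 > len(data):
            raise ValueError("truncated identifier")
        (vp_id,) = struct.unpack_from(">I", data, pos)
        pos += 4
        end = data.find(b"\x00", pos)  # locate the label terminator
        if end < 0:
            raise ValueError("label is not null-terminated")
        if vp_id in labels:
            raise ValueError("duplicate viewpoint identifier")
        labels[vp_id] = data[pos:end].decode("utf-8")
        pos = end + 1
    return labels


def select_viewpoint(labels, wanted_label):
    """Return the identifier whose label matches wanted_label, or None."""
    for vp_id, label in labels.items():
        if label == wanted_label:
            return vp_id
    return None
```

In this sketch, a player could present the parsed labels in a menu and pass the user's choice to `select_viewpoint` to obtain the identifier of the viewpoint to render, which is one way a label could serve as a selection mechanism.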

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Moreover, each functional block or various features of the base station device and the terminal device used in each of the aforementioned embodiments may be implemented or executed by circuitry, which is typically an integrated circuit or a plurality of integrated circuits. The circuitry designed to execute the functions described in the present specification may comprise a general-purpose processor, a digital signal processor (DSP), an application specific or general application integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, discrete gates or transistor logic, or a discrete hardware component, or a combination thereof. The general-purpose processor may be a microprocessor, or alternatively, the processor may be a conventional processor, a controller, a microcontroller or a state machine. The general-purpose processor or each circuit described above may be configured by a digital circuit or may be configured by an analogue circuit. Further, if advances in semiconductor technology yield an integrated circuit technology that supersedes present-day integrated circuits, an integrated circuit produced by that technology may also be used.

Various examples have been described. These and other examples are within the scope of the following claims.

<Cross Reference>
This Nonprovisional application claims priority under 35 U.S.C. § 119 on provisional Application No. 62/818,581 on March 14, 2019, provisional Application No. 62/819,321 on March 15, 2019, provisional Application No. 62/845,830 on May 9, 2019, provisional Application No. 62/854,122 on May 29, 2019, provisional Application No. 62/868,680 on June 28, 2019, provisional Application No. 62/902,794 on September 19, 2019, and provisional Application No. 62/906,623 on September 26, 2019, the entire contents of which are hereby incorporated by reference.

Claims

1. A method of signaling information associated with an omnidirectional video, the method comprising:
signaling a first syntax element specifying a number of object identifiers for which a second syntax element and a third syntax element values are signaled;
signaling the second syntax element specifying an object identifier for a m-th object; and
signaling the third syntax element specifying a description of an object identified by the second syntax element.
2. The method of claim 1, wherein the first syntax element is signaled by an 8-bit unsigned integer and is a number of object identifiers syntax element.
3. The method of claim 1, wherein the second syntax element is an object identifier syntax element.
4. The method of claim 1, wherein the third syntax element is a null-terminated UTF-8 string and is an object label syntax element.
5. The method of claim 1, wherein the first syntax element, the second syntax element and the third syntax element are included in an object centre points correspondence timed metadata track.
6. The method of claim 1, where an object element includes a physical object in a scene.
7. A method of receiving information associated with an omnidirectional video, the method comprising:
receiving a first syntax element specifying a number of object identifiers for which a second syntax element and a third syntax element values are received;
receiving the second syntax element specifying an object identifier for a m-th object; and
receiving the third syntax element specifying a description of an object identified by the second syntax element.
8. A device comprising one or more processors configured to perform any and all combinations of the steps of claim 1.