WO2019193011A1 - Region description for 360 or spherical video - Google Patents

Region description for 360 or spherical video

Info

Publication number
WO2019193011A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sphere
region
center
polyhedron
Prior art date
Application number
PCT/EP2019/058314
Other languages
French (fr)
Inventor
Yago SÁNCHEZ DE LA FUENTE
Cornelius Hellge
Robert SKUPIN
Thomas Schierl
Thomas Wiegand
Dimitri PODBORSKI
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of WO2019193011A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process

Definitions

  • the present invention relates to the field of encoding/decoding pictures, images or videos.
  • Embodiments of the inventive approach concern improvements in the description of regions on respective faces of a polyhedron onto which a full view sphere of a picture is projected according to a predetermined projection scheme, e.g., during virtual reality (VR) streaming.
  • VR streaming may involve the transmission of a high-resolution video.
  • the resolving capacity of the human fovea is around 60 pixels per degree.
  • To alleviate bandwidth requirements, only a viewport shown at a Head Mounted Display (HMD) may be sent with high resolution, while neighboring data or the rest of the omnidirectional video, also referred to as spherical video, is sent at a lower resolution or with a lower quality.
  • FIG. 1 is a schematic representation of a system for transferring picture or video data from a server to a client in accordance with embodiments of the present invention
  • Fig. 2 shows a schematic diagram illustrating a system including a client and a server for virtual reality applications as an example where embodiments of the inventive approach described herein may be used;
  • Fig. 3 illustrates the definition of the spherical region by four great circles;
  • Fig. 4 illustrates respective faces of a cube, each face having associated four offset tiles, i.e., each tile or rectangular region does not correspond to the center of the respective face;
  • Fig. 5 illustrates an example of a computer system on which units or modules as well as the steps of the methods described in accordance with the inventive approach may execute.
  • Omnidirectional video content typically undergoes a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content.
  • One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection.
  • Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. The procedure is twofold: First, 3D points are mapped to the faces of a polyhedron, typically using a perspective projection to a camera point within the polyhedron, e.g. in the geometric center. Common examples of the polyhedron are regular symmetric six-sided cubes, also referred to as the cubic projection.
  • The rectangular video frame may include one or more rectangular regions associated with a polyhedron face. So far it has been assumed that none of the one or more rectangular regions has a center that is misaligned with a polyhedron face center, e.g., a cube face center. In other words, the one or more rectangular regions are considered to be aligned with a polyhedron face. However, this is not necessarily the case, and some or all of the rectangular regions may not be aligned with a polyhedron face or centered on the polyhedron face, also referred to as being offset from the center.
  • Due to this offset, the contents of the respective one or more rectangular regions may not be correctly described for a full view sphere.
  • For example, when considering a face of a cube that is perpendicular to the x axis of a global x/y/z coordinate system defining the full view sphere, and implementing tile streaming with a cube mapping projection, CMP, configuration where each face is split into a 2x2 grid, it has been found that it may not be possible to define coverage information accurately when generating the content.
  • The coverage information may signal which area is covered and may be used for content selection purposes.
  • The incorrect description of one or more rectangular regions may generate interoperability issues: based on the incorrect description, some clients or players may erroneously determine the presence of gaps when implementing the viewport-dependent profile, because they rely on content coverage, CC, information to select higher-bitrate representations for tiles on the viewport; when erroneously determining the presence of gaps, some clients or players may stop playback since they expect content without gaps.
  • the present invention is based on the finding that such problems may be avoided when orienting the sphere region to a center of the face of the polyhedron that is closest to the center of the rectangular region, e.g., by obtaining azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
  • the sphere region for an offset rectangular region may be defined using four great circles, namely the azimuth great circles and the elevation great circles which are orientated to the center point among the polyhedron faces that is closest to the center of the offset rectangular region.
  • the inventive approach provides a correct description of the regions thereby avoiding that some clients erroneously determine the presence of gaps.
  • a client or receiver may perform an optimization based on the mapping of the regions to the viewport. For instance, in an adaptive streaming situation high resolution tiles may be chosen that are gap-less and match the viewport of the client.
  • Since the regions may be defined accurately and without gaps, it is easier for a receiver, having accurate information about which pixels are inside each region, to prioritize regions within the viewport that are closest to the viewport center or to the gaze of the viewer by requesting them at higher bitrates than others.
  • Fig. 1 is a schematic representation of a system for communicating video or picture information between a server 100 and a client 200.
  • the server 100 and the client 200 may communicate via a wired or wireless communication link for transmitting a data stream 300 including the video or picture information.
  • the server 100 includes a signal processor 102 and may operate in accordance with the inventive teachings described herein.
  • the client 200 includes a signal processor 202 and may operate in accordance with the inventive teachings described herein.
  • the data stream 300 includes data in accordance with the inventive teachings described herein.
  • the present invention provides a data stream, e.g., the data stream 300 of Fig. 1, comprising an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; first data indicating for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and second data indicating that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
  • the data stream signals that a rectangular region is offset from a center of a face of a polyhedron and causes the azimuth and elevation coordinates for great circles defining the sphere region to be calculated based on the azimuth and elevation coordinates of the center of the closest face of the polyhedron so that the great circles are oriented to this center.
  • the second data indicates, from a predefined class of polyhedrons, the polyhedron onto which a full view sphere of the picture is projected, the predefined class defining for one or more polyhedrons the coordinates of the centers of the respective faces of the polyhedron, so that the azimuth coordinates and elevation coordinates of the sphere region are obtained using an azimuth coordinate and an elevation coordinate from the predefined class.
  • the coordinates of the center of the faces are selected from a new box/class defining for one or more polyhedrons the respective face center coordinates.
  • the sphere region is defined by four great circles on the surface of the full view sphere, each great circle having a center coinciding with a center of the full view sphere, the four great circles including two azimuth great circles and two elevation great circles defined by four points indicated by the first data, and the second data causing a modification or adapting of the elevation great circles such that the azimuth great circles and the elevation great circles are orientated to the center of the face of the polyhedron that is closest to the center of the rectangular region.
  • For example, the great circles are determined in accordance with reference [1] with a modification or an adaption of the azimuth and elevation coordinates of the great circles according to the inventive approach.
  • In accordance with embodiments, the first data indicates for the sphere region a center azimuth coordinate, centreAzimuth, and a center elevation coordinate, centreElevation, for a center of the sphere region, a sphere azimuth range, azimuth_range, of the sphere region, and a sphere elevation range, elevation_range, of the sphere region.
  • In accordance with embodiments, a center of the face of the polyhedron is determined to be closest to the center of the rectangular region when the center point coordinates of the region are within the coordinate range defined by the center of the face of the polyhedron and the coverage range of a face. For instance, in case of a cube, a face covers 90x90 degrees. Assuming there is no rotation, if the center point of the front face is in the coordinate (0,0), any azimuth_centre value that is in the range [-45,45] and elevation_centre value that is in the range [-atan(cos(azimuth_centre)), atan(cos(azimuth_centre))] has its closest face center in (0,0).
  • In accordance with embodiments, the polyhedron comprises n faces, and the centers of the faces of the polyhedron are defined by the azimuth and elevation coordinates (x1,y1), (x2,y2), (x3,y3), ..., (xn,yn).
  • the data stream comprises third data indicating a rotation or a degree of rotation of the polyhedron relative to a global coordinate system as defined by the full view sphere.
  • the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are global coordinates of the global coordinate system.
  • In case of a rotated polyhedron, the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are local coordinates rotated relative to the global coordinate system, and
  • the first data indicates a rotation so that the local coordinates for the center of the face of the polyhedron that is closest to the center of the rectangular region are obtained by rotating the local coordinates for the center using the indicated rotation.
  • the data stream comprises fourth data indicating the predetermined projection scheme.
  • the fourth data comprises an identifier indexing one of a plurality of spherical projections.
  • the second data further indicates whether the rectangular region covers the sphere region fully or in part, and/or a quality with which a content of the rectangular region is encoded.
  • In accordance with embodiments, one or more of the faces of the polyhedron have associated therewith one or more rectangular regions.
  • the data stream signals the first and second data using file format boxes or a media presentation description, MPD, for streaming with DASH.
  • an apparatus which receives as an input the inventive data stream 300, e.g. from the server 100.
  • the apparatus may implement the client 200 or may be part of the client 200.
  • the apparatus comprises a processing unit, e.g., implemented using the signal processor 202, the processing unit for receiving a data stream, e.g., the data stream 300.
  • the processing unit is configured to derive from the data stream first data and second data, wherein the first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region, and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and wherein, responsive to deriving the second data from the data stream, the processing unit is configured to modify or adapt the azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
  • an apparatus for decoding a video from a video bitstream comprises the above mentioned inventive apparatus and a decoder receiving the data stream including the encoded pictures of the video, decoding content of the rectangular regions from the data stream, and providing the decoded content on the full view sphere according to the first and second data derived from the data stream.
  • a client in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, is provided, the client comprising one or both of the above apparatus.
  • an apparatus which provides as an output the inventive data stream 300, e.g. to the client 200.
  • the apparatus may implement the server 100 or may be part of the server 100.
  • the apparatus is configured to insert into a data stream first data and second data, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein the first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing the azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
  • an apparatus for encoding a video into a video bitstream comprises an encoder for generating a data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme, the encoder receiving the pictures of the video to be encoded and encoding the content of the rectangular regions into the data stream, and the above mentioned inventive apparatus to insert into the data stream the first data and the second data.
  • a server device in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, is provided, the server comprising one or both of the above apparatus.
  • the present invention provides a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out one or more of the methods in accordance with the present invention.
  • Fig. 2 shows an example for an environment, similar to Fig. 1 , where embodiments of the present application may be applied and advantageously used.
  • Fig. 2 shows a system composed of a server 100 and a client 200, like the system of Fig. 1.
  • the server 100 and the client 200 may interact using adaptive streaming.
  • For instance, dynamic adaptive streaming over HTTP (DASH) employing a media presentation description (MPD) may be used for the communication 310 between the server 100 and the client 200.
  • the inventive approach described herein is not limited to DASH, and in accordance with embodiments, the inventive approach may be implemented using file format boxes.
  • FIG. 2 illustrates a system for implementing a virtual reality application.
  • the system presents to a user wearing a head up display 204, e.g., using an internal display 206 of the head up display 204, a view section 208 of a temporally-varying spatial scene 210.
  • the section 208 may correspond to an orientation of the head up display 204 that may be measured by an internal orientation sensor 212, like an inertial sensor of the head up display 204.
  • the section 208 presented to the user is a section of the spatial scene 210, and the spatial position of the spatial scene 210 corresponds to the orientation of the head up display 204.
  • the temporally-varying spatial scene 210 is depicted as an omnidirectional video or spherical video, however, the present invention is not limited to such embodiments.
  • the section 208 displayed to the user may be taken from a video, with the spatial position of the section 208 being determined by an intersection of a facial axis or eye axis with a virtual or real projector wall or the like.
  • the sensor 212 and the display 206 may be separate or different devices, such as a remote control and a corresponding television set.
  • the sensor 212 and the display 206 may be part of a hand-held device, like a mobile device, e.g., a tablet or a mobile phone.
  • the server 100 may comprise a controller 102, e.g., implemented using the signal processor 102 of Fig. 1 , and a storage 104.
  • the controller 102 may be an appropriately programmed computer, an application-specific integrated circuit or the like.
  • the storage 104 stores media segments which represent the temporally-varying spatial scene 210.
  • The controller 102, responsive to requests from the client 200, sends to the client 200 the requested media segments together with a media presentation description and further information.
  • the controller 102 may fetch the requested media segments from the storage 104.
  • In the storage 104, also other information may be stored, such as the media presentation description or parts of the media presentation description.
  • the client 200 comprises a client device or controller 202, e.g., implemented using the signal processor 202 of Fig. 1 , one or more decoder units 214 and a re-projector 216.
  • the client device 202 may be an appropriately programmed computer, a microprocessor, a programmed hardware device, such as an FPGA, an application specific integrated circuit or the like.
  • the client device 202 assumes responsibility for selecting the media segments to be retrieved from the server 100 out of one or more media segments 106 offered at the server 100. To this end, the client device 202 initially retrieves a manifest or media presentation description from the server 100. From the retrieved manifest, the client device 202 obtains a computational rule for computing addresses of one or more of the media segments 106 which correspond to certain, needed spatial portions of the spatial scene 210. The selected media segments are retrieved by the client device 202 from the server 100.
  • the media segments retrieved by the client device 202 are forwarded to the one or more decoders 214 for decoding.
  • the retrieved and decoded media segments represent, for a temporal time unit, a spatial section 218 of the temporally-varying spatial scene 210.
  • this may be different in accordance with other embodiments, where, for instance, the view section 208 to be presented constantly covers the whole scene.
  • the re-projector 216 may re-project and cut-out from the retrieved and decoded scene content 218 (defined by the selected, retrieved and decoded media segments) the view section 208 to be displayed to the user.
  • the client device 202 may continuously track and update a spatial position of the view section 208, e.g., responsive to the user orientation data from the sensor 212, and inform the re-projector 216 about the current spatial position of scene section 208 as well as of the reprojection mapping to be applied onto the retrieved and decoded media content so as to be mapped onto an area forming the view section 208.
  • the re-projector 216 may apply a mapping and an interpolation onto a regular grid of pixels to be displayed on the display 206.
  • Fig. 2 illustrates an embodiment where a cubic mapping has been used to map the spatial scene 210 onto the respective cube faces using for each face one or more tiles 220. In the depicted embodiment, each cube face has associated four tiles.
  • the tiles 220 are depicted as rectangular sub-regions of the cube onto which the scene 210, which has the form of a sphere, has been projected.
  • the re-projector 216 reverses the projection.
  • the present invention is not limited to a cubic projection or cube mapping.
  • a projection onto a truncated pyramid or a pyramid without truncation may be used instead of a cubic projection.
  • any polyhedron having n faces may be used.
  • While the tiles 220 are depicted to be nonoverlapping in terms of coverage of the spatial scene 210, in accordance with other embodiments, some or all of the tiles 220 may at least partially overlap.
  • In the embodiment depicted in Fig. 2, the whole spatial scene 210 is spatially subdivided into the tiles 220, and each of the six faces of the cube is subdivided into four tiles.
  • the tiles 220 are numbered as tiles 1 to 24, of which tiles 1 to 12 are visible in Fig. 2.
  • the server 100 offers a video 108 which may be temporally subdivided into temporal segments 110.
  • the server 100 may offer more than one video.
  • the temporal segments 110 of the videos 108 of all tiles T1-T24 may form or may be encoded into one of the media segments 106 stored in the storage 104. It is noted that the tile-based streaming illustrated in Fig. 2 is merely an example from which many deviations are possible. For instance, a different number of tiles 220 may be used for some or all of the cube faces.
  • the omnidirectional video content to be presented to the user may undergo a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content.
  • One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection.
  • Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons.
  • 3D points are mapped to the faces of a polyhedron, typically using perspective projection to a camera point within the polyhedron, e.g. in the geometric center.
  • a common example is the use of a regular symmetric six-sided cube, as described above with reference to Fig. 2, also referred to as the cubic projection.
  • the faces of the polyhedron are then arranged into a rectangular video frame for encoding.
  • the mapping from the 3D points to a rectangular projected frame is defined by a projection that is signaled, e.g. in the bitstream.
  • In case of an Equirectangular projection, this is signaled by an Equirectangular Projection supplemental enhancement information, SEI, message, or by a Projection type equal to 0 in the ProjectionFormatStruct() of the ISO-BMFF (ISO base media file format).
  • In case of a cubic projection, this may be signaled by a CubeMap Projection SEI message, or by a Projection type equal to 1 in the ProjectionFormatStruct() of the ISO-BMFF.
  • the further signaling may indicate:
    • "coverage" information: in case an encoded rectangular video does not fully cover the whole sphere, coverage information signals which area is covered and may be used for content selection purposes, and/or
    • region-wise quality signaling: in case the content is encoded or transmitted with regions having different qualities, e.g., different QPs or different pixel densities per degree, the qualities of different regions may be signaled, since there may be different versions of the content with different qualities for different regions, so that, depending on the viewing orientation of the user, one version may be better than another.
  • Two types of spherical regions may be defined, namely:
  • a spherical region that corresponds to the surface of a sphere that is limited by four great circles each having a center coinciding with a center of the sphere, and the four great circles include two great circles (azimuth circles) limiting an azimuth interval and two great circles (elevation circles) limiting an elevation interval, and
  • a spherical region that corresponds to two great circles (azimuth circles) limiting an azimuth interval, and two small circles (elevation circles) limiting an elevation interval, the great circles each having a center coinciding with a center of the sphere, and the small circles each having a center lying on the elevation axis of the sphere.
  • Fig. 3 illustrates the definition of the spherical region by four great circles.
  • the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2.
  • Fig. 3 shows the spherical region 402 limited by the two azimuth great circles 406a, 406b limiting an azimuth interval 406, and by two elevation great circles 408a, 408b limiting an elevation interval 408. Further, Fig. 3 illustrates:
  • the spherical region may be specified by the standard according to reference [1] using the SphereRegionStruct:
  • the SphereRegionStruct may be used for signaling the above-mentioned content coverage or coverage information:
  • the SphereRegionStruct may be used for signaling the above-mentioned region quality rankings or the region-wise quality signaling:
  • Both embodiments contain the SphereRegionStruct and some indication of the shape type: namely coverage_shape_type or region_definition_type.
  • the sphere region may be specified or determined as follows:
  • When both the azimuth range and the elevation range are equal to 0, the sphere region specified is a point on a spherical surface.
  • Otherwise, the sphere region is defined using the variables centreAzimuth, centreElevation, cAzimuth1, cAzimuth2, cElevation1, and cElevation2, which may be derived as follows:
  • the sphere region may be defined as follows with reference to the shape type value specified in the semantics of the structure containing this instance of SphereRegionStruct:
  • When the shape type value is equal to 0, the sphere region is specified by four great circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation, as shown in Fig. 3.
  • When the shape type value is equal to 1, the sphere region is specified by two azimuth great circles and two elevation small circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation.
  • a rectangular region in an Equirectangular projected frame corresponds to a sphere region that is defined by two great circles and two small circles.
  • a rectangular region in a Cubemap projected frame with centreAzimuth and centreElevation coinciding with the x axis or the y axis corresponds to a sphere region that is defined by the four great circles in Fig. 3.
  • a rectangular region does not necessarily need to be in the center of a polyhedron face.
  • Such a rectangular region is also referred to as an offset rectangular region.
  • In other words, the tiles 220 do not correspond to rectangular regions that are centered on the respective face.
  • This is shown in Fig. 4, illustrating the respective faces of the cube and the tiles 1 to 24.
  • each tile or rectangular region does not correspond to the center of the respective face.
  • the center of a tile is offset from the face center.
  • the tile boundaries correspond to cAzimuth1, cAzimuth2 = {45, 0} or {0, -45} and cElevation1, cElevation2 = {45, 0} or {0, -45}, with the great circles for the elevation defined for an azimuth equal to 0.
  • In accordance with embodiments, the SphereRegionStruct (first data) may be used for describing the sphere region for an offset rectangular region together with a new shape type value (second data), e.g. shape_type(X) with X being a certain number or letter, that may be signaled in the data stream from the server 100 to the client 200 of Fig. 2.
  • the new shape type value may indicate that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and that azimuth and elevation coordinates of the sphere region are to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
  • the first data may indicate for the sphere region a center azimuth coordinate, centreAzimuth, and a center elevation coordinate, centreElevation, for a center of the sphere region, a sphere azimuth range, azimuth_range, of the sphere region, and a sphere elevation range, elevation_range, of the sphere region, and, in accordance with embodiments, a center of the face of the polyhedron may be determined to be closest to the center of the rectangular region when the center point coordinates of the region are within the coordinate range defined by the center of the face of the polyhedron and the coverage range of a face.
  • the new shape type value indicates that the offset rectangular region is defined using the four great circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 (see Fig. 3).
  • In this case, the sphere region is specified by four great circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation.
  • For the existing shape types, the regions may be computed based only on the azimuth and elevation ranges [see above: the values (cAzimuth1 - centreAzimuth) and (cAzimuth2 - centreAzimuth) are equal in magnitude to azimuth_range ÷ 2, and (cElevation1 - centreElevation) and (cElevation2 - centreElevation) are equal in magnitude to elevation_range ÷ 2], and once the great circles are defined they are rotated to the center of the region defined by centreAzimuth, centreElevation.
  • In contrast, responsive to the second data, the four great circles which specify the sphere region, defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation, are obtained by taking into consideration the closest face of the cube already when determining the respective points defining the circles:
  • For the other faces of the cube, the computation may be done similarly.
  • For the top and bottom faces, the great circles may also all be interpreted as elevation great circles and, in such a case, may be computed as azimuth and elevation circles as before towards (0,0) and then rotated to be oriented towards the centers of the top and bottom faces, respectively.
  • the great circles may be directly obtained in a way described above without the need for a subsequent rotation operation that deviates the orientation of the great circles from the face centers.
  • A rotation may still be applied, e.g., the formulas above may be written in a similar manner as done before for the shape_type 0; i.e., first describing the great circles for orientation (0,0) and then rotating towards the center of the closest face (as already pointed out for the top and bottom faces).
  • all coordinates used for the regions are global coordinates.
  • a rotation of the content may be performed as a preprocessing step, e.g., because the rotation leads to a better coding efficiency.
  • In this case, the client matches the centreAzimuth and centreElevation to the closest face center coordinates (azimuthGlobal, elevationGlobal) that are then used to derive the great circles with cAzimuth1, cAzimuth2, cElevation1, cElevation2 that define the region; a sketch of this derivation is given directly after this list.
  • This embodiment is advantageous as it allows for more flexibility and forward compatibility.
  • Although the embodiments have been described above with reference to a cubic projection, the inventive approach is not limited to such a projection.
  • Rather, the inventive approach is applicable to any kind of polyhedron.
  • While the embodiments have been described above with reference to file format boxes, it is noted that the inventive approach is not limited thereto.
  • For example, the information about offset rectangular regions may be signaled using the Media Presentation Description (MPD) for the streaming use case with DASH, where clients may request data from a server, and knowing this information beforehand may improve the 360 streaming experience.
  • aspects of the described concept have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
  • FIG. 5 illustrates an example of a computer system 500.
  • the units or modules as well as the steps of the methods performed by these units may execute on one or more computer systems 500.
  • the computer system 500 includes one or more processors 502, like a special purpose or a general purpose digital signal processor.
  • the processor 502 is connected to a communication infrastructure 504, like a bus or a network.
  • the computer system 500 includes a main memory 506, e.g., a random access memory (RAM), and a secondary memory 508, e.g., a hard disk drive and/or a removable storage drive.
  • the secondary memory 508 may allow computer programs or other instructions to be loaded into the computer system 500.
  • the computer system 500 may further include a communications interface 510 to allow software and data to be transferred between computer system 500 and external devices.
  • the communication may be in the form of electronic, electromagnetic, optical, or other signals capable of being handled by a communications interface.
  • the communication may use a wire or a cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels 512.
  • The terms "computer program medium" and "computer readable medium" are used to generally refer to tangible storage media such as removable storage units or a hard disk installed in a hard disk drive.
  • These computer program products are means for providing software to the computer system 500.
  • The computer programs, also referred to as computer control logic, are stored in main memory 506 and/or secondary memory 508. Computer programs may also be received via the communications interface 510.
  • the computer program when executed, enables the computer system 500 to implement the present invention.
  • the computer program when executed, enables processor 502 to implement the processes of the present invention, such as any of the methods described herein. Accordingly, such a computer program may represent a controller of the computer system 500.
  • the software may be stored in a computer program product and loaded into computer system 500 using a removable storage drive or an interface, like the communications interface 510.
  • the implementation in hardware or in software may be performed using a digital storage medium, for example cloud storage, a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the present invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
  • Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
  • the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
  • In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein.
  • In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • the methods are preferably performed by any hardware apparatus.
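
The following Python sketch illustrates the derivation described in the preceding bullets for an offset rectangular region on an equatorial cube face. It is a minimal, non-normative sketch: the function name and the plane-normal representation of the great circles are choices made here for illustration, the exact formulas are elided in the text above, and a non-rotated polyhedron is assumed (the top and bottom faces need the special treatment noted above).

    import math

    def offset_region_normals(centre_azimuth, centre_elevation,
                              azimuth_range, elevation_range,
                              face_azimuth, face_elevation):
        """Plane normals of the four great circles bounding the sphere
        region of an offset rectangular region, oriented to the closest
        face center. A point p lies inside the region iff dot(p, n) >= 0
        for all four normals n. All angles are in degrees."""
        # Boundary coordinates relative to the face center; this simple
        # subtraction assumes a non-rotated, equatorial face.
        c_az1 = math.radians(centre_azimuth - face_azimuth - azimuth_range / 2)
        c_az2 = math.radians(centre_azimuth - face_azimuth + azimuth_range / 2)
        c_el1 = math.radians(centre_elevation - face_elevation - elevation_range / 2)
        c_el2 = math.radians(centre_elevation - face_elevation + elevation_range / 2)
        # Great circles described in the face-local frame, i.e. towards (0,0):
        # two meridian circles for the azimuth bounds and two tilted
        # equator circles for the elevation bounds.
        normals = [
            (-math.sin(c_az1),  math.cos(c_az1), 0.0),   # azimuth circle at cAzimuth1
            ( math.sin(c_az2), -math.cos(c_az2), 0.0),   # azimuth circle at cAzimuth2
            (-math.sin(c_el1), 0.0,  math.cos(c_el1)),   # elevation circle at cElevation1
            ( math.sin(c_el2), 0.0, -math.cos(c_el2)),   # elevation circle at cElevation2
        ]
        # Rotate from the face-local frame to global coordinates: pitch to
        # the face elevation, then yaw to the face azimuth, so that the
        # circles are oriented to the face center.
        p, y = math.radians(face_elevation), math.radians(face_azimuth)
        def to_global(n):
            x0, y0, z0 = n
            x1, z1 = (x0 * math.cos(p) - z0 * math.sin(p),
                      x0 * math.sin(p) + z0 * math.cos(p))
            return (x1 * math.cos(y) - y0 * math.sin(y),
                    x1 * math.sin(y) + y0 * math.cos(y),
                    z1)
        return [to_global(n) for n in normals]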

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A data stream includes an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme, first data indicating for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region, and second data indicating that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.

Description

REGION DESCRIPTION FOR 360 OR SPHERICAL VIDEO
Description
The present invention relates to the field of encoding/decoding pictures, images or videos. Embodiments of the inventive approach concern improvements in the description of regions on respective faces of a polyhedron onto which a full view sphere of a picture is projected according to a predetermined projection scheme, e.g., during virtual reality (VR) streaming.
For example, VR streaming may involve the transmission of a high-resolution video. The resolving capacity of the human fovea is around 60 pixels per degree. To alleviate bandwidth requirements, only a viewport shown at a Head Mounted Display (HMD) may be sent with high resolution, while neighboring data or the rest of the omnidirectional video, also referred to as spherical video, is sent at a lower resolution or with a lower quality.
It is an object of the present invention to provide an improved approach for reducing errors when presenting decoded contents for a full view sphere.
This object is achieved by the subject-matter as defined in the independent claims, and favorable further developments are defined in the dependent claims.
Embodiments of the present invention are now described in further detail with reference to the accompanying drawings, in which: Fig. 1 is a schematic representation of a system for transferring picture or video data from a server to a client in accordance with embodiments of the present invention;
Fig. 2 shows a schematic diagram illustrating a system including a client and a server for virtual reality applications as an example where embodiments of the inventive approach described herein may be used; Fig. 3 illustrates the definition of the spherical region by four great circles;
Fig. 4 illustrates respective faces of a cube, each face having associated four offset tiles, i.e., each tile or rectangular region does not correspond to the center of the respective face; and
Fig. 5 illustrates an example of a computer system on which units or modules as well as the steps of the methods described in accordance with the inventive approach may execute.
Embodiments of the present invention are now described in more detail with reference to the accompanying drawings in which the same or similar elements have the same reference signs assigned.
Omnidirectional video content typically undergoes a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content. One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection. Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. The procedure is twofold: First, 3D points are mapped to the faces of a polyhedron, typically using a perspective projection to a camera point within the polyhedron, e.g. in the geometric center. Common examples of the polyhedron are regular symmetric six-sided cubes, also referred to as the cubic projection. Second, the faces of the polyhedron are arranged into a rectangular video frame for encoding. The rectangular video frame may include one or more rectangular regions associated with a polyhedron face. So far it has been assumed that none of the one or more rectangular regions has a center that is misaligned with a polyhedron face center, e.g., a cube face center. In other words, the one or more rectangular regions are considered to be aligned with a polyhedron face. However, this is not necessarily the case, and some or all of the rectangular regions may not be aligned with a polyhedron face or centered on the polyhedron face, also referred to as being offset from the center. Due to this offset, the contents of the respective one or more rectangular regions may not be correctly described for a full view sphere. For example, when considering a face of a cube that is perpendicular to the x axis of a global x/y/z coordinate system defining the full view sphere, and implementing tile streaming with a cube mapping projection, CMP, configuration where each face is split into a 2x2 grid, it has been found that it may not be possible to define coverage information accurately when generating the content. The coverage information may signal which area is covered and may be used for content selection purposes. The incorrect description of one or more rectangular regions may generate interoperability issues: based on the incorrect description, some clients or players may erroneously determine the presence of gaps when implementing the viewport-dependent profile, because they rely on content coverage, CC, information to select higher-bitrate representations for tiles on the viewport; when erroneously determining the presence of gaps, some clients or players may stop playback since they expect content without gaps.
The present invention is based on the finding that such problems may be avoided when orienting the sphere region to a center of the face of the polyhedron that is closest to the center of the rectangular region, e.g., by obtaining azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region. In other words, in accordance with embodiments, the sphere region for an offset rectangular region may be defined using four great circles, namely the azimuth great circles and the elevation great circles which are orientated to the center point among the polyhedron faces that is closest to the center of the offset rectangular region. The inventive approach provides a correct description of the regions, thereby avoiding that some clients erroneously determine the presence of gaps. A client or receiver may perform an optimization based on the mapping of the regions to the viewport. For instance, in an adaptive streaming situation, high resolution tiles may be chosen that are gap-less and match the viewport of the client. In addition, since the regions may be defined accurately and without gaps, it is easier for a receiver, having accurate information about which pixels are inside each region, to prioritize regions within the viewport that are closest to the viewport center or to the gaze of the viewer by requesting them at higher bitrates than others.
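
For instance, applying the offset_region_normals helper sketched at the end of the list of embodiments above (a hypothetical function, not part of any standard) to one of the 2x2 tiles of the front cube face shows that a direction inside the tile passes all four great-circle tests; the numbers are illustrative only:

    import math

    # Tile 1 of the front cube face: a 45x45 degree region whose center
    # (22.5, 22.5) is offset from the face center (0, 0).
    normals = offset_region_normals(22.5, 22.5, 45.0, 45.0, 0.0, 0.0)

    # A viewing direction inside the tile (azimuth 10, elevation 10).
    az, el = math.radians(10.0), math.radians(10.0)
    point = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

    # The direction lies on the inner side of all four great circles.
    assert all(sum(a * b for a, b in zip(point, n)) >= 0.0 for n in normals)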
Fig. 1 is a schematic representation of a system for communicating video or picture information between a server 100 and a client 200. The server 100 and the client 200 may communicate via a wired or wireless communication link for transmitting a data stream 300 including the video or picture information. The server 100 includes a signal processor 102 and may operate in accordance with the inventive teachings described herein. The client 200 includes a signal processor 202 and may operate in accordance with the inventive teachings described herein. The data stream 300 includes data in accordance with the inventive teachings described herein.
Data Stream
The present invention provides a data stream, e.g., the data stream 300 of Fig. 1, comprising an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; first data indicating for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and second data indicating that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
For example, according to the inventive approach, the data stream signals that a rectangular region is offset from a center of a face of a polyhedron and causes the azimuth and elevation coordinates for great circles defining the sphere region to be calculated based on the azimuth and elevation coordinates of the center of the closest face of the polyhedron so that the great circles are oriented to this center.
In accordance with embodiments, the second data indicates, from a predefined class of polyhedrons, the polyhedron onto which a full view sphere of the picture is projected, the predefined class defining for one or more polyhedrons the coordinates of the centers of the respective faces of the polyhedron, so that the azimuth coordinates and elevation coordinates of the sphere region are obtained using an azimuth coordinate and an elevation coordinate from the predefined class.
For example, the coordinates of the center of the faces are selected from a new box/class defining for one or more polyhedrons the respective face center coordinates. In accordance with embodiments, the sphere region is defined by four great circles on the surface of the full view sphere, each great circle having a center coinciding with a center of the full view sphere, the four great circles including two azimuth great circles and two elevation great circles defined by four points indicated by the first data, and the second data causing a modification or adapting of the elevation great circles such that the azimuth great circles and the elevation great circles are orientated to the center of the face of the polyhedron that is closest to the center of the rectangular region.
For example, the great circles are determined in accordance with reference [1] with a modification or an adaption of the azimuth and elevation coordinates of the great circles according to the inventive approach.
In accordance with embodiments, the first data indicates for the sphere region a center azimuth coordinate, centreAzimuth, and a center elevation coordinate, centreElevation, for a center of the sphere region, a sphere azimuth range, azimuth_range, of the sphere region, and a sphere elevation range, elevation_range, of the sphere region.
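
For illustration only, the first and second data may be modeled as in the following minimal Python sketch; the field names mirror the SphereRegionStruct fields named above, while the class name and the use of a single shape type field to carry the second data are assumptions of this sketch, not part of any standard:

    from dataclasses import dataclass

    @dataclass
    class SphereRegion:
        # First data, mirroring the SphereRegionStruct fields named above
        # (all values in degrees).
        centre_azimuth: float     # centreAzimuth
        centre_elevation: float   # centreElevation
        azimuth_range: float      # azimuth_range
        elevation_range: float    # elevation_range
        # Second data, modeled here as a shape type value; a dedicated
        # value would indicate an offset rectangular region (assumption).
        shape_type: int = 0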
In accordance with embodiments, a center of the face of the polyhedron is determined to be closest to the center of the rectangular region when the center point coordinates of the region are within the coordinate range defined by the center of the face of the polyhedron and the coverage range of a face. For instance, in case of a cube, a face covers 90x90 degrees. Assuming there is no rotation, if the center point of the front face is in the coordinate (0,0), any azimuth_centre value that is in the range [-45,45] and elevation_centre value that is in the range [-atan(cos(azimuth_centre)), atan(cos(azimuth_centre))] has its closest face center in (0,0).
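
For a cube, the closest-face test just described is equivalent to picking the face center whose direction has the largest dot product with the direction of the region center. The following sketch (function and constant names are illustrative only) uses this equivalence; the face-center list matches the cube mapping coordinates given below:

    import math

    # Face centers of a non-rotated cube in (azimuth, elevation) degrees.
    CUBE_FACE_CENTERS = [(0, 0), (90, 0), (-90, 0), (180, 0), (0, 90), (0, -90)]

    def to_unit_vector(azimuth_deg, elevation_deg):
        """Convert (azimuth, elevation) in degrees to a 3D unit vector."""
        az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
        return (math.cos(el) * math.cos(az),
                math.cos(el) * math.sin(az),
                math.sin(el))

    def closest_face_center(centre_azimuth, centre_elevation):
        """Return the cube face center closest to the given region center.

        For a cube, this argmax-of-dot-product rule selects exactly the
        face whose coordinate range contains the region center, e.g. the
        front face (0,0) for azimuth in [-45,45] and an elevation whose
        magnitude is at most atan(cos(azimuth))."""
        v = to_unit_vector(centre_azimuth, centre_elevation)
        return max(CUBE_FACE_CENTERS,
                   key=lambda c: sum(a * b for a, b in zip(v, to_unit_vector(*c))))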
In accordance with embodiments, the polyhedron comprises n faces, and the centers of the faces of the polyhedron are defined by the azimuth and elevation coordinates (x1,y1), (x2,y2), (x3,y3), ..., (xn,yn).
In accordance with embodiments, the polyhedron comprises n=6 faces so that the full view sphere of the picture is projected onto a cube according to the cube mapping scheme, and the centers of the faces of the polyhedron are defined by the following azimuth and elevation coordinates: (0,0), (90,0), (-90,0), (180,0), (0,90), and (0,-90). In accordance with embodiments, the data stream comprises third data indicating a rotation or a degree of rotation of the polyhedron relative to a global coordinate system as defined by the full view sphere. In accordance with embodiments, in case of a non-rotated polyhedron, the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are global coordinates of the global coordinate system.
In accordance with embodiments, in case of a rotated polyhedron, the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are local coordinates rotated relative to the global coordinate system, and the first data indicates a rotation so that the local coordinates for the center of the face of the polyhedron that is closest to the center of the rectangular region are obtained by rotating the local coordinates for the center using the indicated rotation.
In accordance with embodiments, the data stream comprises fourth data indicating the predetermined projection scheme.
In accordance with embodiments, the fourth data comprises an identifier indexing one of a plurality of spherical projections.
In accordance with embodiments, the second data further indicates whether the rectangular region covers the sphere region fully or in part, and/or a quality with which a content of the rectangular region is encoded.
In accordance with embodiments, one or more of the faces of the polyhedron have associated therewith one or more of rectangular regions.
In accordance with embodiments, the data stream signals the first and second data using file format boxes or a media presentation description, MPD, for streaming with DASH.
Receiving the Bitstream
In accordance with an aspect of the inventive approach, an apparatus is provided which receives as an input the inventive data stream 300, e.g. from the server 100. The apparatus may implement the client 200 or may be part of the client 200. The apparatus comprises a processing unit, e.g., implemented using the signal processor 202, the processing unit for receiving a data stream, e.g. the data stream 300, the data stream 300 including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein the processing unit is configured to derive from the data stream first data and second data, wherein the first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region, and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and wherein, responsive to deriving the second data from the data stream, the processing unit is configured to modify or adapt the azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
In accordance with another aspect of the inventive approach, an apparatus for decoding a video from a video bitstream, is provided. The apparatus comprises the above mentioned inventive apparatus and a decoder receiving the data stream including the encoded pictures of the video, decoding content of the rectangular regions from the data stream, and providing the decoded content on the full view sphere according to the first and second data derived from the data stream.
In accordance with yet another aspect of the inventive approach, a client in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, is provided, the client comprising one or both of the above apparatus.
Providing the Bitstream
In accordance with an aspect of the inventive approach, an apparatus is provided which provides as an output the inventive data stream 300, e.g. to the client 200. The apparatus may implement the server 100 or may be part of the server 100. The apparatus is configured to insert into a data stream first data and second data, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing the azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
In accordance with another aspect of the inventive approach, an apparatus for encoding a video into a video bitstream is provided. The apparatus comprises an encoder for generating a data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme, the encoder receiving the pictures of the video to be encoded, and encoding the content of the rectangular regions into the data stream, and the above-mentioned inventive apparatus to insert into the data stream the first data and the second data.
In accordance with yet another aspect of the inventive approach, a server device in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, is provided, the server comprising one or both of the above apparatus.
Computer Program Product
The present invention provides a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out one or more methods in accordance with the present invention.
More detailed embodiments of the inventive approach will now be described with reference to the following figures. Fig. 2 shows an example for an environment, similar to Fig. 1, where embodiments of the present application may be applied and advantageously used. In particular, Fig. 2 shows a system composed of a server 100 and a client 200, like the system of Fig. 1. The server 100 and the client 200 may interact using adaptive streaming. For instance, dynamic adaptive streaming over HTTP (DASH) employing a media presentation description (MPD) may be used for the communication 310 between the server 100 and the client 200. However, the inventive approach described herein is not limited to DASH, and in accordance with embodiments, the inventive approach may be implemented using file format boxes. Thus, any term used herein is to be understood broadly so as to also cover manifest files defined differently than in DASH. Fig. 2 illustrates a system for implementing a virtual reality application. For example, the system presents to a user wearing a head-up display 204, e.g., using an internal display 206 of the head-up display 204, a view section 208 of a temporally-varying spatial scene 210. The section 208 may correspond to an orientation of the head-up display 204 that may be measured by an internal orientation sensor 212, like an inertial sensor of the head-up display 204. Thus, the section 208 presented to the user is a section of the spatial scene 210, and the spatial position of the spatial scene 210 corresponds to the orientation of the head-up display 204. The temporally-varying spatial scene 210 is depicted as an omnidirectional video or spherical video; however, the present invention is not limited to such embodiments. In accordance with other embodiments, the section 208 displayed to the user may be from a video with a spatial position of the section 208 being determined by an intersection of a facial axis or eye axis with a virtual or real projector wall or the like. Further, the sensor 212 and the display 206 may be separate or different devices, such as a remote control and a corresponding television set. In accordance with other embodiments, the sensor 212 and the display 206 may be part of a hand-held device, like a mobile device, e.g., a tablet or a mobile phone.
The server 100 may comprise a controller 102, e.g., implemented using the signal processor 102 of Fig. 1, and a storage 104. The controller 102 may be an appropriately programmed computer, an application-specific integrated circuit or the like. The storage 104 stores media segments which represent the temporally-varying spatial scene 210.
The controller 102, responsive to requests from the client 200, sends to the client 200 the requested media segments together with a media presentation description and further information. The controller 102 may fetch the requested media segments from the storage 104. Within this storage 104, also other information may be stored such as the media presentation description or parts of the media presentation description.
The client 200 comprises a client device or controller 202, e.g., implemented using the signal processor 202 of Fig. 1, one or more decoder units 214 and a re-projector 216. The client device 202 may be an appropriately programmed computer, a microprocessor, a programmed hardware device, such as an FPGA, an application-specific integrated circuit or the like. The client device 202 assumes responsibility for selecting the media segments to be retrieved from the server 100 out of one or more media segments 106 offered at the server 100. To this end, the client device 202 initially retrieves a manifest or media presentation description from the server 100. From the retrieved manifest, the client device 202 obtains a computational rule for computing addresses of one or more of the media segments 106 which correspond to certain, needed spatial portions of the spatial scene 210. The selected media segments are retrieved by the client device 202 from the server 100.
The media segments retrieved by the client device 202 are forwarded to the one or more decoders 214 for decoding. In the example of Fig. 2, the retrieved and decoded media segments represent, for a temporal time unit, a spatial section 218 of the temporally-varying spatial scene 210. As mentioned above, this may be different in accordance with other embodiments, where, for instance, the view section 208 to be presented constantly covers the whole scene. The re-projector 216 may re-project and cut out from the retrieved and decoded scene content 218 (defined by the selected, retrieved and decoded media segments) the view section 208 to be displayed to the user. To this end, the client device 202 may continuously track and update a spatial position of the view section 208, e.g., responsive to the user orientation data from the sensor 212, and inform the re-projector 216 about the current spatial position of the scene section 208 as well as of the re-projection mapping to be applied onto the retrieved and decoded media content so as to be mapped onto an area forming the view section 208. The re-projector 216 may apply a mapping and an interpolation onto a regular grid of pixels to be displayed on the display 206. Fig. 2 illustrates an embodiment where a cubic mapping has been used to map the spatial scene 210 onto the respective cube faces using, for each face, one or more tiles 220. In the depicted embodiment, each cube face has four associated tiles. The tiles 220 are depicted as rectangular sub-regions of the cube onto which the scene 210, which has the form of a sphere, has been projected. The re-projector 216 reverses the projection. However, the present invention is not limited to a cubic projection or cube mapping. In accordance with other embodiments, instead of a cubic projection, a projection onto a truncated pyramid or a pyramid without truncation may be used. In general, any polyhedron having n faces may be used. Although the tiles 220 are depicted to be non-overlapping in terms of coverage of the spatial scene 210, in accordance with other embodiments, some or all of the tiles 220 may at least partially overlap. In the embodiment depicted in Fig. 2, the whole spatial scene 210 is spatially subdivided into the tiles 220, and each of the six faces of the cube is subdivided into four tiles. For illustration purposes, the tiles 220 are numbered as tiles 1 to 24, of which tiles 1 to 12 are visible in Fig. 2. For each tile 220, the server 100 offers a video 108 which may be temporally subdivided into temporal segments 110. The server 100 may offer more than one video
108 per tile 220, the videos differing in quality Q1, Q2. The temporal segments 110 of the videos 108 of all tiles T1-T24 may form or may be encoded into one of the media segments 106 stored in the storage 104. It is noted that the tile-based streaming illustrated in Fig. 2 is merely an example from which many deviations are possible. For instance, a different number of tiles 220 may be used for some or all of the cube faces.
As described above, the omnidirectional video content to be presented to the user may undergo a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content. One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection. Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. 3D points are mapped to the faces of a polyhedron, typically using perspective projection to a camera point within the polyhedron, e.g. in the geometric center. A common example is the use of a regular symmetric six-sided cube, as described above with reference to Fig. 2, also referred to as the cubic projection. The faces of the polyhedron are then arranged into a rectangular video frame for encoding.
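As a rough sketch of such a perspective projection onto a cube (assuming the camera at the geometric center of a unit cube; the face naming is one possible convention and not mandated by the text), the dominant coordinate axis of a viewing direction selects the face, and the remaining two coordinates, divided by it, give the in-face position:

    def project_to_cube_face(x, y, z):
        """Project a viewing direction (x, y, z) from the cube center onto the
        cube face it intersects; returns the face name and (u, v) in [-1, 1]."""
        ax, ay, az = abs(x), abs(y), abs(z)
        if ax >= ay and ax >= az:
            return ("front" if x > 0 else "back", y / ax, z / ax)
        if ay >= az:
            return ("left" if y > 0 else "right", x / ay, z / ay)
        return ("top" if z > 0 else "bottom", x / az, y / az)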
The mapping from the 3D points to a rectangular projected frame is defined by a projection that is signaled, e.g., in the bitstream. For example, in case of an Equirectangular projection this is signaled by an Equirectangular Projection supplemental enhancement information, SEI, message, or by a Projection type equal to 0 in the ProjectionFormatStruct() of the ISO-BMFF (ISO base media file format). In case of a cubic projection this may be signaled by a CubeMap Projection SEI message, or by a Projection type equal to 1 in the ProjectionFormatStruct() of the ISO-BMFF. In addition to the metadata that signals the projection used, there may be further signaling that describes regions of a sphere for different purposes. For example, the further signaling may indicate: • “coverage” information: in case an encoded rectangular video does not fully cover the whole sphere, coverage information signals which area is covered, and the coverage information may be used for content selection purposes, and/or
• “region-wise” quality signaling: in case the content is encoded or transmitted with regions having different qualities, e.g., different QPs or different pixel densities per degree, the qualities of different regions may be signaled, since there may be different versions of the content with different qualities for different regions so that, depending on the viewing orientation of the user, one version may be better than another version.
Two types of spherical regions may be defined, namely:
• a spherical region that corresponds to the surface of a sphere that is limited by four great circles each having a center coinciding with a center of the sphere, and the four great circles include two great circles (azimuth circles) limiting an azimuth interval and two great circles (elevation circles) limiting an elevation interval, and
• a spherical region that corresponds to two great circles (azimuth circles) limiting an azimuth interval, and two small circles (elevation circles) limiting an elevation interval, the great circles each having a center coinciding with a center of the sphere, and the small circles each having a center coinciding with an elevation axis of the sphere.
Fig. 3 illustrates the definition of the spherical region by four great circles. In Fig. 3 the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2. Fig. 3 shows the spherical region 402 limited by the two azimuth great circles 406a, 406b limiting an azimuth interval 406, and by two elevation great circles 408a, 408b limiting an elevation interval 408. Further, Fig. 3 illustrates:
• a center point 410 of the spherical region 402 described by the coordinates centreAzimuth and centreElevation,
• a first azimuth location 412a of the spherical region 402 by the coordinate cAzimuth1,
• a second azimuth location 412b of the spherical region 402 by the coordinate cAzimuth2,
• a first elevation location 414a of the spherical region 402 by the coordinate cElevation1, and
• a second elevation location 414b of the spherical region 402 by the coordinate cElevation2.
In accordance with embodiments, the spherical region may be specified by the standard according to reference [1] using the SphereRegionStruct:
[Syntax of the SphereRegionStruct (rendered as an image in the original publication).]
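As a rough orientation only, a parser for an OMAF-style SphereRegionStruct might look as follows; this is a sketch assuming the usual layout of 32-bit fields in units of 2^-16 degrees (the field names follow the cited specification, while the Python representation and the omission of the trailing interpolate flag and reserved bits are simplifications):

    import struct
    from dataclasses import dataclass

    @dataclass
    class SphereRegion:
        centre_azimuth: float    # degrees
        centre_elevation: float  # degrees
        centre_tilt: float       # degrees
        azimuth_range: float     # degrees, 0 if not signalled
        elevation_range: float   # degrees, 0 if not signalled

    def parse_sphere_region(buf, range_included_flag=True):
        """Read a SphereRegionStruct from a byte buffer (big-endian fields)."""
        ca, ce, ct = struct.unpack_from(">iii", buf, 0)   # signed int(32) each
        ar = er = 0
        if range_included_flag:                           # ranges are conditional
            ar, er = struct.unpack_from(">II", buf, 12)   # unsigned int(32) each
        s = 1 / 65536.0                                   # 2^-16 degree units
        return SphereRegion(ca * s, ce * s, ct * s, ar * s, er * s)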
In accordance with embodiments, the SphereRegionStruct may be used for signaling the above-mentioned content coverage or coverage information:
[Syntax of a content coverage structure embedding the SphereRegionStruct (rendered as an image in the original publication).]
In accordance with other embodiments, the SphereRegionStruct may be used for signaling the above-mentioned region quality rankings, i.e., the region-wise quality signaling:
[Syntax of a region-wise quality ranking structure embedding the SphereRegionStruct (rendered as an image in the original publication).]
Both embodiments contain the SphereRegionStruct and some indication of the shape type, namely coverage_shape_type or region_definition_type.
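Schematically, both containers may thus be thought of as a shape type code plus one or more sphere regions, e.g. (a purely illustrative sketch reusing the SphereRegion class from the parsing sketch above; only the identifiers coverage_shape_type, region_definition_type and SphereRegionStruct are taken from the text):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ContentCoverage:
        coverage_shape_type: int     # 0: four great circles; 1: two great + two small circles
        regions: List[SphereRegion]  # one SphereRegionStruct per covered region

    @dataclass
    class RegionWiseQuality:
        region_definition_type: int  # same shape type semantics as above
        quality_rankings: List[int]  # one quality value per region (illustrative)
        regions: List[SphereRegion]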
When using the SphereRegionStruct, the sphere region may be specified or determined as follows:
In case both azimuth_range and elevation_range are equal to 0, the sphere region specified is a point on a spherical surface.
Otherwise, the sphere region is defined using the variables centreAzimuth, centreElevation, cAzimuth1, cAzimuth2, cElevation1, and cElevation2, which may be derived as follows:
[Derivation of the variables centreAzimuth, centreElevation, cAzimuth1, cAzimuth2, cElevation1 and cElevation2 (rendered as images in the original publication).]
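Based on the description further below, according to which the cAzimuth and cElevation points lie half a range on either side of the centre, the derivation may be reconstructed as the following sketch (the 2^-16 degree fixed-point scaling is an assumption carried over from the parsing sketch above):

    def derive_region_variables(centre_azimuth, centre_elevation,
                                azimuth_range, elevation_range):
        """Derive centreAzimuth, centreElevation, cAzimuth1/2 and cElevation1/2
        from the signalled fixed-point fields."""
        s = 1 / 65536.0
        centreAzimuth = centre_azimuth * s
        centreElevation = centre_elevation * s
        cAzimuth1 = centreAzimuth - azimuth_range * s / 2
        cAzimuth2 = centreAzimuth + azimuth_range * s / 2
        cElevation1 = centreElevation - elevation_range * s / 2
        cElevation2 = centreElevation + elevation_range * s / 2
        return (centreAzimuth, centreElevation,
                cAzimuth1, cAzimuth2, cElevation1, cElevation2)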
The sphere region may be defined as follows with reference to the shape type value specified in the semantics of the structure containing this instance of SphereRegionStruct:
• when the shape type value is equal to 0, the sphere region is specified by four great circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation as shown in Fig. 3,
• when the shape type value is equal to 1, the sphere region is specified by two azimuth great circles and two elevation small circles defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation.
The reason for having different shape type values is that rectangular regions from a projected frame result in different spherical regions defined by great or small circles. For example, a rectangular region in an Equirectangular projected frame corresponds to a spherical region that is defined by two great circles and two small circles. However, a rectangular region in a Cubemap projected frame with centreAzimuth and centreElevation coinciding with the x axis or the y axis corresponds to a sphere region that is defined by the four great circles in Fig. 3. When considering any projection corresponding to a polyhedron, any rectangular region within a face of the polyhedron is defined by the four great circles when the center is such that the vector = center - (0,0,0) is perpendicular to the given polyhedron face.
However, there are situations when one or more of the rectangular regions do not necessarily need to be in the center of a polyhedron face. Such a rectangular region is also referred to as an offset rectangular region. For instance, in case of the cubemap projection, if tile streaming is used, as, e.g., in the embodiment of Fig. 2, the tiles 220 do not correspond to rectangular regions that correspond to the center of the face. This is shown in more detail in Fig. 4, illustrating the respective faces of the cube and the tiles 1 to 24. As may be seen from Fig. 4, each tile or rectangular region does not correspond to the center of the respective face. In other words, the center of a tile is offset from the face center. When considering the front face, the tile boundaries correspond to cAzimuth1, cAzimuth2 {45,0} or {0,-45} and cElevation1, cElevation2 {45,0} or {0,-45}, with the great circles for the elevation defined for an azimuth equal to 0.
As described above, when taking the center of the offset tiles, it is not possible to adequately define the great circles for azimuth and elevation for the tile boundaries, so that, for example, the description of the region is not correct. This may be avoided in accordance with the inventive approach.
In accordance with embodiments of the inventive approach, the SphereRegionStruct (first data) may be used for describing the sphere region for an offset rectangular region together with a new shape type value (second data), e.g., shape_type(X) with X being a certain number or letter, that may be signaled in the data stream from the server 100 to the client 200 of Fig. 2. In other words, the new shape type value may indicate that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and that azimuth and elevation coordinates of the sphere region are to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region. The first data may indicate for the sphere region a center azimuth coordinate, centreAzimuth, and a center elevation coordinate, centreElevation, for a center of the sphere region, a sphere azimuth range, azimuth_range, of the sphere region, and a sphere elevation range, elevation_range, of the sphere region, and in accordance with embodiments, a center of the face of the polyhedron may be determined to be closest to the center of the rectangular region when the center point coordinates of the region are within the coordinate range defined by the center of the face of the polyhedron and the coverage range of a face.
In accordance with embodiments, for a cubic mapping using tiles as illustrated in Fig. 4, the new shape type value indicates that the offset rectangular region is defined using the four great circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 (see Fig. 3), wherein the azimuth great circles 406a, 406b and the elevation great circles 408a, 408b are orientated to the center point among the cube faces (azimuth, elevation) = {(0,0)-front; (90,0)-left; (-90,0)-right; (180,0)-back; (0,90)-top; (0,-90)-bottom} that is closest to the center of the offset rectangular region centreAzimuth and centreElevation. In other words, when the new shape type value is signaled, the sphere region is specified by four great circles defined by the four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the center point defined by centreAzimuth and centreElevation. However, the coordinates cAzimuth1, cAzimuth2, cElevation1, cElevation2 are defined for the center point of the closest face center (azimuth, elevation) = {(0,0); (90,0); (-90,0); (180,0); (0,90); (0,-90)}, i.e., the face of the cube that is the closest to centreAzimuth and centreElevation of the offset rectangular region.
Conventionally, when the four great circles, which specify the sphere region, are defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point which is defined by centreAzimuth and centreElevation, responsive to a shape type value equal to 0, the four great circles are obtained as follows, followed by the rotation operation at the end:
[Conventional derivation of the four great circles for a shape type value equal to 0 (rendered as images in the original publication).]
In other words, the regions may be computed based only on the azimuth and elevation ranges [see above: the values (cAzimuth1 - centreAzimuth) and (cAzimuth2 - centreAzimuth) are equal to ±azimuth_range ÷ 2, and (cElevation1 - centreElevation) and (cElevation2 - centreElevation) are equal to ±elevation_range ÷ 2], and once the great circles are defined they are rotated to the center of the region defined by centreAzimuth, centreElevation.
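This "compute, then rotate" order may be sketched geometrically as follows, with each great circle represented by the normal of its plane through the sphere center (one plausible set of conventions, x toward the front, y to the left, z up; not the literal formulas of the original images):

    import math

    def rotate_to(v, azimuth_deg, elevation_deg):
        """Rotate a vector so that the front direction (1, 0, 0) ends up at the
        given (azimuth, elevation): first about the y axis, then about z."""
        el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
        x, y, z = v
        x, z = x * math.cos(el) - z * math.sin(el), x * math.sin(el) + z * math.cos(el)
        x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
        return (x, y, z)

    def shape_type0_circles(centreAzimuth, centreElevation, azimuth_range, elevation_range):
        """Plane normals of the four bounding great circles for shape type 0:
        built for a region centred at (0, 0), then rotated to the region centre."""
        a, e = math.radians(azimuth_range / 2), math.radians(elevation_range / 2)
        normals = [(-math.sin(+a), math.cos(+a), 0.0),  # azimuth boundary at +a
                   (-math.sin(-a), math.cos(-a), 0.0),  # azimuth boundary at -a
                   (-math.sin(+e), 0.0, math.cos(+e)),  # elevation boundary at +e
                   (-math.sin(-e), 0.0, math.cos(-e))]  # elevation boundary at -e
        return [rotate_to(n, centreAzimuth, centreElevation) for n in normals]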
On the other hand, in accordance with embodiments, when the four great circles, which specify the sphere region, are defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point which is defined by centreAzimuth and centreElevation, responsive to the second data, the four great circles are obtained as below, taking into consideration the closest face of the cube already when determining the respective points defining the circles:
[Inventive derivation of the four great circles, oriented to the closest face center (rendered as images in the original publication).]
(b) For the top and bottom faces, the computation may be done similarly. In accordance with embodiments, the great circles may also all be interpreted as elevation great circles, and, in such a case, may be computed as azimuth and elevation circles as before towards (0,0) and then rotated to be oriented towards the center of the top and bottom faces, respectively. In other words, the great circles may be directly obtained in the way described above without the need for a subsequent rotation operation that deviates the orientation of the great circles from the face centers. However, in accordance with other embodiments, a rotation may still be applied, e.g., the formulas above may be written in a similar manner as done before for the shape_type 0, i.e., first describing the great circles for orientation (0,0) and then rotating towards the face center of the closest face (as already pointed out for the top and bottom faces).
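Under the same conventions, the modified derivation for the front, back, left and right faces may be illustrated as follows, reusing rotate_to and closest_face_center from the sketches above (again an assumption-laden illustration rather than the literal formulas of the original images):

    import math

    def offset_region_circles(centreAzimuth, centreElevation,
                              azimuth_range, elevation_range):
        """Plane normals of the four bounding great circles for the new shape
        type: the boundaries are expressed relative to the closest cube face
        center, and the final rotation goes to that face center rather than to
        the region center, so all four circles stay oriented to the face center."""
        face_az, face_el = closest_face_center(centreAzimuth, centreElevation)
        az1 = math.radians(centreAzimuth - face_az - azimuth_range / 2)
        az2 = math.radians(centreAzimuth - face_az + azimuth_range / 2)
        el1 = math.radians(centreElevation - face_el - elevation_range / 2)
        el2 = math.radians(centreElevation - face_el + elevation_range / 2)
        normals = [(-math.sin(az1), math.cos(az1), 0.0),
                   (-math.sin(az2), math.cos(az2), 0.0),
                   (-math.sin(el1), 0.0, math.cos(el1)),
                   (-math.sin(el2), 0.0, math.cos(el2))]
        return [rotate_to(n, face_az, face_el) for n in normals]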
In accordance with embodiments, all coordinates used for the regions, e.g., for “coverage” or “region-wise quality ranking”, are global coordinates.
In accordance with other embodiments, a rotation of the content may be performed as a preprocessing step, e.g., because the rotation leads to a better coding efficiency. In case a rotation is applied, the centers of the faces of a cubemap do not correspond to the global (azimuthGlobal, elevationGlobal) = {(0,0); (90,0); (-90,0); (180,0); (0,90); (0,-90)} but to a local center (azimuthLocal, elevationLocal) = {(0,0); (90,0); (-90,0); (180,0); (0,90); (0,-90)} of the rotated space.
In accordance with embodiments, the defined shape type may signal the rotation, e.g., using a RotationBox, so as to allow the client to derive the local coordinates to be used for determining cElevation1, cElevation2 by rotation of the local coordinates (azimuthLocal, elevationLocal) = {(0,0); (90,0); (-90,0); (180,0); (0,90); (0,-90)}. In accordance with other embodiments, a new box or class may be used to define for one or more polyhedrons the global coordinates (azimuthGlobal, elevationGlobal) = {(x1,y1); (x2,y2); (x3,y3); ... (xn,yn)} for the n faces of the polyhedron. In accordance with such embodiments, the client matches the centreAzimuth and centreElevation to the closest face center coordinates (azimuthGlobal, elevationGlobal), which are then used to derive the great circles with cAzimuth1, cAzimuth2, cElevation1, cElevation2 that define the region. This embodiment is advantageous as it allows for more flexibility and forward compatibility. For instance, in case another projection is used that does not correspond to a Cubemap but to another polyhedron, defining the face centers may be done easily by specifying the coordinates of the centers for the n faces (azimuthGlobal, elevationGlobal) = {(x1,y1); (x2,y2); (x3,y3); ... (xn,yn)}. Although the embodiments have been described above with reference to a Cubemap, it is noted that the inventive approach is not limited to such a projection. The inventive approach is applicable to any kind of polyhedron. Although the embodiments have been described above with reference to file format boxes, it is noted that the inventive approach is not limited thereto. In accordance with other embodiments, the information about offset rectangular regions (e.g., the shape type value) may be signaled using the Media Presentation Description (MPD) for the streaming use case with DASH, where clients may request data from a server, and knowing this information beforehand may improve the 360 streaming experience.
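The exact syntax of such a box or class is left open by the text; purely for illustration, it might carry the face center list as follows (all identifiers hypothetical):

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class PolyhedronFaceCenters:
        """Hypothetical box/class listing the global (azimuth, elevation) face
        centers (x1, y1) ... (xn, yn) of an n-face polyhedron, against which a
        client matches centreAzimuth/centreElevation to find the closest face."""
        num_faces: int
        face_centers: List[Tuple[float, float]]  # degrees

    # The cubemap would then simply be one predefined instance:
    CUBEMAP_FACE_CENTERS = PolyhedronFaceCenters(
        6, [(0, 0), (90, 0), (-90, 0), (180, 0), (0, 90), (0, -90)])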
Although some aspects of the described concept have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Various elements and features of the present invention may be implemented in hardware using analog and/or digital circuits, in software, through the execution of instructions by one or more general-purpose or special-purpose processors, or as a combination of hardware and software. For example, embodiments of the present invention may be implemented in the environment of a computer system or another processing system. Fig. 5 illustrates an example of a computer system 500. The units or modules as well as the steps of the methods performed by these units may execute on one or more computer systems 500. The computer system 500 includes one or more processors 502, like a special-purpose or a general-purpose digital signal processor. The processor 502 is connected to a communication infrastructure 504, like a bus or a network. The computer system 500 includes a main memory 506, e.g., a random access memory (RAM), and a secondary memory 508, e.g., a hard disk drive and/or a removable storage drive. The secondary memory 508 may allow computer programs or other instructions to be loaded into the computer system 500. The computer system 500 may further include a communications interface 510 to allow software and data to be transferred between the computer system 500 and external devices. The communication may be in the form of electronic, electromagnetic, optical, or other signals capable of being handled by a communications interface. The communication may use a wire or a cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels 512.
The terms “computer program medium” and “computer readable medium” are used to generally refer to tangible storage media such as removable storage units or a hard disk installed in a hard disk drive. These computer program products are means for providing software to the computer system 500. The computer programs, also referred to as computer control logic, are stored in main memory 506 and/or secondary memory 508. Computer programs may also be received via the communications interface 510. The computer program, when executed, enables the computer system 500 to implement the present invention. In particular, the computer program, when executed, enables the processor 502 to implement the processes of the present invention, such as any of the methods described herein. Accordingly, such a computer program may represent a controller of the computer system 500. Where the disclosure is implemented using software, the software may be stored in a computer program product and loaded into the computer system 500 using a removable storage drive or an interface, like the communications interface 510.
The implementation in hardware or in software may be performed using a digital storage medium, for example cloud storage, a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier. Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier. In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer. A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet. A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein. A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein are apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

REFERENCES
[1] JCTVC-AC0032, "On coverage signalling for omnidirectional video"

Claims

1. A data stream, comprising: an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; first data indicating for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and second data indicating that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
2. The data stream of claim 1 , wherein the second data indicates, from a predefined class of polyhedrons, the polyhedron onto which a full view sphere of the picture is projected, the predefined class defining for one or more polyhedrons the coordinates of the centers of the respective faces of the polyhedron, so that the azimuth coordinates and elevation coordinates of the sphere region are obtained using an azimuth coordinate and an elevation coordinate from the predefined class.
3. The data stream of claim 1 or 2, wherein the sphere region is defined by four great circles on the surface of the full view sphere, each great circle having a center coinciding with a center of the full view sphere, the four great circles including two azimuth great circles and two elevation great circles defined by four points indicated by the first data, and the second data causing a modification or an adaptation of the elevation great circles such that the azimuth great circles and the elevation great circles are orientated to the center of the face of the polyhedron that is closest to the center of the rectangular region.
4. The data stream of claim 3, wherein the first data indicates for the sphere region a center azimuth coordinate, centreAzimuth, and a center elevation coordinate, centreElevation, for a center of the sphere region, a sphere azimuth range, azimuth_range, of the sphere region, and a sphere elevation range, elevation_range, of the sphere region.
5. The data stream of claim 4, wherein a center of the face of the polyhedron is determined to be closest to the center of the rectangular region when the center point coordinates of the region are within the coordinate range defined by the center of the face of the polyhedron and the coverage range of a face.
6. The data stream of any one of claims 1 to 5, wherein the polyhedron comprises n faces, and the centers of the faces of the polyhedron are defined by the azimuth and elevation coordinates (x1,y1), (x2,y2), (x3,y3), ... (xn,yn).
7. The data stream of claim 6, wherein the polyhedron comprises n=6 faces so that the full view sphere of the picture is projected onto a cube according to the cube mapping scheme, and the centers of the faces of the polyhedron are defined by the following azimuth and elevation coordinates: (0,0), (90,0), (-90,0), (180,0), (0,90), and (0,-90).
8. The data stream of claim 7, wherein the four great circles, which specify the sphere region, are defined by four points cAzimuth1, cAzimuth2, cElevation1, cElevation2 and the centre point which is defined by centreAzimuth and centreElevation, and wherein, responsive to the second data, the four great circles for the front, back, left, or right faces, i.e., cElevation1 and cElevation2 are in the range [-45,45], are obtained as follows:
[Formulas for obtaining the four great circles (rendered as images in the original publication).]
9. The data stream of any one of claims 1 to 8, wherein the data stream comprises third data indicating a rotation or a degree of rotation of the polyhedron relative to a global coordinate system as defined by the full view sphere.
10. The data stream of claim 9, wherein, in case of a non-rotated polyhedron, the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are global coordinates of the global coordinate system.
11. The data stream of claim 9, wherein, in case of a rotated polyhedron, the azimuth and elevation coordinates for the centers of the respective faces of the polyhedron are local coordinates rotated relative to the global coordinate system, and the first data indicates a rotation so that the local coordinates for the center of the face of the polyhedron that is closest to the center of the rectangular region are obtained by rotating the local coordinates for the center using the indicated rotation.
12. The data stream of any one of claims 1 to 11 , wherein the data stream comprises fourth data indicating the predetermined projection scheme.
13. The data stream of claim 12, wherein the fourth data comprises an identifier indexing one of a plurality of spherical projections.
14. The data stream of any one of claims 1 to 13, wherein the second data further indicates whether the rectangular region covers the sphere region fully or in part, and/or a quality with which a content of the rectangular region is encoded.
15. The data stream of any one of claims 1 to 14, wherein one or more of the faces of the polyhedron have associated therewith one or more rectangular regions.
16. The data stream of any one of claims 1 to 15, wherein the data stream signals the first and second data using file format boxes or a media presentation description, MPD, for streaming with DASH.
17. An apparatus, comprising: a processing unit for receiving a data stream, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein the processing unit is configured to derive from the data stream first data and second data, wherein the first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region, and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and wherein, responsive to deriving the second data from the data stream, the processing unit is configured to modify or adapt the azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
18. An apparatus for decoding a video from a video bitstream, the apparatus comprising: the apparatus of claim 17, and a decoder receiving the data stream including the encoded pictures of the video, decoding content of the rectangular regions from the data stream, and providing the decoded content on the full view sphere according to the first and second data derived from the data stream.
19. A client in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, the client comprising an apparatus of claim 17 or 18.
20. An apparatus, wherein the apparatus is configured to insert into a data stream first data and second data, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing the azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
21. An apparatus for encoding a video into a video bitstream, the apparatus comprising: an encoder for generating a data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme, the encoder receiving the pictures of the video to be encoded, and encoding the content of the rectangular regions into the data stream, and the apparatus of claim 20 to insert into the data stream the first data and the second data.
22. A server in a video streaming environment providing a video bit stream, for example an environment using file format boxes or a media presentation description, MPD, the server comprising an apparatus of claim 20 or 21.
23. A method, comprising: receiving a data stream, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; deriving from the data stream first data and second data, wherein first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region, and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, and responsive to deriving the second data from the data stream, modifying or adapting the azimuth and elevation coordinates of the sphere region using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
24. A method for decoding a video from a video bitstream, the method comprising: deriving from the data stream the first data and the second data according to the method of claim 23; receiving the data stream including the encoded pictures of the video; decoding content of the rectangular regions from the data stream; and providing the decoded content on the full view sphere according to the first and second data derived from the data stream.
25. A method, comprising inserting into a data stream first data and second data, the data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme; wherein first data indicates for a rectangular region a sphere region of the full view sphere associated with the rectangular region, the first data indicating an azimuth range of the sphere region and an elevation range of the sphere region; and wherein the second data indicates that the rectangular region has a center that is offset from a center of the face of the polyhedron on which the rectangular region is arranged, the second data causing the azimuth and elevation coordinates of the sphere region to be obtained using an azimuth coordinate and an elevation coordinate for a center of the face of the polyhedron that is closest to the center of the rectangular region.
26. A method for encoding a video into a video bitstream, the method comprising: generating a data stream including an encoded picture, the picture composed of one or more rectangular regions on respective faces of a polyhedron onto which a full view sphere of the picture is projected according to a predetermined projection scheme, the encoding comprising receiving the pictures of the video to be encoded, and encoding the content of the rectangular regions into the data stream, and inserting into the data stream the first data and the second data according to the method of claim 25.
27. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any one of claims 23 to 26.
PCT/EP2019/058314 2018-04-05 2019-04-02 Region description for 360 or spherical video WO2019193011A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP18165953 2018-04-05
EP18165953.3 2018-04-05

Publications (1)

Publication Number Publication Date
WO2019193011A1 true WO2019193011A1 (en) 2019-10-10

Family

ID=62017158

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/058314 WO2019193011A1 (en) 2018-04-05 2019-04-02 Region description for 360 or spherical video

Country Status (1)

Country Link
WO (1) WO2019193011A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017142355A1 (en) * 2016-02-17 2017-08-24 Samsung Electronics Co., Ltd. Method for transmitting and receiving metadata of omnidirectional image
EP3419301A1 (en) * 2016-02-17 2018-12-26 Samsung Electronics Co., Ltd. Method for transmitting and receiving metadata of omnidirectional image
WO2017204491A1 (en) * 2016-05-26 2017-11-30 LG Electronics Inc. Method for transmitting 360-degree video, method for receiving 360-degree video, apparatus for transmitting 360-degree video, and apparatus for receiving 360-degree video
EP3451659A1 (en) * 2016-05-26 2019-03-06 LG Electronics Inc. Method for transmitting 360-degree video, method for receiving 360-degree video, apparatus for transmitting 360-degree video, and apparatus for receiving 360-degree video

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Revised text of ISO/IEC FDIS 23090-2 Omnidirectional Media Format", 121. MPEG MEETING;22-1-2018 - 26-1-2018; GWANGJU; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. N17399, 28 February 2018 (2018-02-28), XP030024044 *
YAGO SANCHEZ ET AL: "[OMAF] On coverage signalling for omnidirectional video", 120. MPEG MEETING; 23-10-2017 - 27-10-2017; MACAU; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m41618, 13 October 2017 (2017-10-13), XP030069960 *
YAGO SANCHEZ ET AL: "OMAF: On HEVC Tile Streaming - Bitstream and Segment Formats", 118. MPEG MEETING; 3-4-2017 - 7-4-2017; HOBART; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m40492, 4 April 2017 (2017-04-04), XP030068837 *

Similar Documents

Publication Publication Date Title
US11528468B2 (en) System and method for creating a navigable, three-dimensional virtual reality environment having ultra-wide field of view
US10805614B2 (en) Processing spherical video data on the basis of a region of interest
EP3466083B1 (en) Spatially tiled omnidirectional video streaming
US20220174252A1 (en) Selective culling of multi-dimensional data sets
US10467775B1 (en) Identifying pixel locations using a transformation function
US20170280133A1 (en) Stereo image recording and playback
WO2019202207A1 (en) Processing video patches for three-dimensional content
US11882267B2 (en) Adapting video images for wearable devices
TW201921921A (en) Processing of 3D image information based on texture maps and meshes
US11270413B2 (en) Playback apparatus and method, and generation apparatus and method
US10313763B2 (en) Method and apparatus for requesting and receiving selected segment streams based on projection information
WO2012166593A2 (en) System and method for creating a navigable, panoramic three-dimensional virtual reality environment having ultra-wide field of view
CN110663067B (en) Method and system for generating virtualized projections of customized views of real world scenes for inclusion in virtual reality media content
US20230018560A1 (en) Virtual Reality Systems and Methods
US20220165015A1 (en) Image signal representing a scene
US20230106679A1 (en) Image Processing Systems and Methods
WO2019193011A1 (en) Region description for 360 or spherical video
US20230042078A1 (en) Encoding and decoding views on volumetric image data
EP3767953A1 (en) Methods for transmitting and rendering a 3d scene, method for generating patches, and corresponding devices and computer programs
TWI846808B (en) Image signal representing a scene
EP4394718A1 (en) Media file encapsulation method and device, media file decapsulation method and device, and storage medium
TW202046716A (en) Image signal representing a scene

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19713521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19713521

Country of ref document: EP

Kind code of ref document: A1