WO2019038433A1 - Characteristics signaling for omnidirectional content - Google Patents

Characteristics signaling for omnidirectional content

Info

Publication number
WO2019038433A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
picture
data stream
guard band
full view
Prior art date
Application number
PCT/EP2018/072910
Other languages
French (fr)
Inventor
Robert SKUPIN
Yago SÁNCHEZ DE LA FUENTE
Cornelius Hellge
Thomas Schierl
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority to EP18756252.5A priority Critical patent/EP3673659A1/en
Publication of WO2019038433A1 publication Critical patent/WO2019038433A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/21805Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84Generation or processing of descriptive data, e.g. content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/85406Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • The present invention relates to the field of encoding/decoding pictures, images or videos, more specifically to improvements in the characteristics signaling for omnidirectional content.
  • Embodiments relate to a signaling of the projection coverage type, to a signaling of static projection or region-wise-packing characteristics and to a signaling of guard band usage.
  • VR streaming may involve the transmission of a high-resolution video.
  • the resolving capacity of the human fovea is around 60 pixels per degree.
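  • For scale: at 60 pixels per degree, full 360° x 180° coverage would amount to roughly 21600 x 10800 luma samples (360 x 60 by 180 x 60), which is why sending only the viewed section at the highest resolution is attractive.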
  • When using a Head Mounted Display, HMD, neighboring data or the rest of the omnidirectional video, also referred to as spherical video, is sent at a lower resolution or with a lower quality.
  • FIG. 1 is a schematic representation of a system for transferring picture or video data from a server to a client in accordance with embodiments of the present invention
  • Fig. 2 shows a schematic diagram illustrating a system including a client and a server for virtual reality applications as an example where embodiments of the inventive approach described herein may be used;
  • Fig. 3 illustrates the definition of a spherical region, wherein Fig. 3(a) illustrates the definition of the spherical region by four great circles, and Fig. 3(b) illustrates the definition of the spherical region by two great circles and two small circles;
  • Fig. 4 illustrates an example of a CMP picture with an indication of yaw/pitch angle ranges and different picture subsets on the left and on the right;
  • Fig. 5 illustrates embodiments of the first aspect of the present invention, wherein Fig. 5(a) illustrates a first embodiment of an explicit_coverage_idc indicator, and Fig. 5(b) illustrates a second embodiment of an explicit_coverage_idc indicator;
  • Fig. 6 schematically illustrates, in combination with a system as described with reference to Fig. 2, the encoding/decoding of a data stream, the composition of the pictures of the video out of one or more rectangular regions of the projected plane and the mapping between a projected plane and a full view sphere;
  • Fig. 7 illustrates an embodiment for a possible guard band layout around six faces of a CMP;
  • Fig. 8 illustrates an example of a mixed resolution CMP with guard band regions around the left CMP face;
  • Fig. 9 illustrates the syntax that may be used according to an embodiment of the third aspect of the present invention based on a region-wise packing box "rwpk" defined in OMAF;
  • Fig. 10 illustrates a syntax used according to an embodiment of the third aspect of the present invention indicating how guard bands are used in the rendering process;
  • Fig. 11 illustrates a syntax used according to an embodiment of the third aspect of the present invention for enforcing a quality ranking of CMP faces and guard band regions on the basis of a common quality scale; and
  • Fig. 12 illustrates an example of a computer system on which units or modules as well as the steps of the methods described in accordance with the inventive approach may execute.
  • Omnidirectional video content typically undergoes a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content.
  • One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection, ERP.
  • Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. The procedure is twofold: First, 3D points are mapped to the faces of a polyhedron, typically using a perspective projection to a camera point within the polyhedron, e.g. in the geometric center. Common examples of the polyhedron are regular symmetric six-sided cubes, also referred to as the cubic projection.
  • the faces of the polyhedron are arranged into a rectangular video frame for encoding.
  • the rectangular video frame may include one or more rectangular regions associated with a polyhedron face.
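  • To make the two-step procedure concrete, the following is a minimal sketch of the face-mapping step for a cubic projection, assuming a unit-sphere input direction and a perspective projection to the cube center; the face labels and (u, v) conventions are illustrative, not mandated by the text above.

```python
def cube_face_of(x: float, y: float, z: float):
    """Map a 3D direction to a cube face and normalized in-face coordinates
    via perspective projection towards the cube center (illustrative)."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:           # dominant x axis
        face, u, v = ("right" if x > 0 else "left"), y / ax, z / ax
    elif ay >= az:                      # dominant y axis
        face, u, v = ("front" if y > 0 else "back"), x / ay, z / ay
    else:                               # dominant z axis
        face, u, v = ("top" if z > 0 else "bottom"), x / az, y / az
    return face, (u + 1) / 2, (v + 1) / 2   # (u, v) normalized to [0, 1]

# a direction slightly above and beside the +x axis lands on the "right" face
print(cube_face_of(1.0, 0.1, 0.2))      # ('right', 0.55, 0.6)
```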
  • Fig. 1 is a schematic representation of a system for communicating video or picture information between a server 100 and a client 200.
  • the server 100 and the client 200 may communicate via a wired or wireless communication link for transmitting a data stream 300 including the video or picture information.
  • the server 100 includes a signal processor 102 and may operate in accordance with the inventive teachings described herein.
  • the client 200 includes a signal processor 202 and may operate in accordance with the inventive teachings described herein.
  • the data stream 300 includes data in accordance with the inventive teachings described herein.
  • Embodiments of the first aspect of the present invention address a discrepancy found in conventional approaches when an addressed subsection of a picture carrying a polygon-based omnidirectional projection plane does not have a rectangular shape.
  • Embodiments of the first aspect provide for a coverage information signaling which may be accompanied by an additional signaling or a constraint which indicates that no video pictures are located outside an indicated coverage range and/or which indicates that the indicated coverage area is located completely within the video picture and some pixels are located outside the indicated coverage range.
  • the present invention provides a data stream having encoded thereinto a picture, the data stream comprising first data indicating a portion of a full view sphere which is captured by the picture, and second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
  • the indication of the portion indicates the portion so that the portion is confined by two lines of constant pitch, and two lines of constant yaw.
  • the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch, a third syntax element indicating a yaw angle for a third line of constant yaw, a fourth syntax element indicating a yaw angle for a fourth line of constant yaw, wherein, if the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, the first to fourth lines confine all samples of the picture, and , if the first data indicates the portion so that the portion completely resides within the section, the first to fourth lines completely extend within the section.
  • the second data indicates the portion of the full view sphere which is captured by the picture
  • the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion
  • the second data indicates the portion so that the portion completely resides within the section
  • the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, wherein the first to fourth lines confine all samples of the picture, and the fifth and sixth lines along with seventh and eighth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the third and fourth syntax element completely extend within the section.
  • the second data further comprises a seventh syntax element indicating the yaw angle for the seventh line of constant yaw, and an eighth syntax element indicating the yaw angle for the eighth line of constant yaw.
  • the second data indicates the portion of the full view sphere which is captured by the picture
  • the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion
  • the second data indicates the portion so that the portion completely resides within the section
  • the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch
  • the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, a seventh syntax element indicating a yaw angle for a seventh line of constant yaw, and an eighth syntax element indicating a yaw angle for an eighth line of constant yaw
  • the first and second lines along with third and fourth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated
  • the fifth and sixth syntax elements and the first and second syntax elements are coded differentially to each other.
  • the data stream additionally comprises third data indicating one or more rectangular regions out of a projected plane onto which the full view sphere is projected according to a predetermined projection scheme, which the one or more rectangular regions the picture is composed of.
  • the data stream additionally comprises fourth data indicating the predetermined projection scheme.
  • the fourth data comprises an identifier indexing one of a plurality of spherical projections.
  • the fourth data comprises a rotation of the full view sphere relative to a global coordinate system.
  • the indication of the portion relates to the full view sphere in a situation rotated or in a situation not-rotated relative to the global coordinate system according to the fourth data.
  • the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein a scope of the second data is larger than a scope of the first data.
  • the scope of the first data is a picture wise scope.
  • the second data has a further option alternatively indicating that the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples is exactly circumferenced by the portion.
  • the data stream has encoded thereinto a content of the picture along with additional picture content, and has encoded thereinto the picture in a subportion of the data stream in a motion constrained manner along with extraction information which indicates how to derive a data stream specifically having encoded thereinto the picture from the subportion of the data stream.
  • the data stream has encoded thereinto a further picture in a further subportion of the data stream in a motion constrained manner along with further extraction information which indicates how to derive a further data stream specifically having encoded thereinto the further picture from the further subportion of the data stream, wherein the data stream comprises further first data indicating a further portion of the full view sphere which is captured by the further picture, wherein the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein the second data is also valid for the further first data.
  • the present invention provides an apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and insert into the data stream second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
  • the present invention provides a network entity for processing a data stream having encoded thereinto a picture, the network entity being configured to derive from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; and derive from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; process the data stream depending on a match between the portion and a wanted portion of the full view sphere.
  • the processing of the data stream depending on the comparison comprises deciding on performing motion constrained tile set extraction on the data stream so as to obtain a data stream specifically having encoded thereinto the picture from a subportion of the data stream depending on the match; and/or forming a Media Presentation Description, MPD, including one or more representations offering the data stream or motion-constrained tile set, MCTS, extracted versions thereof, depending on the match.
  • the present invention provides a method for forming a data stream having encoded thereinto a picture, the method comprising inserting into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and inserting into the data stream second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
  • the present invention provides a method for processing a data stream having encoded thereinto a picture, the method comprising deriving from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; deriving from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; and processing the data stream depending on a match between the portion and a wanted portion of the full view sphere.
  • the present invention provides a data stream having encoded thereinto a video, the data stream comprising first data indicating a projection of pictures of the video onto a full view sphere; and second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • the first data is updated intermittently in the data stream between consecutive pictures and remains constant at each update if the second data indicates that the projection persists until an end of the video, and is free to vary at each update if the second data indicates that the projection is allowed to change during the video.
  • the first data is updated in the data stream on a per picture basis.
  • the second data indicates whether the projection persists until an end of the video, or whether the projection persists until a next update of the first data.
  • the second data indicates whether the projection persists until an end of the video, or whether the projection is validly indicated merely for the current picture.
  • the second data has a further option alternatively indicating that the projection persists until a next update of the first data.
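  • As an illustration of the persistence signaling just described, the following sketch shows how a processing device might exploit the second data; the flag and the representation of the per-update projection data are assumptions for the example, not syntax from the text above.

```python
def track_projection(updates, projection_is_static: bool):
    """Sketch: with a static-projection promise, the first data is latched
    once and later updates need not be re-parsed; otherwise every update
    may carry a new projection. 'updates' is a hypothetical iterable of
    per-update projection descriptions."""
    active = None
    for projection in updates:
        if active is None or not projection_is_static:
            active = projection        # (re)configure the re-projection
        elif projection != active:
            raise ValueError("stream broke its static-projection promise")
    return active
```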
  • the first data indicates one or more of a predetermined projection scheme mapping between a projected plane one or more rectangular regions of which are contained in each of the pictures of the video, and the full view sphere, a composition of the pictures of the video out of one or more rectangular regions of the projected plane; a rotation of the full view sphere.
  • the data stream further comprises coverage information data indicating a portion of the full view sphere which is captured by the projection of the picture onto the full view sphere.
  • the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or so that the first data indicates the portion so that the portion completely resides within the section.
  • the second data further indicates whether the portion persists until an end of the video, or whether the portion is allowed to change during the video.
  • the present invention provides a network entity for processing a data stream having encoded thereinto a video, the network entity configured to derive from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; and derive from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and process the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • the processing the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video comprises deciding on transcoding, modifying or forwarding the data stream, or not transcoding, modifying or forwarding the data stream, depending whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • the present invention provides an apparatus for forming a data stream having encoded thereinto a video, the apparatus configured to insert into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and insert into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • the present invention provides a method for processing a data stream having encoded thereinto a video, the method comprising deriving from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; deriving from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and processing the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • the present invention provides a method for forming a data stream having encoded thereinto a video, the method comprising inserting into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and inserting into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
  • an approach is provided to give a client device an indication or recommendation as to whether the guard band pixels are to be used in rendering and, in case they are to be used, how the guard band pixels should be used.
  • the present invention provides a data stream having encoded thereinto a picture, the data stream comprising first data indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, second data indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the full view sphere flanking the one or more of the plurality of regions in the full view sphere and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and third data indicating for the guard band region as to which of the guard band region and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
  • the first data indicates qualities for the regions at which the regions are represented by the picture and the third data indicates a quality for the guard band region such that the qualities for the regions at which the regions are represented by the picture and the quality for the guard band region are defined on a common ordinal scale.
  • the third data additionally indicates whether or not the quality for the guard band region is to be used, by comparison of the quality for the guard band region with the qualities for the regions, as the indication as to which of the predetermined guard band region and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
  • the first data indicates for the regions at which quality the regions are represented by the picture.
  • the quality indicated by the first data for the regions pertains to spatial resolution.
  • the guard band region is represented by the picture at a reduced quality, reduced compared to the one or more regions flanked by the guard band region.
  • the third data indicates one of a group of preference options including two or more of: the guard band region is not to be used in rendering the output picture with respect to the flanking portion; the guard band region may optionally be used in rendering the output picture with respect to the flanking portion; the corresponding subportion is not to be used in rendering the output picture with respect to the flanking portion; the guard band region and the corresponding subportion should be used in rendering the output picture with respect to the flanking portion.
  • the third data indicates a weight at which the guard band region and the corresponding subportion should be blended in rendering the output picture with respect to the flanking portion.
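  • The blending option can be made concrete with a one-line sketch; the normalized weight w and the per-sample operation are assumptions for illustration, since the text above does not fix how the weight is encoded.

```python
def blend_flanking(guard_sample: float, subportion_sample: float, w: float) -> float:
    """Blend a guard band sample with the co-located sample from the
    corresponding subportion, using the signaled weight w in [0, 1]."""
    return w * guard_sample + (1.0 - w) * subportion_sample

# e.g. a renderer giving the (typically lower quality) guard band 25% weight
print(blend_flanking(guard_sample=100.0, subportion_sample=120.0, w=0.25))  # 115.0
```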
  • the third data comprises an indicator for the guard band region assuming one of a group of possible states, the group comprising one or more states which, when assumed by the indicator, indicate that a quality at which the one or more regions flanked by the guard band region are represented by the picture is greater than a quality at which the guard band region is represented by the picture, or that the guard band region may merely be used for rendering up to a predetermined maximum distance from the one or more regions flanked thereby; and/or one or more states which, when assumed by the indicator, indicate an unspecified quality relationship between a quality at which the one or more regions flanked by the guard band region are represented by the picture and a quality at which the guard band region is represented by the picture; and/or one or more states which, when assumed by the indicator, indicate an equality between a quality at which the one or more regions flanked by the guard band region are represented by the picture and a quality at which the guard band region is represented by the picture; and/or one or more states which, when assumed by the indicator, indicate a gradual
  • the data stream is in file format and comprises an extractor track which copies the plurality of regions from other tracks of the data stream.
  • the copying results in a composition of the picture in units of tiles of the data stream, each tile comprising one or more regions along with a guard band region flanking same.
  • the third data performs the indication for each guard band region individually.
  • the present invention provides a collection of the above data streams, wherein the data streams differ in the qualities of tiles which correspond in terms of full view sphere coverage, and in the third data with respect to a guard band region which corresponds in terms of full view sphere coverage.
  • the present invention provides an apparatus for processing a data stream having encoded thereinto a picture, the apparatus being configured to derive from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, derive from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and derive preference data from the data stream for the predetermined guard band and forward the picture, at least partially, to a renderer for rendition of an output picture while informing the renderer, depending on the preference data, on which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
  • the present invention provides an apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, insert second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and insert third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
  • the apparatus is configured to encode the guard band region at reduced SNR compared to the one or more regions flanked by the guard band region.
  • the present invention provides a method for processing a data stream having encoded thereinto a picture, the method comprising deriving from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, deriving from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and deriving preference data from the data stream for the predetermined guard band and forwarding the picture, at least partially, to a renderer for rendition of an output picture with informing the renderer depending on the preference data on which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
  • the present invention provides a method for forming a data stream having encoded thereinto a picture, the method comprising inserting first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, inserting second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and inserting third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
  • the present invention provides a computer program product comprising instructions which, when the program is executed by a computer, causes the computer to carry out one or more methods in accordance with the present invention.
  • Fig. 2 shows an example for an environment, similar to Fig. 1 , where embodiments of the present application may be applied and advantageously used.
  • Fig. 2 shows a system composed of a server 100 and a client 200, like the system of Fig. 1.
  • the server 100 and the client 200 may interact using adaptive streaming, e.g., dynamic adaptive streaming over HTTP, DASH, in which the client retrieves a media presentation description, MPD, from the server.
  • the inventive approach described herein is not limited to DASH, and in accordance with embodiments, the inventive approach may be implemented using file format boxes.
  • any term used herein is to be understood as being broad so as to also cover manifest files defined differently than in DASH.
  • Fig. 2 illustrates a system for implementing a virtual reality application.
  • the system presents to a user wearing a head up display 204, e.g., using an internal display 206 of the head up display 204, a view section 208 of a temporally-varying spatial scene 210.
  • the section 208 may correspond to an orientation of the head up display 204 that may be measured by an internal orientation sensor 212, like an inertial sensor of the head up display 204.
  • the section 208 presented to the user is a section of the spatial scene 210, and the spatial position of the spatial scene 210 corresponds to the orientation of the head up display 204.
  • the temporally-varying spatial scene 210 is depicted as an omnidirectional video or spherical video, however, the present invention is not limited to such embodiments.
  • the section 208 displayed to the user may stem from a video, with a spatial position of the section 208 being determined by an intersection of a facial axis or eye axis with a virtual or real projector wall or the like.
  • the sensor 212 and the display 206 may be separate or different devices, such as a remote control and a corresponding television set.
  • the sensor 212 and the display 206 may be part of a hand-held device, like a mobile device, e.g., a tablet or a mobile phone.
  • the server 100 may comprise a controller 102, e.g., implemented using the signal processor 102 of Fig. 1 , and a storage 104.
  • the controller 102 may be an appropriately programmed computer, an application-specific integrated circuit or the like.
  • the storage 104 stores media segments which represent the temporally-varying spatial scene 210.
  • the controller 102 responsive to requests from the client 200, sends to the client 200 the requested media segments together with a media presentation description and further information.
  • the controller 102 may fetch the requested media segments from the storage 104. Within this storage 104, also other information may be stored such as the media presentation description or parts of the media presentation description.
  • the client 200 comprises a client device or controller 202, e.g., implemented using the signal processor 202 of Fig. 1, as well as one or more decoders 214 and a re-projector 216.
  • the client device 202 may be an appropriately programmed computer, a microprocessor, a programmed hardware device, such as an FPGA, an application specific integrated circuit or the like.
  • the client device 202 assumes responsibility for selecting the media segments to be retrieved from the server 100 out of one or more media segments 106 offered at the server 100.
  • the client device 202 initially retrieves a manifest or media presentation description from the server 100. From the retrieved manifest, the client device 202 obtains a computational rule for computing addresses of one or more of the media segments 106 which correspond to certain, needed spatial portions of the spatial scene 210.
  • the selected media segments are retrieved by the client device 202 from the server 100.
  • the media segments retrieved by the client device 202 are forwarded to the one or more decoders 214 for decoding.
  • the retrieved and decoded media segments represent, for a temporal time unit, a spatial section 218 of the temporally-varying spatial scene 210.
  • this may be different in accordance with other embodiments, where, for instance, the view section 208 to be presented constantly covers the whole scene.
  • the re-projector 216 may re-project and cut-out from the retrieved and decoded scene content 218 (defined by the selected, retrieved and decoded media segments) the view section 208 to be displayed to the user.
  • the client device 202 may continuously track and update a spatial position of the view section 208, e.g., responsive to the user orientation data from the sensor 212, and inform the re-projector 216 about the current spatial position of the scene section 208 as well as of the re-projection mapping to be applied onto the retrieved and decoded media content so as to be mapped onto an area forming the view section 208.
  • the re-projector 216 may apply a mapping and an interpolation onto a regular grid of pixels to be displayed on the display 206.
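  • A minimal sketch of such a mapping-plus-interpolation step follows; the inverse-mapping callable and the bilinear filtering are illustrative assumptions, as the text above does not prescribe a particular interpolation.

```python
import math

def bilinear(img, x: float, y: float) -> float:
    """Sample a 2D array of luma values at fractional coordinates."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def render_view(decoded, view_to_source, width: int, height: int):
    """For every pixel of the regular output grid, invert the projection
    (view_to_source maps an output pixel to fractional coordinates in the
    decoded picture; a hypothetical placeholder here) and interpolate."""
    return [[bilinear(decoded, *view_to_source(px, py))
             for px in range(width)] for py in range(height)]

# usage sketch: a trivial scaling stands in for the real re-projection
decoded = [[float(x + y) for x in range(8)] for y in range(8)]
view = render_view(decoded, lambda px, py: (px * 1.75, py * 1.75), 4, 4)
```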
  • Fig. 2 illustrates an embodiment where a cubic mapping has been used to map the spatial scene 210 onto the respective cube faces using, for each face, one or more tiles 220. In the depicted embodiment, four tiles are associated with each cube face.
  • the tiles 220 are depicted as rectangular sub-regions of the cube onto which the scene 210, which has the form of a sphere, has been projected.
  • the re-projector 216 reverses the projection.
  • the present invention is not limited to a cubic projection or cube mapping.
  • a projection onto a truncated pyramid or a pyramid without truncation may be used instead of a cubic projection.
  • any polyhedron having n faces may be used.
  • the tiles 220 are depicted to be non-overlapping in terms of coverage of the spatial scene 210; in accordance with other embodiments, some or all of the tiles 220 may at least partially overlap.
  • In the embodiment depicted in Fig. 2, the whole spatial scene 210 is spatially subdivided into the tiles 220, and each of the six faces of the cube is subdivided into four tiles.
  • the tiles 220 are numbered as tiles 1 to 24, of which tiles 1 to 12 are visible in Fig. 2.
  • the server 100 offers a video 108 which may be temporally subdivided into temporal segments 110.
  • the server 100 may offer more than one video 108 per tile 220, the videos differing in quality Q1, Q2.
  • the temporal segments 110 of the videos 108 of all tiles T1-T24 may form or may be encoded into one of the media segments 106 stored in the storage 104.
  • the tile-based streaming illustrated in Fig. 2 is merely an example from which many deviations are possible. For instance, a different number of tiles 220 may be used for some or all of the cube faces.
  • the omnidirectional video content to be presented to the user may undergo a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content.
  • One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection.
  • Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons.
  • 3D points are mapped to the faces of a polyhedron, typically using perspective projection to a camera point within the polyhedron, e.g. in the geometric center.
  • a common example is the use of a regular symmetric six-sided cube, as described above with reference to Fig. 2, also referred to as the cubic projection.
  • the faces of the polyhedron are then arranged into a rectangular video frame for encoding.
  • the mapping from the 3D points to a rectangular projected frame is defined by a projection that is signaled, e.g. in the bitstream.
  • For an equirectangular projection, this is signaled by an Equirectangular Projection supplemental enhancement information, SEI, message, or by a projection type equal to 0 in the ProjectionFormatStruct() of the ISO-BMFF (ISO base media file format).
  • For a cubic projection, this may be signaled by a CubeMap Projection, CMP, SEI message, or by a projection type equal to 1 in the ProjectionFormatStruct() of the ISO-BMFF.

1st Aspect: Signaling of the Projection Coverage Type
  • For a polygon based omnidirectional projection, like a cubemap projection, CMP, a high-level coverage indication may be expressed through straightforwardly interpretable yaw and pitch ranges, optionally plus the roll, as is used, for instance, via the coverage information box in OMAF.
  • a rectangular region of a projected frame may express the coverage information differently.
  • the coverage of a rectangular region of a CMP projected content may correspond to the surface of a sphere or a spherical region that is limited by four great circles, and
  • the coverage of a rectangular region in an ERP projected content may correspond to the surface of a sphere or a spherical region that is limited by two great circles and two small circles.
  • two types of spherical regions may be defined, namely:
  • a spherical region that corresponds to the surface of a sphere that is limited by four great circles each having a center coinciding with a center of the sphere, and the four great circles include two great circles (azimuth or yaw circles) limiting an azimuth interval and two great circles (elevation or pitch circles) limiting an elevation interval, and
  • a spherical region that corresponds to two great circles (azimuth or yaw circles) limiting an azimuth interval, and two small circles (elevation or pitch circles) limiting an elevation interval, the great circles each having a center coinciding with a center of the sphere, and the small circles each having a center coinciding with an elevation axis of the sphere.
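  • The practical difference between the two region types can be seen in a small membership test. The following is a minimal sketch under simplifying assumptions (region centered on the x-axis, symmetric half-angle ranges, unit-vector input, no azimuth wrap-around); it is an illustration, not the normative region definition.

```python
import math

def inside_small_circle_region(x, y, z, az_half, el_half):
    """Azimuth bounded by two great circles (meridians), elevation bounded
    by two small circles, i.e. plain thresholds on the elevation angle."""
    return (abs(math.atan2(y, x)) <= az_half
            and abs(math.asin(z)) <= el_half)      # small circle: z = const

def inside_great_circle_region(x, y, z, az_half, el_half):
    """All four bounds are great circles through the sphere center, like
    the edges of a rectilinear camera viewport looking along +x."""
    return (x > 0
            and abs(math.atan2(y, x)) <= az_half
            and abs(math.atan2(z, x)) <= el_half)  # great circle: plane test

# a direction at azimuth 50 deg, elevation 65 deg (as a unit vector):
az, el = math.radians(50), math.radians(65)
p = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
print(inside_small_circle_region(*p, math.radians(60), math.radians(70)))  # True
print(inside_great_circle_region(*p, math.radians(60), math.radians(70)))  # False
```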
  • Fig. 3(a) illustrates the definition of the spherical region by four great circles.
  • the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2.
  • Fig. 3(a) shows the spherical region 402 limited by the two azimuth or yaw great circles 406a, 406b limiting an azimuth interval 406, and by two elevation or pitch great circles 408a, 408b limiting an elevation interval 408.
  • Fig. 3(b) illustrates the definition of the spherical region by two great circles and two small circles.
  • the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2.
  • Fig. 3(b) shows the spherical region 402 limited by the two azimuth or yaw great circles 406a, 406b limiting an azimuth interval 406, and by two elevation or pitch small circles 416a, 416b limiting an elevation interval 416.
  • In the case of Fig. 3(b), the surface or region contains all samples of the certain coverage interval and, further, contains only samples of the certain coverage interval. This is, however, not the case when the certain coverage interval is described by the four great circles, as is indicated in Fig. 3(a).
  • Fig. 4 illustrates an example of a CMP picture with an indication of yaw/pitch angle ranges shown by the thick white lines and different picture subsets on the left and on the right.
  • the area defined by the thick white lines on the left side of Fig. 4 is the coverage range 500, also referred to as a portion 500 of a full view sphere which is captured by the picture.
  • the picture 504 is represented in Fig. 4 by the area surrounded by the thick black lines, and is also referred to as a section 504 of the full view sphere as sampled by samples of the picture.
  • Fig. 4 illustrates the lines L1 to L4, the thick white lines, defining the portion 500 of the full view sphere that is captured by the image or picture, and the lines L5 to L8 defining the section of the sphere actually sampled by the picture samples.
  • Video and transport systems rely on rectangular video picture planes, as it is described above, for example, for efficient reusability through a motion-constrained tile set, MCTS, extraction in case of High Efficiency Video Coding, HEVC, coded video.
  • the portions 500 in Fig. 4 are examples for the above mentioned video subsections.
  • different types of signaling may be needed.
  • the 360 video content may be split into multiple regions so as to be able to provide tiled streaming or sub-picture streaming, which allows for an efficient viewport dependent video streaming.
  • samples outside the selected coverage area may be included in the transmitted content; however, such samples which lie outside the coverage range are not intended for presentation, so that a signaling of only 180 x 70 may be sufficient.
  • the separate streams, for example if a face of a CMP is split into 4 x 4 streams, are intended to be played back together.
  • If the signaling used corresponds to the coverage range for which all samples of the defined coverage are present, missing ranges may appear when combining the ranges from the multiple streams. Such a signaling may be misleading since it may be interpreted as if samples of the sphere are missing, even though the full sphere is there.
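  • A small sketch illustrates this pitfall on the azimuth axis alone; the interval representation is an assumption for the example, and wrap-around at +/-180 degrees is ignored for brevity.

```python
def azimuth_gaps(ranges):
    """Merge per-stream azimuth intervals (degrees) and report the gaps
    that a naive combination of per-stream coverage signaling would show,
    even when the streams jointly cover the sphere."""
    ivs = sorted(ranges)
    gaps, hi = [], ivs[0][1]
    for lo2, hi2 in ivs[1:]:
        if lo2 > hi:
            gaps.append((hi, lo2))   # apparent hole between signaled ranges
        hi = max(hi, hi2)
    return gaps

print(azimuth_gaps([(-180, -90), (-45, 0), (0, 180)]))  # [(-90, -45)]
```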
  • the first aspect of the present invention described herein is directed to coverage information signaling that may be accompanied by additional signaling or a constraint, like supplemental enhancement information, SEI, video usability information, VUI, a Media Presentation Description, MPD, and the like, that indicates whether a coverage indication is one or a combination of the following:
  • the indicated coverage area 500 is located completely within the video picture 504 and some pixels of the video picture 504 are located outside the indicated coverage range 500.
  • the signaling may include an identifier so as to indicate
  • Fig. 5(a) illustrates an embodiment for such an indicator referred to as explicit_coverage_idc (second data) and specifying the meaning of the coverage (first data) defined by the parameters azimuth_min, azimuthjnax, elevation_min and elevation_max, as described, for example, above with reference to Fig. 3(a) and Fig. 3(b).
  • a value "0" for explicit_coverage_idc indicates that all samples in the output picture 504 are within the indicated coverage 500 and contain all necessary samples to represent the indicated coverage. This means that samples at the border of the output picture have an azimuth equal to azimuth_min or azimuthjmax or an elevation equal to elevationjnin or elevation_max.
  • a value of "1" for explicit_coverage_idc indicates that all samples in the output picture 504 necessary to represent the indicated coverage 500 are present and further samples may be present, as is illustrated in Fig. 4 on the right. This means that samples of the output picture 504 have an azimuth equal to azimuthjnin or azimuth_max or an elevation equal to or smaller than elevation min or equal to or bigger than elevation max.
  • a value of "2" for explicit_coverage_idc indicates that all samples in the output picture 504 are within the indicated coverage 500 but not all samples necessary to represent the indicated coverage 500 are present, as is illustrated in Fig. 4 on the left. This means that samples at the border of the output picture having a azimuth equal to azimuth_min or azimuth_max or an elevation equal to or bigger than elevation_min or equal to or smaller than elevation_max.
  • both the minimum and maximum coverage may be given, as is indicated in Fig. 5(b), in which a value "3" for the explicit_coverage_idc is indicated, meaning that all samples of the output picture 504 necessary to represent the indicated coverage 500 are present and further samples may be present.
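  • The four indicator values can be summarized in a small lookup; a sketch assuming the element names used in Fig. 5, with the bitstream embedding (SEI message, VUI or file-format box) deliberately left open.

```python
# Semantics of explicit_coverage_idc as described above (value 3 additionally
# implies that both a minimum and a maximum coverage are carried).
EXPLICIT_COVERAGE_SEMANTICS = {
    0: "picture samples exactly fill the indicated coverage",
    1: "all coverage samples present, further samples allowed (Fig. 4, right)",
    2: "no sample outside the coverage, coverage possibly incomplete (Fig. 4, left)",
    3: "minimum and maximum coverage both indicated (Fig. 5(b))",
}

def describe_coverage(explicit_coverage_idc: int) -> str:
    try:
        return EXPLICIT_COVERAGE_SEMANTICS[explicit_coverage_idc]
    except KeyError:
        raise ValueError(f"unexpected explicit_coverage_idc {explicit_coverage_idc}")
```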
  • a data stream 510, see Fig. 6, may be provided which has encoded thereinto a picture 512.
  • Fig. 6 schematically illustrates, in combination with a system as described with reference to Fig. 2, the encoding/decoding of a data stream, the composition of the pictures of the video out of one or more rectangular regions of the projected plane and the mapping between a projected plane and a full view sphere.
  • the data stream 510 comprises first data, like the ranges of yaw and pitch angles indicated, e.g., by azimuth_min, azimuth_max, elevation_min and elevation_max as described above.
  • the first data describes the coverage or first portion 500 of a full view sphere 516 in Fig. 4 (or 400 in Fig. 3(a) and Fig. 3(b)).
  • the first portion 500 of the full view sphere may be indicated in a non-rotated state, i.e. in a state not rotated relative to a real world or global coordinate system.
  • the first portion 500 may capture a scene within a portion rotated relative to the portion indicated by the first data which is captured by the picture 512, which may be an overall picture or an MCTS extracted partial picture.
  • Second data is included in the data stream, like the identifier explicit_coverage_idc discussed above.
  • the first data defines or indicates the portion of the full view sphere 516 sampled by the samples 519 of the picture, as is indicated in Fig. 4 on the right.
  • the first data defines or indicates the portion 500 such that the section 504 of the full view sphere 516 sampled by the samples 519 of the picture completely resides within the first portion 500, as is indicated in Fig. 4 on the right.
  • the portion 500 of the full view sphere 516 which is captured by the picture 512 is indicated by the second data, and the receiving network entity may know the convention in interpreting the first data and the second data.
  • azimuth_min, azimuth_max, elevation_min, elevation_max define the second data
  • azimuth_min, azimuth_max, elevation_min - elevation_min_offset, elevation_max + elevation_max_offset define the first data, meaning that the first data indicates the portion 500 such that the section 504 of the full view sphere 516 sampled by the samples 519 of the picture 512 completely resides within the portion 500, as is indicated in Fig. 4.
  • the second data indicates that the portion 500 of the full view sphere 516 which is captured by the picture 512 completely resides within the section 504, as is indicated in Fig. 4 on the right.
  • the portion 500 is confined by two lines of constant pitch or constant elevation (see the upper and lower lines L1, L2 of the cushion-like areas in Fig. 4), and two lines of constant yaw (see the left and right lines L3, L4 of the cushion-like areas in Fig. 4, which are straight in this example because the CMP projection is used).
  • the first data comprises
  • a first syntax element, like elevation_min, indicating a pitch angle for a first line L1 of constant pitch,
  • a second syntax element, like elevation_max, indicating a pitch angle for a second line L2 of constant pitch,
  • a third syntax element, like azimuth_min, indicating a yaw angle for a third line L3 of constant yaw, and
  • a fourth syntax element, like azimuth_max, indicating a yaw angle for a fourth line L4 of constant yaw.
  • the first data indicates or defines the portion 500 so that the section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500 (Fig. 4 on the right).
  • the second data may indicate the portion 500 of the full view sphere 516 which is captured by the picture 512.
  • the first data may indicate the portion 500 so that the section 504 of the full view sphere sampled by the samples of the pictures completely resides within the portion 500, and the second data may indicate the portion 500 so that the portion 500 completely resides within the section 504.
  • the second data may have a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, and a sixth syntax element indicating a pitch angle for a sixth line of constant pitch.
  • the first to fourth lines L1-L4 confine all samples of the picture, and the fifth and sixth lines L5, L6, along with seventh and eighth lines L7, L8 being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the third and fourth syntax element, completely extend within the section 504.
  • the second data may further have a seventh syntax element indicating the yaw angle for the seventh line of constant yaw, and an eighth syntax element indicating the yaw angle for the eighth line of constant yaw.
  • the receiving network entity may know the convention in interpreting the first data and the second data.
  • the first data indicates the portion 500 of the full view sphere which is captured by the picture.
  • the first data indicates or defines that the section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500, and the second data indicates the portion so that the portion 500 completely resides within the section 504.
• the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, and a second syntax element indicating a pitch angle for a second line of constant pitch.
• the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, a seventh syntax element indicating a yaw angle for a seventh line of constant yaw, and an eighth syntax element indicating a yaw angle for an eighth line of constant yaw.
  • First and second lines along with third and fourth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the seventh and eighth syntax elements confine all samples of the picture, and the fifth to eighth lines completely extend within the section 504.
• the fifth and sixth syntax elements and the first and second syntax elements may be coded differentially to each other, i.e., merely the offset is transmitted, e.g., only an offset with respect to pitch/elevation, as in the sketch below.
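A minimal sketch of this differential coding, assuming only the two elevation offsets are transmitted; the field and offset names are hypothetical and merely mirror the syntax elements discussed above.

    def portion_from_offsets(second_data: dict,
                             elevation_min_offset: float,
                             elevation_max_offset: float) -> dict:
        # Derive the enclosing portion (first data) from the exactly captured
        # portion (second data) plus differentially coded elevation offsets.
        return {
            "azimuth_min": second_data["azimuth_min"],
            "azimuth_max": second_data["azimuth_max"],
            "elevation_min": second_data["elevation_min"] - elevation_min_offset,
            "elevation_max": second_data["elevation_max"] + elevation_max_offset,
        }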
• the data stream may additionally comprise third data for a mapping between a packed frame and a projected plane, the third data indicating one or more rectangular regions out of a projected plane onto which the full view sphere is projected according to a predetermined projection scheme, like the actually used projection scheme, e.g. CMP, of which one or more rectangular regions the picture is composed.
  • the data stream may include fourth data indicating the predetermined projection scheme, like an identifier indexing one of a plurality of spherical projections, e.g. ERP or CMP.
  • the fourth data may indicate a rotation of the full view sphere relative to a global coordinate system, e.g., in terms of pitch, yaw and roll.
  • the indication of the portion may relate to the full view sphere in a situation rotated or in a situation not-rotated relative to the global coordinate system according to the fourth data.
• the second data may indicate whether the first data indicates the portion so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section 504, and a scope of the second data may be larger than a scope of the first data. The scope of the first data may be a picture-wise scope.
  • the second data may indicate that a section 504 of the full view sphere sampled by the picture's samples is exactly surrounded or circumferenced by the portion.
• the data stream may have encoded thereinto a content of the picture along with additional picture content, and may have encoded thereinto the picture, like an MCTS-extracted picture, in a subportion of the data stream in a motion constrained manner along with extraction information which indicates how to derive a data stream specifically having encoded thereinto the picture from the subportion of the data stream.
  • the data stream may have encoded thereinto a further picture in a further subportion of the data stream in a motion constrained manner along with further extraction information which indicates how to derive a further data stream specifically having encoded thereinto the further picture from the further subportion of the data stream, wherein the data stream comprises further first data indicating a further portion of the full view sphere which is captured by the further picture, wherein the second data indicates whether the first data indicates the portion so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section 504.
• the second data may also be valid for the further first data, i.e., the second data is common for two MCTS extractions.
  • Further embodiments provide an apparatus for forming, not necessarily including the actual encoding/compression, the above described data stream, e.g., a server as depicted in Fig. 1 or in Fig. 2.
  • the apparatus inserts into the data stream the first data and the second data described above in more detail.
• the network entity may process the data stream depending on the comparison to decide on performing motion constrained tile set extraction on the data stream so as to obtain a data stream specifically having encoded thereinto the picture from a subportion of the data stream depending on the match, and form an MPD including one or more representations offering the data stream or MCTS-extracted versions thereof, depending on the match.
• services like DASH may be used for offering content in several adaptation sets, where different adaptation sets may include content corresponding to a different coverage, and the content may be split into two different regions.
• For DASH clients to be able to perform content selection based on the current viewport, they need to know that the content in each of the representations in the adaptation sets has static projection characteristics, so that as long as the client is interested in the same viewport, no adaptation with regard to the request is needed; a selection sketch follows below.
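As a non-normative sketch of such a client-side selection, the following Python snippet picks an adaptation set whose signaled coverage contains the current viewport; the attribute names (static_projection, coverage) are hypothetical stand-ins for the signaling discussed here.

    def covers(coverage, viewport):
        # Both are (azimuth_min, azimuth_max, elevation_min, elevation_max)
        # tuples in degrees; yaw wraparound is ignored for brevity.
        return (coverage[0] <= viewport[0] and viewport[1] <= coverage[1]
                and coverage[2] <= viewport[2] and viewport[3] <= coverage[3])

    def pick_adaptation_set(adaptation_sets, viewport):
        # Prefer sets with static projection characteristics, so the request
        # need not be re-evaluated while the viewport stays the same.
        for aset in adaptation_sets:
            if aset["static_projection"] and covers(aset["coverage"], viewport):
                return aset
        return None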
• a 360° video might be consumed on devices other than a head-mounted display, HMD, for example devices with limited interactivity, like a TV set.
  • the above-mentioned, different use cases, and also other use cases which the skilled person will readily recognize, may require dynamic or static projection characteristics, and the present invention teaches signaling such characteristics in the bitstream. For example, this allows exposing such information to higher levels for content selection or negotiation. Further, this information may be used for conforming a validation of bitstreams or restrictions imposed by further application standards.
• the second aspect of the inventive approach may be considered a guarantee to the decoder or an intermediate device between the encoder and the decoder, like a media aware network element, MANE, that the projection format type or other characteristics of the video within the CVS are static and are not going to change, or are dynamic or non-static, i.e., are allowed to change.
• additional information indicating whether a projection is static or dynamic, i.e., allowed to change, over a certain period, like until the end of the video, is indicated together with the encoded video inside a data stream which also includes data indicating the projection of pictures of the video onto a full view sphere.
  • the additional data may be represented by additional bits in the data stream, like general_reserved_zero_Xbits also referred to as general_constraint_projection_flag (second data).
• setting the general_constraint_projection_flag to "1" specifies in the data stream that, in case an equirectangular projection SEI message or a cubemap projection SEI message is present in the CVS, the result is the same as if there were a single SEI message for an AU of an Intra Random Access Point, IRAP, with the erp_persistence_flag being set to "1". This means that projection information, like rotation, coverage and the like, does not change within the CVS; see the sketch below.
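A minimal sketch, assuming a parsed CVS is available as a list of access-unit dictionaries (a hypothetical in-memory representation), of how a receiver could exploit this constraint: when the flag is set, the first projection SEI message can be parsed once and reused for the whole CVS.

    def projection_info_per_au(access_units, general_constraint_projection_flag):
        # access_units: list of dicts, each optionally carrying a parsed
        # "projection_sei" entry.
        if general_constraint_projection_flag == 1:
            # Static case: the first projection SEI is valid for the whole CVS.
            first = next(au["projection_sei"] for au in access_units
                         if "projection_sei" in au)
            return [first] * len(access_units)
        # Dynamic case: track the most recently signaled projection SEI.
        current, result = None, []
        for au in access_units:
            current = au.get("projection_sei", current)
            result.append(current)
        return result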
  • the existing persistence_flag may be changed to a persistence_type.
  • the persistence_type may have the following values:
• persistence_type 0: this refers to the current AU, i.e., the projection SEI message applies to the current access unit only
• persistence_type 1: when considering picA to be the current picture, the persistence_type being equal to "1" specifies that the projection SEI message persists for the current layer in output order until one or more of the following conditions are true:
  • PicOrderCnt(picB) is greater than PicOrderCnt(picA), where PicOrderCnt(picB) and PicOrderCnt(picA) are the PicOrderCntVal values of picB and picA, respectively, immediately after the invocation of the decoding process for the picture order count for picB.
• persistence_type 2: this indicates that there will not be any change within the CVS. For example, when considering picA to be the current picture, the persistence_type being equal to "2" specifies that the projection SEI message persists for the current layer in output order until one or more of the following conditions are true:
• a flag may be added to the projection SEIs, and persistency may be reinterpreted so as to distinguish persistence_type 1 and persistence_type 2 depending on the flag value; the sketch below summarizes the three types.
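For illustration only, a simplified Python sketch of the three persistence_type values; the picture-order-count handling is deliberately reduced versus the full specification text and assumes no new SEI arrives between the two pictures.

    from enum import IntEnum

    class PersistenceType(IntEnum):
        CURRENT_AU = 0    # SEI applies to the current access unit only
        UNTIL_UPDATE = 1  # persists in output order until superseded
        WHOLE_CVS = 2     # guaranteed static until the end of the CVS

    def sei_still_applies(persistence_type: int, poc_a: int, poc_b: int) -> bool:
        # Does a projection SEI seen at picA (POC poc_a) still apply at picB
        # (POC poc_b)?
        if persistence_type == PersistenceType.CURRENT_AU:
            return poc_b == poc_a
        if persistence_type == PersistenceType.UNTIL_UPDATE:
            return poc_b >= poc_a
        return True  # WHOLE_CVS: no change within the coded video sequence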
• for region-wise packing, regions in a projected plane are defined and mapped to a packed frame. Those regions may appear at different positions in the projected and packed frames and may undergo further image transformations such as scaling, rotation, mirroring and the like.
• for region-wise packing, it may be indicated whether the characteristics of the video within the CVS are static or dynamic, i.e., are not going to change or are allowed to change.
  • a data stream 510 may be provided which has encoded thereinto a picture 512.
  • the data stream 510 comprises first data indicating a projection of the picture 512 of the video 513 onto a full view sphere 516' (or 402 in Fig. 3(a) and Fig. 3(b)).
• the data stream includes second data, like the above-mentioned persistence_type, which when having a value of "2" indicates that the projection persists or is static until an end of the video, or which, when having a value of "0" or "1", indicates that the projection is allowed to change or is dynamic during the video.
  • the data stream may also include coverage information data indicating a portion 500 of the full view sphere 516' which is captured by the projection of the picture onto the full view sphere, as is indicated in Fig. 4.
• the first data may indicate the portion 500 so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500, or so that the portion 500 completely resides within the section 504.
  • Further embodiments provide a network entity processing the data stream described above and exploiting the data in the data stream, e.g., a client as depicted in Fig. 1 or in Fig. 2.
• the processing may include deciding on transcoding, modifying or forwarding the data stream, or not transcoding, modifying or forwarding the data stream, depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; a decision sketch follows below.
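A minimal sketch of such a decision in a network entity, e.g. a MANE; the stream representation and the field name are hypothetical stand-ins for the second data.

    def decide_processing(stream: dict) -> str:
        # persistence_type == 2 means the projection is static until the end
        # of the video, so the stream can be forwarded or advertised as-is.
        if stream["persistence_type"] == 2:
            return "forward"
        # Otherwise the projection may change mid-stream and the entity must
        # keep re-inspecting the stream, or transcode/modify it.
        return "inspect_transcode_or_modify"

    print(decide_processing({"persistence_type": 2}))  # -> "forward"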
  • the apparatus inserts into the data stream the first data and the second data described above in more detail.
  • guard band regions may be included in a video picture, for example an omnidirectional video like CMP. These guard band regions extend or flank a given region using picture content of spatially neighboring regions, for example in the 3D space, and thereby, effectively, duplicate content.
  • Fig. 7 illustrates an embodiment for a possible guard band layout around six faces of a CMP.
  • the guard band region is indicated by the dashed regions labeled "guard band region".
  • the picture boundaries and regions that do not share a common edge are padded, i.e., the top and bottom row of faces.
  • the area 532 indicates the guard band area of the left CMP face that is spatially neighboring to the top CMP face.
  • the guard bands are provided to serve, for example, the following two purposes:
• for video coding, these regions allow that artifacts stemming from video coding around the boundaries, like bleeding from one region into the other, are limited to the guard band regions,
• for rendering, the doubled content may be used for blending between two representations of the content when rendering the respective video area in a spatially neighboring fashion in 3D space, thereby avoiding the occurrence of rendering seams.
• each CMP face as a tile may be available at different qualities, so that it is impossible to signal, for all possible setups, that the quality gradually changes from the quality of the flanked region to the quality of the neighboring region.
• Fig. 8 illustrates an example of a mixed resolution CMP with guard band regions around the left CMP face.
  • Fig. 8 also illustrates the guard band region around the left CMP face and the doubled content 530, 532 also referred to above with reference to Fig. 7.
  • the CMP faces may have equal spatial quality but may vary in other quality dimensions, for example with respect to the quantization in lossy video coding, such as HEVC.
• the projection mapping used to generate the samples within each CMP face may result in the guard band region being sampled at a finer granularity than the respective original region. In such a case, and following the layout of Fig. 7, an approach is needed to enable a server or content author to advise a client on how to benefit from the content characteristics during rendering.
• embodiments of the third aspect of the invention give a client device an indication or recommendation of how to use the guard band pixels in rendering, for example by means of the indications discussed below.
• Fig. 9 illustrates the syntax that may be used according to an embodiment of the third aspect of the present invention based on a region-wise packing box "rwpk" defined in OMAF. As is highlighted in Fig. 9, the indicator RectRegionPacking(i) may be included in a data stream including the encoded video.
  • the gb_not_used_for_pred_flag[i] (first data), when being set to "0", specifies that the guard bands may or may not be used, for example during the inter-prediction process.
  • Setting gb_not_used_for_pred_flag[i] to "1" specifies that the sample values of the guard bands are not to be used in the inter- prediction process.
  • the sample values within the guard bands in the decoded pictures may be rewritten even if the decoded pictures are used as references for inter-prediction of subsequent pictures to be decoded.
  • the content of the packed region may be seamlessly expanded to its guard band with decoded and re-projected samples of another packed region.
• packing_type[i] (second data) in Fig. 9 indicates for one or more of the plurality of regions that same are flanked in the picture, not necessarily at all sides, by a guard band region, e.g. the region or area above region "left" in Fig. 7.
  • the parameter gb_type[i][j] (third data) specifies the type of guard bands for the i-th packed region as follows, with j equal to 0, 1 , 2, and 3 indicating that the semantics below apply to the left, right, top and bottom edge, respectively, of the guard band of the i-th packed region.
• gb_type[i][j], when set to "0", specifies that the content of the guard bands in relation to the content of the packed regions is unspecified.
• gb_type[i][j] is not set to "0" when the gb_not_used_for_pred_flag[i] is set to "0".
• gb_type[i][j], when set to "3", specifies that the content of the guard bands represents actual image content at the picture quality of the packed region.
• gb_type[i][j], when set to "Y", specifies that the content of the guard bands of a packed region represents actual image content at a picture quality higher than that of the spherically adjacent packed region, i.e., the original region; a sketch of evaluating gb_type follows below.
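Illustrative sketch only: a table of the gb_type values named above (values other than 0 and 3 are elided in the text, and "Y" stands for the additional value proposed here), plus a check of the constraint from the preceding bullets.

    # Semantics of gb_type[i][j] as named in the text; other values elided.
    GB_TYPE_SEMANTICS = {
        0: "relation of guard band content to packed region unspecified",
        3: "actual content at the packed region's picture quality",
        # "Y": placeholder for the proposed new value, i.e. actual content at
        # a quality higher than the spherically adjacent packed region.
    }

    def check_gb_type(gb_type: int, gb_not_used_for_pred_flag: int) -> None:
        # Constraint from above: gb_type is not set to 0 when
        # gb_not_used_for_pred_flag is set to 0.
        if gb_type == 0 and gb_not_used_for_pred_flag == 0:
            raise ValueError("gb_type must not be 0 when "
                             "gb_not_used_for_pred_flag is 0")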
  • the use_gb_for_rendering_preferably_flag[i] may be added to the data stream having a syntax as shown in Fig. 10.
  • the following semantics may apply.
• the use_gb_for_rendering_preferably_flag[i], when set to "0", specifies that the guard bands may or may not be used in the rendering process, whereas setting the flag to "1" specifies that the sample values of the guard bands are preferably to be used in the rendering process; a renderer-side sketch follows below.
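A sketch of one possible renderer policy built on this flag; the quality comparison assumes the common convention that a numerically lower quality_ranking value means higher quality, and all inputs are hypothetical stand-ins for the signaled values.

    def pick_source_for_flanking_portion(use_gb_preferably_flag: int,
                                         gb_quality_ranking: int,
                                         region_quality_ranking: int) -> str:
        # Flag value 1: the guard band samples are preferably used.
        if use_gb_preferably_flag == 1:
            return "guard_band"
        # Flag value 0: the renderer is free to choose; here it falls back to
        # whichever representation of the doubled content ranks better.
        if gb_quality_ranking < region_quality_ranking:
            return "guard_band"
        return "corresponding_subportion"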
  • a region-wise quality ranking is facilitated, for example a spherical region-wise quality ranking "srqr” or a 2D region-wise quality ranking "2dqr” as is reproduced in Fig. 11 including the data element "quality_ranking".
  • the regions defined in Fig. 11 in the 2D region quality ranking box (num_regions) and the associated quality ranking (quality_ranking) may be separate from the regions defined in the RectRegionPacking in the RegionWisePackingStruct, meaning that the quality ranking has the freedom to treat samples of the guard bands as separate regions with their own separate quality ranking values.
• the signaling of the quality ranking may be mandated/recommended so as to enforce a quality ranking of CMP faces and guard band regions on the basis of a common quality scale, and, for example, a further gb_type syntax element value may be included so as to trigger the following:
  • a data stream 510 may be provided which has encoded thereinto a picture 512.
  • the data stream may be in file format and have an extractor track which copies the plurality of regions from other tracks of the data stream.
  • the copying may result in a composition of the picture in units of tiles of the data stream, each tile comprising one or more regions along with a guard band region flanking same.
• the data stream 510 comprises first data, like RectRegionPacking(i), which indicates a composition 528 of the picture 512 out of a plurality of regions 526 (or 402 in Fig. 3(a) and Fig. 3(b)), e.g., the regions “right”, “left”, “top”, “bottom”, “front”, “back” in Fig. 7, of a projected plane 524, e.g. section 500 in Fig. 4, which is projected 522 onto a full view sphere 516 (or 402 in Fig. 3(a) and Fig. 3(b)) according to a predetermined spherical projection scheme 522.
• the data stream further includes second data, like packing_type[i] of Fig. 9, indicating for one or more of the plurality of regions 526 that same are flanked in the picture 512, not necessarily at all sides, by a guard band region 530, e.g. the region or area above region “left” in Fig. 7.
  • the guard band region 530 shows a flanking portion of the full view sphere 516 flanking the one or more of the plurality of regions 526 in the full sphere view and being also shown in a corresponding subportion 532, like the upper subportion of region "top” in Fig. 7, of one or more further regions 526 of the projected plane 524 which the picture 512 is composed of.
  • the flanking portion is the projection of the guard band region onto the sphere.
  • the flanking portion is the portion coded by the guard band region.
  • the flanking portion may be the region within the projected frame, like the frame 500 depicted in Fig. 4, which the guard band region codes.
  • the flanking portion may be obtained in the sphere depicted in Fig. 3, for instance, when the projected plane is mapped onto the sphere 400.
• the third data may indicate whether one of the guard band region and the corresponding subportion is to be used exclusively, or whether both are to be used with a respective weighting in rendering or the like.
• the third data may indicate a weight at which the guard band region and the corresponding subportion should be blended in rendering the output picture with respect to the flanking portion, as in the sketch below.
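A minimal blending sketch, assuming both representations of the doubled content have already been re-projected into aligned sample arrays and that weight is the (hypothetical) signaled blending weight for the guard band region.

    import numpy as np

    def blend_flanking_portion(guard_band: np.ndarray,
                               subportion: np.ndarray,
                               weight: float) -> np.ndarray:
        # Weighted blend of the two representations of the flanking portion.
        return weight * guard_band + (1.0 - weight) * subportion

    # Example: a 50/50 blend to hide a rendering seam.
    seam = blend_flanking_portion(np.full((4, 4), 200.0),
                                  np.full((4, 4), 100.0), 0.5)
    assert np.all(seam == 150.0)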
• the first data may indicate qualities for the regions, like e.g. the quality_ranking of Fig. 11 for the flanked regions, at which the regions are represented by the picture.
• the third data may indicate a quality, e.g. the quality_ranking of Fig. 11, for the guard band region, such that the qualities for the regions and the quality for the guard band region are defined on a common ordinal scale.
• the third data may additionally indicate, e.g. by gb_type[i][j] being equal to X in Fig. 9, whether or not the quality, like the quality_ranking of Fig. 11, for the guard band region is to be used, by comparison of the quality for the guard band region with the qualities for the regions, as the indication as to which of the predetermined guard band region 530 and the corresponding subportion 532 is to be used preferably in rendering the output picture 534 with respect to the flanking portion.
  • the first data may indicate for the regions at which quality the regions are represented by the picture.
• the quality may pertain to the spatial resolution, e.g., by the ratio packed_reg_width/proj_reg_width which, when being larger, indicates a higher quality, and which is present per region i; see the sketch below.
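Illustration of this per-region ratio; the syntax element names follow the text above, the concrete values are hypothetical.

    def spatial_quality_ratio(packed_reg_width: int, proj_reg_width: int) -> float:
        # A larger packed/projected width ratio means the region is sampled
        # more densely, i.e., represented at a higher spatial quality.
        return packed_reg_width / proj_reg_width

    # Example: a region packed at half its projected width (ratio 0.5) has
    # lower spatial quality than one packed at full width (ratio 1.0).
    assert spatial_quality_ratio(960, 1920) == 0.5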
• the guard band region may be represented by the picture at a reduced quality, like a reduced SNR.
  • the quality may be reduced compared to the one or more regions flanked by the guard band region, e.g., inherently indicated by the second data on the basis of the pure existence of the guard band region.
• a guard band region may, per definition, be a region coded with reduced quality compared to the one or more regions it flanks.
• the third data may indicate, e.g., using the use_gb_for_rendering_preferably_flag in Fig. 10, one of a group of preference options including two or more of
• the guard band region is not to be used in rendering the output picture with respect to the flanking portion,
• the guard band region may optionally be used in rendering the output picture with respect to the flanking portion,
• the corresponding subportion is not to be used in rendering the output picture with respect to the flanking portion,
• the guard band region and the corresponding subportion should be used in rendering the output picture with respect to the flanking portion.
• the third data may comprise an indicator, e.g. gb_type[i][j] of Fig. 9, for the guard band region assuming one of a group of possible states, the group comprising
• the third data performs the indication for each guard band region individually, e.g. for each i, or even for each j, i.e., for each side of the guard band.
• the data streams may differ in the qualities of tiles which correspond to each other in terms of full view sphere coverage, and in the third data with respect to a guard band region corresponding in terms of full view sphere coverage.
  • a network entity processing the data stream described above and exploiting the data in the data stream, like a client as depicted in Fig. 1 or in Fig. 2.
  • the apparatus derives from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme. Further, the apparatus derives from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region.
  • the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and is also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of.
• the apparatus derives preference data from the data stream for the predetermined guard band and forwards the picture, at least partially, to a renderer for rendition of an output picture, informing the renderer, depending on the preference data, of which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
• Further embodiments provide an apparatus for forming the above described data stream, e.g., a server as depicted in Fig. 1 or in Fig. 2.
  • the apparatus inserts into the data stream the first, second and third data described above in more detail.
  • the apparatus may also encode the guard band region at a reduced SNR compared to the one or more regions flanked by the guard band region.
  • Various elements and features of the present invention may be implemented in hardware using analog and/or digital circuits, in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software.
  • embodiments of the present invention may be implemented in the environment of a computer system or another processing system.
  • Fig. 12 illustrates an example of a computer system 550.
  • the units or modules as well as the steps of the methods performed by these units may execute on one or more computer systems 550.
  • the computer system 550 includes one or more processors 552, like a special purpose or a general purpose digital signal processor.
  • the processor 552 is connected to a communication infrastructure 554, like a bus or a network.
  • the computer system 550 includes a main memory 556, e.g., a random access memory (RAM), and a secondary memory 558, e.g., a hard disk drive and/or a removable storage drive.
  • the secondary memory 558 may allow computer programs or other instructions to be loaded into the computer system 550.
  • the computer system 550 may further include a communications interface 560 to allow software and data to be transferred between computer system 550 and external devices.
• the communication may be in the form of electronic, electromagnetic, optical, or other signals capable of being handled by a communications interface.
  • the communication may use a wire or a cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels 562.
• the terms "computer program medium" and "computer readable medium" are used to generally refer to tangible storage media such as removable storage units or a hard disk installed in a hard disk drive.
  • These computer program products are means for providing software to the computer system 550.
• the computer programs, also referred to as computer control logic, are stored in main memory 556 and/or secondary memory 558. Computer programs may also be received via the communications interface 560.
  • the computer program when executed, enables the computer system 550 to implement the present invention.
  • the computer program when executed, enables processor 552 to implement the processes of the present invention, such as any of the methods described herein.
• such a computer program may represent a controller of the computer system 550.
• the software may be stored in a computer program product and loaded into computer system 550 using a removable storage drive or an interface, like the communications interface 560.
• the implementation in hardware or in software may be performed using a digital storage medium, for example cloud storage, a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
  • Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
  • embodiments of the present invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
  • the program code may for example be stored on a machine readable carrier.
• further embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
  • an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
  • a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
  • a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
  • a further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
  • a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
• a programmable logic device, for example a field programmable gate array, may be used to perform some or all of the functionalities of the methods described herein.
  • a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
  • the methods are preferably performed by any hardware apparatus.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A data stream is described which has encoded thereinto a picture. The data stream includes first data indicating a portion of a full view sphere which is captured by the picture, and second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.

Description

CHARACTERISTICS SIGNALING FOR OMNIDIRECTIONAL CONTENT
The present invention relates to the field of encoding/decoding pictures, images or videos, more specifically to improvements in the characteristics signaling for omnidirectional content. Embodiments relate to a signaling of the projection coverage type, to a signaling of static projection or region-wise-packing characteristics and to a signaling of guard band usage.
For example, VR streaming may involve the transmission of a high-resolution video. The resolving capacity of the human fovea is around 60 pixels per degree. To alleviate bandwidth requirements, only a viewport shown at a Head Mounted Display (HMD) may be sent at high resolution, while neighboring data or the rest of the omnidirectional video, also referred to as spherical video, is sent at a lower resolution or with a lower quality.
It is an object of the present invention to provide improvements in the characteristics signaling for omnidirectional content.
This object is achieved by the subject matter as defined in the independent claims. Embodiments are defined in the dependent claims.
Embodiments of the present invention are now described in further detail with reference to the accompanying drawings, in which: Fig. 1 is a schematic representation of a system for transferring picture or video data from a server to a client in accordance with embodiments of the present invention;
Fig. 2 shows a schematic diagram illustrating a system including a client and a server for virtual reality applications as an example where embodiments of the inventive approach described herein may be used;
Fig. 3 illustrates the definition of a spherical region, wherein Fig. 3(a) illustrates the definition of the spherical region by four great circles, and Fig. 3(b) illustrates the definition of the spherical region by two great circles and two small circles;
Fig. 4 illustrates an example of a CMP picture with an indication of yaw/pitch angle ranges and different picture subsets on the left and on the right;
Fig. 5 illustrates embodiments of the first aspect of the present invention, wherein Fig. 5(a) illustrates a first embodiment of an explicit_coverage_idc indicator, and Fig. 5(b) illustrates a second embodiment of an explicit_coverage_idc indicator;
Fig. 6 schematically illustrates, in combination with a system as described with reference to Fig. 2, the encoding/decoding of a data stream, the composition of the pictures of the video out of one or more rectangular regions of the projected plane and the mapping between a projected plane and a full view sphere;
Fig. 7 illustrates an embodiment for a possible guard band layout around six faces of a CMP;
Fig. 8 illustrates an example of a mixed resolution CMP with guard band regions around the left CMP face;
Fig. 9 illustrates the syntax that may be used according to an embodiment of the third aspect of the present invention based on a region-wise packing box "rwpk" defined in OMAF;
Fig. 10 illustrates a syntax used according to an embodiment of the third aspect of the present invention indicating how guard bands are used in the rendering process;
Fig. 11 illustrates a syntax used according to an embodiment of the third aspect of the present invention for enforcing a quality ranking of CMP faces and guard band regions on the basis of a common quality scale; and
Fig. 12 illustrates an example of a computer system on which units or modules as well as the steps of the methods described in accordance with the inventive approach may execute.
Omnidirectional video content typically undergoes a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content. One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection, ERP. Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. The procedure is twofold: First, 3D points are mapped to the faces of a polyhedron, typically using a perspective projection to a camera point within the polyhedron, e.g. in the geometric center. Common examples of the polyhedron are regular symmetric six-sided cubes, also referred to as the cubic projection. Second, the faces of the polyhedron are arranged into a rectangular video frame for encoding. The rectangular video frame may include one or more rectangular regions associated with a polyhedron face. Embodiments of various aspects of the present invention are now described in more detail with reference to the accompanying drawings in which the same or similar elements have the same reference signs assigned.
Fig. 1 is a schematic representation of a system for communicating video or picture information between a server 100 and a client 200. The server 100 and the client 200 may communicate via a wired or wireless communication link for transmitting a data stream 300 including the video or picture information. The server 100 includes a signal processor 102 and may operate in accordance with the inventive teachings described herein. The client 200 includes a signal processor 202 and may operate in accordance with the inventive teachings described herein. The data stream 300 includes data in accordance with the inventive teachings described herein.
Various aspects of the present invention are now described, and it is noted that the respective aspects may be used either separately or in combination with each other.
1st Aspect: Signaling of the Projection Coverage Type

Embodiments of the first aspect of the present invention address a discrepancy found in conventional approaches when an addressed subsection of a picture carrying a polygon-based omnidirectional projection plane does not have a rectangular shape. Embodiments of the first aspect provide for a coverage information signaling which may be accompanied by an additional signaling or a constraint which indicates that no video pixels are located outside an indicated coverage range and/or which indicates that the indicated coverage area is located completely within the video picture and some pixels are located outside the indicated coverage range.

Data Stream
The present invention provides a data stream having encoded thereinto a picture, the data stream comprising first data indicating a portion of a full view sphere which is captured by the picture, and second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
In accordance with embodiments, the indication of the portion indicates the portion so that the portion is confined by two lines of constant pitch, and two lines of constant yaw. In accordance with embodiments, the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch, a third syntax element indicating a yaw angle for a third line of constant yaw, a fourth syntax element indicating a yaw angle for a fourth line of constant yaw, wherein, if the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, the first to fourth lines confine all samples of the picture, and , if the first data indicates the portion so that the portion completely resides within the section, the first to fourth lines completely extend within the section. In accordance with embodiments, the second data indicates the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section, and the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, wherein the first to fourth lines confine all samples of the picture, and the fifth and sixth lines along with seventh and eighth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the third and fourth syntax element completely extend within the section. In accordance with embodiments, the second data further comprises a seventh syntax element indicating the yaw angle for the seventh line of constant yaw, and an eight syntax element indicating the yaw angle for the eight line of constant yaw.
In accordance with embodiments, the second data indicates the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section, wherein the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch, wherein the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, a seventh syntax element indicating a yaw angle for a seventh line of constant yaw, and an eight syntax element indicating a yaw angle for an eight line of constant yaw, wherein the first and second lines along with third and fourth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the seventh and eighth syntax elements confine all samples of the picture, and the fifth to eighth lines completely extend within the section.
In accordance with embodiments, the fifth and sixth syntax elements and the first and second syntax elements are coded differentially to each other.
In accordance with embodiments, the data stream additionally comprises third data indicating one or more rectangular regions out of a projected plane onto which the full view sphere is projected according to a predetermined projection scheme, which the one or more rectangular regions the picture is composed of. In accordance with embodiments, the data stream additionally comprises fourth data indicating the predetermined projection scheme.
In accordance with embodiments, the fourth data comprises an identifier indexing one of a plurality of spherical projections. In accordance with embodiments, the fourth data comprises a rotation of the full view sphere relative to a global coordinate system. In accordance with embodiments, the indication of the portion relates to the full view sphere in a situation rotated or in a situation not-rotated relative to the global coordinate system according to the fourth data.
In accordance with embodiments, the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein a scope of the second data is larger than a scope of the first data. In accordance with embodiments, the scope of the first data is a picture wise scope.
In accordance with embodiments, the second data has a further option alternatively indicating that the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples is exactly circumferenced by the portion.
In accordance with embodiments, the data stream has encoded thereinto a content of the picture along with additional picture content, and has encoded thereinto the picture in a subportion of the data stream in a motion constrained manner along with extraction information which indicates how to derive a data stream specifically having encoded thereinto the picture from the subportion of the data stream.
In accordance with embodiments, the data stream has encoded thereinto a further picture in a further subportion of the data stream in a motion constrained manner along with further extraction information which indicates how to derive a further data stream specifically having encoded thereinto the further picture from the further subportion of the data stream, wherein the data stream comprises further first data indicating a further portion of the full view sphere which is captured by the further picture, wherein the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein the second data is also valid for the further first data.
Apparatus
The present invention provides an apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and insert into the data stream second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.

Network Entity
The present invention provides a network entity for processing a data stream having encoded thereinto a picture, the network entity being configured to derive from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; and derive from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; process the data stream depending on a match between the portion and a wanted portion of the full view sphere.
In accordance with embodiments, the processing of the data stream depending on the comparison comprises deciding on performing motion constrained tile set extraction on the data stream so as to obtain a data stream specifically having encoded thereinto the picture from a subportion of the data stream depending on the match; and forming a Media Presentation Description, MPD, including one or more representations offering the data stream or motion-constrained tile set, MCTS, extracted versions thereof, depending on the match.
Method
The present invention provides a method for forming a data stream having encoded thereinto a picture, the method comprising inserting into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and inserting into the data stream second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
The present invention provides a method for processing a data stream having encoded thereinto a picture, the method comprising deriving from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; deriving from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; and processing the data stream depending on a match between the portion and a wanted portion of the full view sphere.
2nd Aspect: Signaling of Static Projection or Region-Wise-Packing Characteristics
Embodiments of the second aspect of the present invention address use-cases requiring different projection characteristics, for example, a dynamic projection characteristic or a static projection characteristic, which allows exposing this information to higher levels for content selection or a negotiation as well as for usage for conforming a validation of bitstreams or restrictions imposed by further application standards. Embodiments provide an approach signaling that a projection format type or another characteristic of the video is static and is not going to change, or is dynamic and is allowed to change.

Data Stream
The present invention provides a data stream having encoded thereinto a video, the data stream comprising first data indicating a projection of pictures of the video onto a full view sphere; and second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
In accordance with embodiments, the first data is updated intermittently in the data stream between consecutive pictures and remains constant at each update if the second data indicates that the projection persists until an end of the video, and is free to vary at each update if the second data indicates that the projection is allowed to change during the video.
In accordance with embodiments, the first data is updated in the data stream on a per picture basis. In accordance with embodiments, the second data indicates whether the projection persists until an end of the video, or whether the projection persists until a next update of the first data.
In accordance with embodiments, the second data indicates whether the projection persists until an end of the video, or whether the projection is validly indicated merely for the current picture.
In accordance with embodiments, the second data has a further option alternatively indicating that the projection persists until a next update of the first data. In accordance with embodiments, the first data indicates one or more of: a predetermined projection scheme mapping between a projected plane, one or more rectangular regions of which are contained in each of the pictures of the video, and the full view sphere; a composition of the pictures of the video out of one or more rectangular regions of the projected plane; and a rotation of the full view sphere. In accordance with embodiments, the data stream further comprises coverage information data indicating a portion of the full view sphere which is captured by the projection of the picture onto the full view sphere. In accordance with embodiments, the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or so that the portion completely resides within the section. In accordance with embodiments, the second data further indicates whether the portion persists until an end of the video, or whether the portion is allowed to change during the video.
Network Entity
The present invention provides a network entity for processing a data stream having encoded thereinto a video, the network entity configured to derive from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; and derive from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and process the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
In accordance with embodiments, the processing of the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video, comprises deciding on transcoding, modifying or forwarding the data stream, or not transcoding, modifying or forwarding the data stream, depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.

Apparatus
The present invention provides an apparatus for forming a data stream having encoded thereinto a video, the apparatus configured to insert into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and insert into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.

Method
The present invention provides a method for processing a data stream having encoded thereinto a video, the method comprising deriving from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; deriving from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and processing the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.

The present invention provides a method for forming a data stream having encoded thereinto a video, the method comprising inserting into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and inserting into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
3rd Aspect: Signaling of Guard Band Usage
In accordance with embodiments of the third aspect, an approach is provided to give a client device an indication or recommendation where the guard band pixels are to be used in rendering and, in the case they are to be used, how the guard band pixels should be used.
Data Stream
The present invention provides a data stream having encoded thereinto a picture, the data stream comprising first data indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, second data indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the full view sphere flanking the one or more of the plurality of regions in the full sphere view and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and third data indicating for the guard band region as to which of the guard band region and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion. In accordance with embodiments, the first data indicates qualities for the regions at which the regions are represented by the picture and the third data indicates a quality for the guard band region such that the qualities for the regions at which the regions are represented by the picture and the quality for the guard band region are defined on a common ordinal scale. In accordance with embodiments, the third data additionally indicates whether or not the quality for the guard band region is to be used, by comparison of the quality for the guard band region with the qualities for the regions, as the indication as to which of the predetermined guard band region and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
In accordance with embodiments, the first data indicates for the regions at which quality the regions are represented by the picture.
In accordance with embodiments, the quality indicated by the first data for the regions pertains to spatial resolution.
In accordance with embodiments, the guard band region is represented by the picture at a reduced quality, reduced compared to the one or more regions flanked by the guard band region.
In accordance with embodiments, the third data indicates one of a group of preference options including two or more of the guard band region is not to be used in rendering the output picture with respect to the flanking portion; the guard band region may optionally be used in rendering the output picture with respect to the flanking portion; the corresponding subportion is not to be used in rendering the output picture with respect to the flanking portion; the guard band region and the corresponding subportion should be used in rendering the output picture with respect to the flanking portion.
In accordance with embodiments, the third data indicates a weight at which the guard band region and the corresponding subportion should be blended in rendering the output picture with respect to the flanking portion.
In accordance with embodiments, the third data comprises an indicator for the guard band region assuming one of a group of possible states, the group comprising one or more states which when assumed by the indicator, indicate that a quality at which the one or more region flanked by the guard band region is represented by the picture is greater than a quality at which the guard band region is represented by the picture or that the guard band region may merely be used for rendering up to a predetermined maximum distance from the one or more regions flanked thereby; and/or one or more states which when assumed by the indicator, indicate an unspecified quality relationship between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture, and/or one or more states which when assumed by the indicator, indicate an equality between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture; and/or one or more states which when assumed by the indicator, indicate a gradual transition of a quality at which the guard band region is represented by the picture from the one or more region flanked by the guard band region to the one or more further regions; and further comprising one or more states which when assumed by the indicator, indicate that quality at which the guard band region is represented by the picture is higher than a quality at which the one or more region flanked by the guard band region is represented by the picture.
In accordance with embodiments, the data stream is in file format and comprises an extractor track which copies the plurality of regions from other tracks of the data stream.
In accordance with embodiments, the copying results in a composition of the picture in units of tiles of the data stream, each tile comprising one or more regions along with a guard band region flanking same. In accordance with embodiments, the third data performs the indication for each guard band region individually.
Collection of Data Streams
The present invention provides a collection of the above data streams, wherein the data streams differ in the qualities of tiles corresponding to one another in terms of full view sphere coverage and in the third data with respect to a guard band region corresponding in terms of full view sphere coverage.
Apparatus
The present invention provides an apparatus for processing a data stream having encoded thereinto a picture, the apparatus configured to derive from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme; derive from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and derive preference data from the data stream for the predetermined guard band and forward the picture, at least partially, to a renderer for rendition of an output picture, informing the renderer, depending on the preference data, which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
The present invention provides an apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, insert second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and insert third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
In accordance with embodiments, the apparatus is configured to encode the guard band region at reduced SNR compared to the one or more regions flanked by the guard band region.
Method
The present invention provides a method for processing a data stream having encoded thereinto a picture, the method comprising deriving from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme; deriving from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and deriving preference data from the data stream for the predetermined guard band and forwarding the picture, at least partially, to a renderer for rendition of an output picture, informing the renderer, depending on the preference data, which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
The present invention provides a method for forming a data stream having encoded thereinto a picture, the method comprising inserting first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, inserting second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and inserting third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
Computer Program Product
The present invention provides a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out one or more methods in accordance with the present invention.
More detailed embodiments of the various aspects of the inventive approach will now be described with reference to the following figures. Fig. 2 shows an example for an environment, similar to Fig. 1, where embodiments of the present application may be applied and advantageously used. In particular, Fig. 2 shows a system composed of a server 100 and a client 200, like the system of Fig. 1. The server 100 and the client 200 may interact using adaptive streaming. For instance, dynamic adaptive streaming over HTTP (DASH) employing a media presentation description (MPD) may be used for the communication 310 between the server 100 and the client 200. However, the inventive approach described herein is not limited to DASH, and in accordance with embodiments, the inventive approach may be implemented using file format boxes. Thus, any term used herein is to be understood as being broad so as to also cover manifest files defined differently than in DASH.
Fig. 2 illustrates a system for implementing a virtual reality application. For example, the system presents to a user wearing a head up display 204, e.g., using an internal display 206 of the head up display 204, a view section 208 of a temporally-varying spatial scene 210. The section 208 may correspond to an orientation of the head up display 204 that may be measured by an internal orientation sensor 212, like an inertial sensor of the head up display 204. Thus, the section 208 presented to the user is a section of the spatial scene 210, and the spatial position of the spatial scene 210 corresponds to the orientation of the head up display 204. The temporally-varying spatial scene 210 is depicted as an omnidirectional video or spherical video; however, the present invention is not limited to such embodiments. In accordance with other embodiments, the section 208 displayed to the user may be a section from a video with a spatial position of the section 208 being determined by an intersection of a facial axis or eye axis with a virtual or real projector wall or the like. Further, the sensor 212 and the display 206 may be separate or different devices, such as a remote control and a corresponding television set. In accordance with other embodiments, the sensor 212 and the display 206 may be part of a hand-held device, like a mobile device, e.g., a tablet or a mobile phone.
The server 100 may comprise a controller 102, e.g., implemented using the signal processor 102 of Fig. 1, and a storage 104. The controller 102 may be an appropriately programmed computer, an application-specific integrated circuit or the like. The storage 104 stores media segments which represent the temporally-varying spatial scene 210. The controller 102, responsive to requests from the client 200, sends to the client 200 the requested media segments together with a media presentation description and further information. The controller 102 may fetch the requested media segments from the storage 104. Other information, such as the media presentation description or parts of the media presentation description, may also be stored within this storage 104. The client 200 comprises a client device or controller 202, e.g., implemented using the signal processor 202 of Fig. 1, one or more decoder units 214 and a re-projector 216. The client device 202 may be an appropriately programmed computer, a microprocessor, a programmed hardware device, such as an FPGA, an application specific integrated circuit or the like. The client device 202 assumes responsibility for selecting the media segments to be retrieved from the server 100 out of one or more media segments 106 offered at the server 100. To this end, the client device 202 initially retrieves a manifest or media presentation description from the server 100. From the retrieved manifest, the client device 202 obtains a computational rule for computing addresses of one or more of the media segments 106 which correspond to certain, needed spatial portions of the spatial scene 210. The selected media segments are retrieved by the client device 202 from the server 100.
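As an illustration of such a template-based computational rule, the following is a minimal sketch of how a client device like 202 might resolve segment addresses; the template pattern, the tile/quality naming and the server URL are illustrative assumptions, not taken from DASH or from this document.

```python
# Minimal sketch: resolving a DASH-style segment address template into
# concrete segment URLs. The template pattern, tile/quality naming and the
# server URL are illustrative assumptions, not prescribed by this document.

def segment_url(template: str, tile: int, quality: str, number: int) -> str:
    return (template
            .replace("$RepresentationID$", f"tile{tile}_{quality}")
            .replace("$Number$", str(number)))

# Example: addresses of temporal segments 1..3 of tile 7 at quality Q2.
template = "https://server.example/video/$RepresentationID$/seg_$Number$.mp4"
for n in range(1, 4):
    print(segment_url(template, tile=7, quality="Q2", number=n))
```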
The media segments retrieved by the client device 202 are forwarded to the one or more decoders 214 for decoding. In the example of Fig. 2, the retrieved and decoded media segments represent, for a temporal time unit, a spatial section 218 of the temporally-varying spatial scene 210. As mentioned above, this may be different in accordance with other embodiments, where, for instance, the view section 208 to be presented constantly covers the whole scene. The re-projector 216 may re-project and cut out from the retrieved and decoded scene content 218 (defined by the selected, retrieved and decoded media segments) the view section 208 to be displayed to the user. To this end, the client device 202 may continuously track and update a spatial position of the view section 208, e.g., responsive to the user orientation data from the sensor 212, and inform the re-projector 216 about the current spatial position of the scene section 208 as well as about the re-projection mapping to be applied onto the retrieved and decoded media content so as to be mapped onto an area forming the view section 208. The re-projector 216 may apply a mapping and an interpolation onto a regular grid of pixels to be displayed on the display 206. Fig. 2 illustrates an embodiment where a cubic mapping has been used to map the spatial scene 210 onto the respective cube faces using for each face one or more tiles 220. In the depicted embodiment, each cube face has four associated tiles. The tiles 220 are depicted as rectangular sub-regions of the cube onto which the scene 210, which has the form of a sphere, has been projected. The re-projector 216 reverses the projection. However, the present invention is not limited to a cubic projection or cube mapping. In accordance with other embodiments, instead of a cubic projection, a projection onto a truncated pyramid or a pyramid without truncation may be used. In general, any polyhedron having n faces may be used. Although the tiles 220 are depicted to be non-overlapping in terms of coverage of the spatial scene 210, in accordance with other embodiments, some or all of the tiles 220 may at least partially overlap. In the embodiment depicted in Fig. 2, the whole spatial scene 210 is spatially subdivided into the tiles 220, and each of the six faces of the cube is subdivided into four tiles. For illustration purposes, the tiles 220 are numbered as tiles 1 to 24, of which tiles 1 to 12 are visible in Fig. 2. For each tile 220, the server 100 offers a video 108 which may be temporally subdivided into temporal segments 110. The server 100 may offer more than one video 108 per tile 220, the videos differing in quality Q1, Q2, and so on. The temporal segments 110 of the videos 108 of all tiles T1-T24 may form or may be encoded into one of the media segments 106 stored in the storage 104. It is noted that the tile-based streaming illustrated in Fig. 2 is merely an example from which many deviations are possible. For instance, a different number of tiles 220 may be used for some or all of the cube faces.
As described above, the omnidirectional video content to be presented to the user may undergo a projection to a rectangular video frame as used in traditional video services with non-omnidirectional video content. One flavor of those projections uses continuously differentiable functions to map 3D points to the picture plane, e.g. linear and trigonometric functions as in the Equirectangular projection. Another kind of these projections is based on geometric primitives with an integer number of surface planes, such as pyramids, cubes or other polyhedrons. 3D points are mapped to the faces of a polyhedron, typically using perspective projection to a camera point within the polyhedron, e.g. in the geometric center. A common example is the use of a regular symmetric six-sided cube, as described above with reference to Fig. 2, also referred to as the cubic projection. The faces of the polyhedron are then arranged into a rectangular video frame for encoding.
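To make the perspective projection onto polyhedron faces concrete, the following is a minimal sketch for the six-sided cube case; the face names and the (u, v) sign conventions are illustrative assumptions, as standards such as OMAF define their own exact layouts.

```python
# Minimal sketch: perspective projection of a 3D unit direction onto the six
# faces of a cube, yielding the face and local (u, v) coordinates. Face
# names and (u, v) sign conventions are illustrative assumptions.

def cube_face_uv(x: float, y: float, z: float):
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:        # dominant x axis: right/left faces
        face = "right" if x > 0 else "left"
        u, v = -z / x, y / ax
    elif ay >= ax and ay >= az:      # dominant y axis: top/bottom faces
        face = "top" if y > 0 else "bottom"
        u, v = x / ay, -z / y
    else:                            # dominant z axis: front/back faces
        face = "front" if z > 0 else "back"
        u, v = x / z, y / az
    return face, u, v                # u, v lie in [-1, 1] on the face plane

print(cube_face_uv(0.2, 0.9, -0.1))  # a direction mapped near the top face
```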
The mapping from the 3D points to a rectangular projected frame is defined by a projection that is signaled, e.g. in the bitstream. For example, in case of an Equirectangular projection this is signaled by an Equirectangular Projection supplemental enhancement information, SEI, message, or by a Projection type equal to 0 in the ProjectionFormatStruct() of the ISO-BMFF (ISO base media file format). In case of a cubic projection this may be signaled by a CubeMap Projection, CMP, SEI message, or by a Projection type equal to 1 in the ProjectionFormatStruct() of the ISO-BMFF.

1st Aspect: Signaling of the Projection Coverage Type
When using a polygon based omnidirectional projection, like a cubemap projection, CMP, it may be desirable to cover less than the full surroundings of 360° x 180°, spherically speaking, for example just the top sphere half.
From a system perspective, for example, in a MPEG OMAF (Omnidirectional MediA Format) context, a high-level coverage indication may be expressed through straightforwardly interpretable yaw and pitch ranges, optionally plus the roll, as is used, for instance, via the coverage information box in OMAF. Depending on the projection format used, a rectangular region of a projected frame may express the coverage information differently. For example, the coverage of a rectangular region of a CMP projected content may correspond to the surface of a sphere or a spherical region that is limited by 4 great circles, whereas the coverage of a rectangular region in an ERP projected content may correspond to the surface of a sphere or a spherical region that is limited by two great circles and two circles. Thus, two types of spherical regions may be defined, namely:
• a spherical region that corresponds to the surface of a sphere that is limited by four great circles each having a center coinciding with a center of the sphere, and the four great circles include two great circles (azimuth or yaw circles) limiting an azimuth interval and two great circles (elevation or pitch circles) limiting an elevation interval, and
• a spherical region that corresponds to the surface of a sphere that is limited by two great circles (azimuth or yaw circles) limiting an azimuth interval, and two small circles (elevation or pitch circles) limiting an elevation interval, the great circles each having a center coinciding with a center of the sphere, and the small circles each having a center coinciding with an elevation axis of the sphere.
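Both region types can be tested with plane-based inside checks, as the following minimal sketch illustrates; the angle conventions (radians) and the placement of the four great circles (region centered at azimuth 0 for the elevation circles) are illustrative assumptions, not normative geometry.

```python
# Minimal sketch: plane-based inside tests for the two spherical region
# types. A point is inside when it lies on the region side of every bounding
# circle's plane. Angles are radians; the four-great-circles variant assumes
# the region is centered at azimuth 0 (an illustrative simplification).

import math

def to_xyz(azimuth: float, elevation: float):
    """Unit vector for a sphere point given azimuth/elevation in radians."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))

def inside_two_great_two_small(az, el, az_min, az_max, el_min, el_max):
    """Region type of Fig. 3(b): small circles bound the elevation, so the
    test is a plain interval check in spherical coordinates."""
    return az_min <= az <= az_max and el_min <= el <= el_max

def inside_four_great_circles(az, el, az_min, az_max, el_min, el_max):
    """Region type of Fig. 3(a): every bounding circle is a great circle,
    i.e. a plane through the sphere center; test via plane normals."""
    p = to_xyz(az, el)
    normals = (
        (-math.sin(az_min),  math.cos(az_min), 0.0),   # azimuth_min plane
        ( math.sin(az_max), -math.cos(az_max), 0.0),   # azimuth_max plane
        (-math.sin(el_min), 0.0,  math.cos(el_min)),   # elevation_min plane
        ( math.sin(el_max), 0.0, -math.cos(el_max)),   # elevation_max plane
    )
    return all(sum(n * c for n, c in zip(nrm, p)) >= 0.0 for nrm in normals)

r = math.radians
print(inside_two_great_two_small(r(10), r(20), r(-45), r(45), r(-30), r(30)))
print(inside_four_great_circles(r(10), r(20), r(-45), r(45), r(-30), r(30)))
```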
Fig. 3(a) illustrates the definition of the spherical region by four great circles. In Fig. 3(a) the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2. Fig. 3(a) shows the spherical region 402 limited by the two azimuth or yaw great circles 406a, 406b limiting an azimuth interval 406, and by two elevation or pitch great circles 408a, 408b limiting an elevation interval 408. Further, Fig. 3(a) illustrates:
• a center point 410 of the spherical region 402 described by the coordinates centreAzimuth and centreElevation,
• a minimum azimuth location 412a of the spherical region 402 by the coordinate azimuth_min,
• a maximum azimuth location 412b of the spherical region 402 by the coordinate azimuth_max,
• a minimum elevation location 414a of the spherical region 402 by the coordinate elevation_min, and
• a maximum elevation location 414b of the spherical region 402 by the coordinate elevation_max.
Fig. 3(b) illustrates the definition of the spherical region by two great circles and two small circles. In Fig. 3(b) the sphere 400 represents the spatial scene 210 described above with reference to Fig. 2. Fig. 3(b) shows the spherical region 402 limited by the two azimuth or yaw great circles 406a, 406b limiting an azimuth interval 406, and by two elevation or pitch small circles 416a, 416b limiting an elevation interval 416. Further, Fig. 3(b) illustrates:
• a center point 410 of the spherical region 402 described by the coordinates centreAzimuth and centreElevation,
• a minimum azimuth location 412a of the spherical region 402 by the coordinate azimuth_min,
• a maximum azimuth location 412b of the spherical region 402 by the coordinate azimuth_max,
• a minimum elevation location 414a of the spherical region 402 by the coordinate elevation_min, and
• a maximum elevation location 414b of the spherical region 402 by the coordinate elevation_max.
When considering a rectangular region of a CMP projected content and a certain coverage interval which corresponds to the surface of a sphere or a spherical region defined by the two great circles, which limit the yaw interval, and the two small circles, which limit the pitch interval, as is indicated in Fig. 3(b), the surface or region contains all samples of the certain coverage interval. Further, the surface or region contains only samples of the certain coverage interval. This is, however, not the case when the certain coverage interval is described by the 4 great circles, as is indicated in Fig. 3(a). There are other projections or other flavors of CMP, like equal area CMP, EAC, for which neither 4 great circles (Fig. 3(a)) nor 2 great circles and 2 small circles (Fig. 3(b)) translate to a rectangle in the projected content or frame. This may lead to a situation where the addressed subsection of the picture carrying the polygon based omnidirectional projection planes in some pre-defined or dynamic arrangement is not rectangular, depending on the projection characteristics.
Fig. 4 illustrates an example of a CMP picture with an indication of yaw/pitch angle ranges shown by the thick white lines and different picture subsets on the left and on the right. Along each line, either the pitch or the yaw angle is constant. On both the left and the right side of Fig. 4, the area defined by the thick white lines is the coverage range 500, also referred to as a portion 500 of a full view sphere which is captured by the picture. The pictures 504 are represented in Fig. 4 by the areas surrounded by the thick black lines, and are also referred to as sections 504 of the full view sphere as sampled by samples of the picture. Moreover, Fig. 4 illustrates the lines L1 to L4, the thick white lines, defining the portion 500 of the full view sphere that is captured by the picture, and the lines L5 to L8 defining the section of the sphere actually sampled by the picture samples.
Video and transport systems, however, rely on rectangular video picture planes, as described above, for example, for efficient reusability through a motion-constrained tile set, MCTS, extraction in case of High Efficiency Video Coding, HEVC, coded video. The portions 500 in Fig. 4 are examples for the above mentioned video subsections. For different use cases, different types of signaling may be needed. For example, in accordance with the first use case, only a part of a sphere is transmitted, like in situations in which premium users may use 360 video while non-premium users may only use 180 x 75 video. In another use case, the 360 video content may be split into multiple regions so as to be able to provide tiled streaming or sub-picture streaming, which allows for an efficient viewport-dependent video streaming.
For the first use case, when the 360 CMP content is cropped out from the whole content, samples outside the selected coverage area may be included into the transmitted content; however, such samples which lie outside the coverage range are not intended for presentation, so that a signaling of only 180 x 70 may be sufficient.
For the second use case, the separate streams, for example, if a face of a CMP is split into 4 x 4 streams, are intended to be played back together. If the signaling used corresponds to the coverage range for which all samples of the defined coverage are present, missing ranges may appear when combining the ranges from the multiple streams. Such a signaling may be misleading since it may be interpreted as if samples of the sphere are missing, even though the full sphere is there. To be able to interpret whether this is the case or not, full knowledge of the used projection and the flavor, e.g., CMP, equal area CMP, EAC, and the like, is required, which is a burden for the servers, the Media Aware Network Elements, MANEs, the devices encapsulating the stream into different formats like an mp4 file, a real-time protocol, RTP, stream, and the like.
To address this discrepancy, the first aspect of the present invention described herein is directed to coverage information signaling that may be accompanied by additional signaling or a constraint, like supplemental enhancement information, SEI, video usability information, VUI, a Media Presentation Description, MPD, and the like, that indicates whether a coverage indication is one or a combination of the following:
• a maximum coverage signaling, like for picture 504 in Fig. 4 on the left, i.e., there are no video pixels located outside the indicated coverage range 500, and/or
• a minimum coverage signaling, like for picture 504 in Fig. 4 on the right, i.e., the indicated coverage area 500 is located completely within the video picture 504 and some pixels of the video picture 504 are located outside the indicated coverage range 500.
In accordance with embodiments, the signaling may include an identifier so as to indicate
• the exact match for coverage,
• a minimum coverage,
• a maximum coverage, or
• both the minimum and maximum coverage.
Fig. 5(a) illustrates an embodiment for such an indicator referred to as explicit_coverage_idc (second data) and specifying the meaning of the coverage (first data) defined by the parameters azimuth_min, azimuth_max, elevation_min and elevation_max, as described, for example, above with reference to Fig. 3(a) and Fig. 3(b).
In accordance with embodiments, a value "0" for explicit_coverage_idc indicates that all samples in the output picture 504 are within the indicated coverage 500 and contain all necessary samples to represent the indicated coverage. This means that samples at the border of the output picture have an azimuth equal to azimuth_min or azimuth_max or an elevation equal to elevation_min or elevation_max. A value of "1" for explicit_coverage_idc indicates that all samples in the output picture 504 necessary to represent the indicated coverage 500 are present and further samples may be present, as is illustrated in Fig. 4 on the right. This means that samples of the output picture 504 have an azimuth equal to azimuth_min or azimuth_max or an elevation equal to or smaller than elevation_min or equal to or bigger than elevation_max.
A value of "2" for explicit_coverage_idc indicates that all samples in the output picture 504 are within the indicated coverage 500 but not all samples necessary to represent the indicated coverage 500 are present, as is illustrated in Fig. 4 on the left. This means that samples at the border of the output picture having a azimuth equal to azimuth_min or azimuth_max or an elevation equal to or bigger than elevation_min or equal to or smaller than elevation_max. In accordance with yet other embodiments, both the minimum and maximum coverage may be given, as is indicated in Fig. 5(b) in which a value "3" for the explicit_coverage_idc is indicated, meaning that all samples the output picture 504 necessary to represent the indicated coverage 500 are present and further samples may be present. In addition, the elevation_min_offset and elevation_max_offset are indicated and for the coverage range defined by azimuth_min, azimuth_max, elevation_min minus elevation_min_offset and elevation_max plus elevation_max_offset all samples in the output picture 504 are within the indicated coverage 500 but not all samples necessary to represent the coverage 500 are present. In other words, in accordance with embodiments of the first aspect a data stream 510, see Fig. 6, may be provided which has encoded thereinto a picture 512. Fig. 6 schematically illustrates, in combination with a system as described with reference to Fig. 2, the encoding/decoding of a data stream, the composition of the pictures of the video out of one or more rectangular regions of the projected plane and the mapping between a projected plane and a full view sphere.
The data stream 510 comprises first data, like the ranges of yaw and pitch angles indicated, e.g., by azimuth_min, azimuth_max, elevation_min and elevation_max as described above. The first data describes the coverage or first portion 500 of a full view sphere 516 in Fig. 4 (or 400 in Fig. 3(a) and Fig. 3(b)). The first portion 500 of the full view sphere may be indicated in a non-rotated state, i.e. in a state not rotated to a real world or global coordinate system. The first portion 500 may capture a scene within a portion rotated relative to the portion indicated by the first data which is captured by the picture 512, which may be an overall picture or a MCTS extracted partial picture. Second data is included in the data stream, like the identifier explicit_coverage_idc discussed above. For example, when explicit_coverage_idc == 1, the first data defines or indicates the portion of the full view sphere 516 sampled by the samples 519 of the picture, as is indicated in Fig. 4 on the right. When explicit_coverage_idc == 2, the first data defines or indicates the portion 500 such that the section 504 of the full view sphere 516 sampled by the samples 519 of the picture completely resides within the first portion 500, as is indicated in Fig. 4 on the left.
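Under the semantics just described, a client can conclude viewport renderability only for certain explicit_coverage_idc values; the following minimal sketch illustrates this, with the data class standing in for the signaling of Fig. 5 (the field names are paraphrases, and real SEI/file-format parsing is omitted).

```python
# Minimal sketch: concluding viewport renderability from the coverage
# signaling. The dataclass paraphrases the signaling of Fig. 5(a); the
# field names are assumptions and real SEI/ISO-BMFF parsing is omitted.

from dataclasses import dataclass

@dataclass
class Coverage:
    azimuth_min: float
    azimuth_max: float
    elevation_min: float
    elevation_max: float
    explicit_coverage_idc: int  # 0: exact, 1: minimum, 2: maximum, 3: both

def viewport_guaranteed(cov: Coverage, az0, az1, el0, el1) -> bool:
    """True only when all samples needed for the viewport are guaranteed to
    be present. For idc 0, 1 and 3, every sample inside the indicated base
    range exists; for idc 2 the range is only an outer bound on the picture
    samples, so presence of the viewport samples cannot be concluded."""
    if cov.explicit_coverage_idc == 2:
        return False
    return (cov.azimuth_min <= az0 and az1 <= cov.azimuth_max and
            cov.elevation_min <= el0 and el1 <= cov.elevation_max)

cov = Coverage(-90.0, 90.0, -35.0, 35.0, explicit_coverage_idc=1)
print(viewport_guaranteed(cov, -30.0, 30.0, -20.0, 20.0))  # True
```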
In accordance with further embodiments the portion 500 of the full view sphere 516 which is captured by the picture 512 is indicated by the second data, and the receiving network entity may know the convention in interpreting the first data and the second data. E.g., when explicit_coverage_idc == 3, azimuth_min, azimuth_max, elevation_min, elevation_max define the second data, and azimuth_min, azimuth_max, elevation_min - elevation_min_offset, elevation_max + elevation_max_offset define the first data, meaning that the first data indicates the portion 500 such that the section 504 of the full view sphere 516 sampled by the samples 519 of the picture 512 completely resides within the portion 500, as is indicated in Fig. 4 on the left, and the second data indicates that the portion 500 of the full view sphere 516 which is captured by the picture 512 completely resides within the section 504, as is indicated in Fig. 4 on the right. In accordance with embodiments, the portion 500 is confined by two lines of constant pitch or constant elevation (see the upper and lower lines L1, L2 of the cushion-like areas in Fig. 4) and two lines of constant yaw (see the left and right lines L3, L4 of the cushion-like areas in Fig. 4, which are straight in this example because CMP projection is used). In accordance with embodiments, the first data comprises
• a first syntax element, like elevation_min, indicating a pitch angle for a first line L1 of constant pitch,
• a second syntax element, like elevation_max, indicating a pitch angle for a second line L2 of constant pitch,
• a third syntax element, like azimuth_min, indicating a yaw angle for a third line L3 of constant yaw, and
• a fourth syntax element, like azimuth_max, indicating a yaw angle for a fourth line L4 of constant yaw.
If the first data indicates or defines the portion 500 so that the section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500 (Fig. 4 - left), the first to fourth lines L1-L4 confine all samples of the picture. If the first data indicates the portion 500 so that the portion 500 completely resides within the section 504, the first to fourth lines L1-L4 completely extend within the section 504 (Fig. 4 - right). In accordance with other embodiments, the second data may indicate the portion 500 of the full view sphere 516 which is captured by the picture 512. The first data may indicate the portion 500 so that the section 504 of the full view sphere sampled by the samples of the pictures completely resides within the portion 500, and the second data may indicate the portion 500 so that the portion 500 completely resides within the section 504. The second data may have
• a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, and
• a sixth syntax element indicating a pitch angle for a sixth line of constant pitch.
The first to fourth lines L1-L4 confine all samples of the picture, and the fifth and sixth lines L5, L6, along with seventh and eighth lines L7, L8 being of constant yaw and pitch, respectively, equal to the yaw and pitch angles indicated by the third and fourth syntax elements, completely extend within the section 504.
When making use of the azimuth_min/max_offset, the second data may further have
• a seventh syntax element indicating the yaw angle for the seventh line of constant yaw, and
• an eighth syntax element indicating the yaw angle for the eighth line of constant yaw.
In accordance with embodiments, explicit_coverage_idc == 3 may be set fixedly so that azimuth_min, azimuth_max, elevation_min, elevation_max define the second data and azimuth_min, azimuth_max, elevation_min - elevation_min_offset, elevation_max + elevation_max_offset define the first data. According to these embodiments, the receiving network entity may know the convention in interpreting the first data and the second data. The first data indicates the portion 500 of the full view sphere which is captured by the picture. The first data indicates or defines that the section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500, and the second data indicates the portion so that the portion 500 completely resides within the section 504. The first data comprises
• a first syntax element indicating a pitch angle for a first line of constant pitch, and
• a second syntax element indicating a pitch angle for a second line of constant pitch.
The second data comprises
• a fifth syntax element indicating a pitch angle for a fifth line of constant pitch,
• a sixth syntax element indicating a pitch angle for a sixth line of constant pitch,
• a seventh syntax element indicating a yaw angle for a seventh line of constant yaw, and
• an eighth syntax element indicating a yaw angle for an eighth line of constant yaw.
First and second lines, along with third and fourth lines being of constant yaw and pitch, respectively, equal to the yaw and pitch angles indicated by the seventh and eighth syntax elements, confine all samples of the picture, and the fifth to eighth lines completely extend within the section 504. The fifth and sixth syntax elements and the first and second syntax elements may be coded differentially to each other, i.e., merely the offset is transmitted, e.g., only an offset with respect to pitch/elevation.
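A minimal sketch of how both coverage ranges could be reconstructed from such differentially coded signaling (cf. Fig. 5(b)) follows; only the elevation offsets are shown, mirroring the text, and the function is an illustrative assumption rather than normative pseudo-code.

```python
# Minimal sketch: reconstructing the minimum and maximum coverage ranges
# from the differentially coded signaling of Fig. 5(b)
# (explicit_coverage_idc == 3). Only elevation offsets are shown, mirroring
# the text above; azimuth offsets would be handled analogously.

def coverage_ranges(azimuth_min, azimuth_max, elevation_min, elevation_max,
                    elevation_min_offset, elevation_max_offset):
    # Second data: range for which all samples are present (Fig. 4, right).
    minimum = (azimuth_min, azimuth_max, elevation_min, elevation_max)
    # First data: widened outer bound on all picture samples (Fig. 4, left),
    # obtained by applying the differentially coded offsets.
    maximum = (azimuth_min, azimuth_max,
               elevation_min - elevation_min_offset,
               elevation_max + elevation_max_offset)
    return minimum, maximum

print(coverage_ranges(-90, 90, -30, 30, 5, 5))
# ((-90, 90, -30, 30), (-90, 90, -35, 35))
```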
The data stream may additionally comprise third data for a mapping between a packed frame and a projected plane, the third data indicating one or more rectangular regions out of a projected plane onto which the full view sphere is projected according to a predetermined projection scheme, like the actually used projection scheme, e.g. CMP, of which one or more rectangular regions the picture is composed. The data stream may include fourth data indicating the predetermined projection scheme, like an identifier indexing one of a plurality of spherical projections, e.g. ERP or CMP.
The fourth data may indicate a rotation of the full view sphere relative to a global coordinate system, e.g., in terms of pitch, yaw and roll. The indication of the portion may relate to the full view sphere in a situation rotated or in a situation not rotated relative to the global coordinate system according to the fourth data. The second data may indicate whether the first data indicates the portion so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section 504, and a scope of the second data may be larger than a scope of the first data. The scope of the first data may be a picture-wise scope.
In accordance with embodiments, the second data may indicate that a section 504 of the full view sphere sampled by the picture's samples is exactly surrounded or circumferenced by the portion.
The data stream may have encoded thereinto a content of the picture along with additional picture content, and may have encoded thereinto the picture, like a MCTS extracted picture, in a subportion of the data stream in a motion constrained manner along with extraction information which indicates how to derive a data stream specifically having encoded thereinto the picture from the subportion of the data stream.
The data stream may have encoded thereinto a further picture in a further subportion of the data stream in a motion constrained manner along with further extraction information which indicates how to derive a further data stream specifically having encoded thereinto the further picture from the further subportion of the data stream, wherein the data stream comprises further first data indicating a further portion of the full view sphere which is captured by the further picture, wherein the second data indicates whether the first data indicates the portion so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section 504. The second data may also be valid for the further first data, i.e., the second data is common for two MCTS extractions.
Further embodiments provide an apparatus for forming, not necessarily including the actual encoding/compression, the above described data stream, e.g., a server as depicted in Fig. 1 or in Fig. 2. The apparatus inserts into the data stream the first data and the second data described above in more detail.
Further embodiments provide a network entity receiving the data stream described above and exploiting the data in the data stream, e.g., a client as depicted in Fig. 1 or in Fig. 2. The network entity may process the data stream depending on the comparison, decide on performing motion constrained tile set extraction on the data stream so as to obtain a data stream specifically having encoded thereinto the picture from a subportion of the data stream depending on the match, and form an MPD including one or more representations offering the data stream or MCTS extracted versions thereof, depending on the match.
2nd Aspect: Signaling of Static Projection or Region-Wise-Packing Characteristics
However, there are also use cases for which a device has lower capabilities and may not adapt the rendering engine so quickly to rotation variations on an AU basis or on the basis of a region-wise packing. For such a use case, it may be interesting to be able to signal that the projection characteristics are static, which may then be used for content selection in an MPD for DASH (Dynamic Adaptive Streaming over HTTP) or for a capabilities negotiation with a Session Description Protocol, SDP, for RTP streaming. This allows media aware network elements, MANEs, to parse fewer SEIs as the elements may be aware that the information is static. This, in turn, leads to simple implementations and less complexity. In addition, services like DASH may be used for offering content in several adaptation sets, where different adaptation sets may include content corresponding to a different coverage, and the content may be split into two different regions. For DASH clients to be able to perform the content selection based on the current viewport, they need to know that the content in each of the representations in the adaptation sets has static projection characteristics so that, as long as the client is interested in the same viewport, no adaptation with regard to the request is needed. At the same time, a 360 video might be consumed on other devices different from a head mounted device, HMD, having limited interactivity, like a TV set. Sometimes, it may be interesting for content providers to offer a guided representation that may correspond to the director's cut or the most viewed viewport or the like. In such a case, a single bitstream offered in DASH may correspond to a dynamic viewport cropped out from the sphere, and for such use cases, dynamic projection characteristics may be desired.
Thus, the above-mentioned, different use cases, and also other use cases which the skilled person will readily recognize, may require dynamic or static projection characteristics, and the present invention teaches signaling such characteristics in the bitstream. For example, this allows exposing such information to higher levels for content selection or negotiation. Further, this information may be used for a conformance validation of bitstreams or of restrictions imposed by further application standards. Therefore, the second aspect of the inventive approach may be considered a guarantee to the decoder or an intermediate device between the encoder and the decoder, like a media aware network element, MANE, that the projection format type or other characteristics of the video within the CVS are static and are not going to change, or are dynamic or non-static, i.e., allowed to change.
In accordance with embodiments of the second aspect of the present invention, additional information indicating whether a projection is static or dynamic, i.e., is allowed to change, over a certain period, like until the end of the video, is indicated together with the encoded video inside a data stream which also includes data indicating the projection of pictures of the video onto a full view sphere.
In accordance with a first embodiment, the additional data may be represented by additional bits in the data stream, like general_reserved_zero_Xbits, also referred to as general_constraint_projection_flag (second data). Setting the general_constraint_projection_flag to "1" specifies in the data stream that, in case an equirectangular projection SEI message or a cubemap projection SEI message is present in the CVS, the result is the same as if there is a single SEI for an AU of an Intra Random Access Point, IRAP, with the erp_persistence_flag being set to "1". This means that projection information, like rotation, coverage and the like, does not change within the CVS. Setting the general_constraint_projection_flag to "0" indicates that this information may change within the AUs of the CVS. In other words, the signaling constrains the values of the syntax elements present for the SEI message describing the projection (no restriction, or syntax elements need to be syntactically equivalent), for example, the equirectangular projection SEI message or cubemap projection SEI message.
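The following minimal sketch illustrates how a MANE might exploit such a flag to parse the projection SEI only once per CVS; the access-unit/SEI object model is an illustrative assumption and actual bitstream parsing is omitted.

```python
# Minimal sketch: a MANE exploiting general_constraint_projection_flag to
# parse the projection SEI only once per CVS instead of on every access
# unit. The AU/SEI object model is an illustrative assumption.

def annotate_aus(access_units, constraint_projection_flag: bool):
    projection = None
    for au in access_units:
        if projection is None or not constraint_projection_flag:
            # Dynamic case: projection info may change per AU, so look for
            # an updated SEI every time; static case: parse once and reuse.
            sei = au.get("projection_sei")
            if sei is not None:
                projection = sei
        yield au, projection  # forward the AU with the projection in force

aus = [{"projection_sei": {"type": "ERP", "yaw": 0}}, {}, {}]
for au, proj in annotate_aus(aus, constraint_projection_flag=True):
    print(proj)
```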
Rather than providing a general indication in the data stream, in accordance with other embodiments, within the equirectangular projection SEI message or CMP projection SEI message the existing persistence_flag may be changed to a persistence_type. In accordance with embodiments, the persistence_type may have the following values:
• persistence_type = 0: this refers to the current AU only, i.e., the conventional persistence scope.
• persistence_type = 1 : when considering picA to be a current picture, the persistence_type being equal to "1 ", specifies that the projection SEI messages persists for the current layer in the output order until one or more of the following conditions are true:
- a new coded layer-wise video sequence, CLVS, of the current layer begins,
- the bitstream ends,
- a picture picB in the current layer in an access unit containing an equirectangular projection SEI message that is applicable to the current layer is output for which PicOrderCnt(picB) is greater than PicOrderCnt(picA), where PicOrderCnt(picB) and PicOrderCnt(picA) are the PicOrderCntVal values of picB and picA, respectively, immediately after the invocation of the decoding process for the picture order count for picB.
• persistence_type = 2: this indicates that there will not be any change within the CVS. For example, when considering picA to be the current picture, the persistence_type being equal to "2" specifies that the projection SEI message persists for the current layer in the output until one or more of the following conditions are true:
- a new CLVS of the current layer begins, and
- the bitstream ends.
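A minimal sketch of applying the persistence_type semantics above follows, deciding whether a projection SEI received with a picture picA still applies at a later picture picB of the same layer; the boolean condition inputs are an illustrative simplification of the bitstream events described in the text.

```python
# Minimal sketch: applying the persistence_type semantics to decide whether
# a projection SEI received with picture picA still applies at a later
# picture picB of the same layer. The boolean inputs are an illustrative
# simplification of the bitstream events named in the text.

def sei_still_applies(persistence_type: int,
                      new_clvs_started: bool,
                      bitstream_ended: bool,
                      later_sei_output: bool) -> bool:
    if new_clvs_started or bitstream_ended:
        return False        # ends persistence for both type 1 and type 2
    if persistence_type == 0:
        return False        # applied to its own access unit only
    if persistence_type == 1:
        return not later_sei_output  # replaced by a later SEI in output order
    if persistence_type == 2:
        return True         # static within the CVS; no change can occur
    raise ValueError("unknown persistence_type")

print(sei_still_applies(2, new_clvs_started=False,
                        bitstream_ended=False, later_sei_output=True))  # True
```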
In accordance with another embodiment of the second aspect of the inventive approach, a flag may be added to the projection SEIs, and persistency may be reinterpreted so as to distinguish persistence_type 1 and persistence_type 2 depending on the flag value.
The above embodiments of the second aspect of the inventive approach have been described with reference to the projection characteristics; however, the same additional information to be provided in the data stream as defined for the projection may also be applied for region-wise packing. With region-wise packing, regions in a projected plane are defined and mapped to a packed frame. Those regions may appear at different positions on the projected and packed frame and may undergo further image transformations such as scaling, rotation, mirroring and the like. In a similar way as described above with regard to the projection, also for region-wise packing it may be indicated whether the characteristics of the video within the CVS are static or dynamic, i.e., are not going to change or are allowed to change.
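The per-region transform chain (position, scaling, rotation, mirroring) can be pictured with the following minimal sketch, which maps one packed-frame sample position back into the projected plane; the region record and the restriction to a single 90° rotation case are illustrative assumptions, as OMAF's RegionWisePackingStruct is richer.

```python
# Minimal sketch: mapping one packed-frame sample position back into the
# projected plane, undoing the region-wise packing transforms (position,
# scaling, optional 90-degree rotation, mirroring). The region record and
# the single rotation case are illustrative assumptions.

def packed_to_projected(x, y, region):
    px, py = x - region["packed_x"], y - region["packed_y"]  # local coords
    if region["rotation_deg"] == 90:          # undo rotation (one convention)
        px, py = py, region["packed_w"] - 1 - px
    if region["mirror_horizontal"]:           # undo horizontal mirroring
        px = region["packed_w"] - 1 - px
    sx = region["proj_w"] / region["packed_w"]   # undo the scaling
    sy = region["proj_h"] / region["packed_h"]
    return region["proj_x"] + px * sx, region["proj_y"] + py * sy

region = {"packed_x": 0, "packed_y": 0, "packed_w": 128, "packed_h": 128,
          "proj_x": 256, "proj_y": 0, "proj_w": 256, "proj_h": 256,
          "rotation_deg": 0, "mirror_horizontal": False}
print(packed_to_projected(64, 64, region))    # (384.0, 128.0)
```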
In other words, in accordance with embodiments of the second aspect a data stream 510, see Fig. 6, may be provided which has encoded thereinto a picture 512. The data stream 510 comprises first data indicating a projection of the picture 512 of the video 513 onto a full view sphere 516' (or 402 in Fig. 3(a) and Fig. 3(b)). In addition, the data stream includes second data, like the above mentioned persistence_type, which when having a value of "2" indicates that the projection persists or is static until an end of the video, or which, when having a value of "0" or "1", indicates that the projection is allowed to change or is dynamic during the video.
In accordance with embodiments, the first data may be updated intermittently in the data stream between consecutive pictures and remains constant at each update if the second data indicates that the projection persists until an end of the video, and is free to vary at each update if the second data indicates that the projection is allowed to change during the video. The first data may be updated in the data stream on a per picture basis. The second data may indicate the projection to be static or dynamic until a next update of the first data or only for the current picture. The first data may indicate one or more of
• a predetermined projection scheme mapping 522 between a projected plane 524, which may be continuous such as ERP, and the full view sphere 516 one or more rectangular regions 526 (or 402 in Fig. 3(a) and Fig. 3(b)) of which are contained in each of the pictures of the video,
· a composition 528 of the pictures 512 of the video out of one or more rectangular regions 526 (or 402 in Fig. 3(a) and Fig. 3(b)) of the projected plane 524; and
• a rotation 530 of the full view sphere 516, 516'.
The data stream may also include coverage information data indicating a portion 500 of the full view sphere 516' which is captured by the projection of the picture onto the full view sphere, as is indicated in Fig. 4. Like in the above described first aspect, the first data may indicate the portion 500 so that a section 504 of the full view sphere sampled by the picture's samples completely resides within the portion 500, or so that the portion 500 completely resides within the section 504.
In accordance with further embodiments, second data may further indicate whether the portion or projection is static or persists until an end of the video, e.g., in case persistence_type = 2, or whether the portion is dynamic or allowed to change during the video, e.g., in case persistence_type = 0 or persistence_type = 1. Further embodiments provide a network entity processing the data stream described above and exploiting the data in the data stream, e.g., a client as depicted in Fig. 1 or in Fig. 2. The network entity may derive from the first data of the data stream an indication of a projection of pictures of the video onto a full view sphere, and from the second data of the data stream an indication whether the projection persists until an end of the video, or
Figure imgf000034_0001
ic? +n Joh a nna fhi¾
processes the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video. The processing may include deciding on transcoding, modifying or forwarding the data stream, or not transcoding, modifying or forwarding the data stream, depending whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
Further embodiments provide an apparatus for forming, not necessarily including the actual encoding/compression, the above described data stream, e.g., a server as depicted in Fig. 1 or in Fig. 2. The apparatus inserts into the data stream the first data and the second data described above in more detail.
3rd Aspect: Signaling of Guard Band Usage
In accordance with embodiments of the third aspect of the present invention, so-called guard band regions may be included in a video picture, for example an omnidirectional video like CMP. These guard band regions extend or flank a given region using picture content of spatially neighboring regions, for example in the 3D space, and thereby, effectively, duplicate content. Fig. 7 illustrates an embodiment for a possible guard band layout around six faces of a CMP. In Fig. 7, the guard band region is indicated by the dashed regions labeled "guard band region". The picture boundaries and regions that do not share a common edge are padded, i.e., the top and bottom row of faces. The area 532 indicates the guard band area of the left CMP face that is spatially neighboring to the top CMP face.
The guard bands are provided to serve, for example, the following two purposes:
• for video coding, these regions allow that artifacts stemming from video coding around the boundaries, like bleeding from one region into the other, are limited to the guard band regions;
• for rendering, the doubled content may be used for blending between two representations of the content when rendering the respective video area in a spatially neighboring fashion in a 3D space, thereby avoiding the occurrence of rendering seams.
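A minimal sketch of how such duplicated guard band content could be produced on the content preparation side follows, padding one CMP face with rows from its 3D neighbor (cf. areas 530/532 in Fig. 7); the assumption that the neighbor's bottom rows line up with the flanked face's top edge is illustrative, as real layouts require per-edge rotation handling.

```python
# Minimal sketch: producing duplicated guard band content by padding one CMP
# face with rows taken from its 3D-neighboring face (cf. areas 530/532 in
# Fig. 7). The edge orientation assumption is illustrative; real layouts
# need per-edge rotation handling.

def top_guard_band(neighbor_face, gb_rows):
    """neighbor_face: 2D list of samples. Returns the gb_rows rows that
    border the shared 3D edge, to be placed above the flanked face."""
    return [row[:] for row in neighbor_face[-gb_rows:]]

top_face = [[1, 2],
            [3, 4]]
print(top_guard_band(top_face, gb_rows=1))   # [[3, 4]]
```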
In conventional approaches, for example in the OMAF specification, there is an option to indicate to a client that a guard band region has a quality that changes from the quality of the flanked region to the quality of the region neighboring the flanked region. In the above embodiment depicted in Fig. 7, the guard band regions 530 and 532 duplicating content of the top CMP face may be indicated to have a quality that goes from the respective quality of the left CMP face to the quality of the top CMP face. However, there are scenarios in which it is uncertain what the quality of the faces is. For example, if dynamic streaming is used with tiled streaming, each CMP face, as a tile, may be available at different qualities, so that it is impossible to signal for all possible different setups that the quality gradually changes from one quality of the flanked region to the quality of the neighboring region.
Another possibility is to just signal that the quality decreases or increases with respect to the flanked region; however, with only that information it is not clear to the client how to interpret the indication, i.e., which samples, like the original top CMP face samples versus the guard band samples, to rely on during rendering. For example, in a scenario in which the top CMP face itself is depicted in reduced spatial quality, like a reduced resolution, within the video picture, as is illustrated in Fig. 8, it may well be that using the guard band samples leads to a higher visual quality of the respective region after rendering than using the original CMP face samples. Fig. 8 illustrates an example of a mixed resolution CMP with guard band regions around the left CMP face. In the example depicted in Fig. 8, one of the CMP faces has a higher quality, like a higher resolution, when compared to all other CMP faces. In the depicted example, it is assumed that the left CMP face has a higher resolution than the top, bottom, back, right, front CMP faces. Fig. 8 also illustrates the guard band region around the left CMP face and the doubled content 530, 532 also referred to above with reference to Fig. 7.
In the example of Fig. 8 which involves CMP faces with varying resolutions, the CMP faces may have equal spatial quality but may vary in other quality dimensions, for example with respect to the quantization in lossy video coding, such as HEVC. Additionally, even without obvious quality differences in resolution or quantization, the projection mapping used to generate the samples within each CMP face may lead to the fact that the guard band region is sampled at a finer granularity than a respective original region. In such a case, and following the layout of Fig. 7, an approach is needed to enable a server or content author to advise a client on how to benefit from the content characteristics during rendering.
Thus, embodiments of the third aspect of the invention give a client device an indication or recommendation of how to use the guard band pixels in rendering, for example by indicating
• to a client that it should not use the guard band for rendering, for example by setting a do_not_use_guard_band_for_rendering_at_all_flag,
• a quality ranking of the guard band region with respect to the original region using a common quality ranking scale,
• that the client, based on a common quality ranking scale, is to perform
- a weighted blending between the guard band region and the original region using a certain function, like a weighting function slope from a quality difference, or
- to use one of the guard band region and the original region exclusively.
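A minimal sketch of how a client might act on these indications for a single guard band sample follows, deriving a blend weight from a quality-ranking difference; the flag, the slope parameter and the ranking convention (lower value = higher quality) are illustrative assumptions, not syntax defined by this document.

```python
# Minimal sketch: a client-side rendering decision for one guard band
# sample, deriving a blend weight from a quality-ranking difference. The
# flag, slope and ranking convention are illustrative assumptions. A weight
# of 1.0 uses only the guard band sample, 0.0 only the duplicated
# original-region sample.

def blend_weight(do_not_use_gb: bool, gb_rank: int, region_rank: int,
                 slope: float = 0.25) -> float:
    if do_not_use_gb:
        return 0.0
    diff = region_rank - gb_rank     # positive: guard band has better quality
    return max(0.0, min(1.0, 0.5 + slope * diff))

def render_sample(gb_sample: float, original_sample: float, w: float) -> float:
    return w * gb_sample + (1.0 - w) * original_sample

w = blend_weight(False, gb_rank=1, region_rank=3)
print(render_sample(100.0, 80.0, w))   # guard band favored: 100.0
```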
Fig. 9 illustrates the syntax that may be used according to an embodiment of the third aspect of the present invention based on a region-wise packing box "rwpk" defined in OMAF. As is highlighted in Fig. 9, the indicator RectRegionPacking(i) may be included into a data stream including the encoded video.
The following semantics may apply. In Fig. 9, the gb_not_used_for_pred_flag[i] (first data), when being set to "0", specifies that the guard bands may or may not be used, for example during the inter-prediction process. Setting gb_not_used_for_pred_flag[i] to "1" specifies that the sample values of the guard bands are not to be used in the inter- prediction process. For example, when gb_not_used_for_pred_flag[i] is set to "1" the sample values within the guard bands in the decoded pictures may be rewritten even if the decoded pictures are used as references for inter-prediction of subsequent pictures to be decoded. For example, the content of the packed region may be seamlessly expanded to its guard band with decoded and re-projected samples of another packed region.
Packing_type[i] (second data) in Fig. 9 indicates for one or more of the plurality of regions that same are flanked in the picture, not necessarily at all sides, by a guard band region, e.g. the region or area above region "left" in Fig. 7. The parameter gb_type[i][j] (third data) specifies the type of guard bands for the i-th packed region as follows, with j equal to 0, 1 , 2, and 3 indicating that the semantics below apply to the left, right, top and bottom edge, respectively, of the guard band of the i-th packed region.
• gb_type[i][j], when being set to "0", specifies that the content of the guard bands in relation to the content of the packed regions is unspecified.
• In accordance with embodiments, gb_type[i][j] is not set to "0" when gb_not_used_for_pred_flag[i] is set to "0".
• gb_type[i][j], when being set to "1", specifies that the content of the guard bands is sufficient for an interpolation of sub-pixel values within the packed region and is less than one pixel outside of the boundary of the packed region.
• gb_type[i][j], when being set to "2", specifies that the content of the guard bands represents actual image content at a quality that gradually changes from the picture quality of the packed region to that of the spherically adjacent packed region.
• gb_type[i][j], when being set to "3", specifies that the content of the guard bands represents actual image content at the picture quality of the packed region.
• gb_type[i][j], when being set to Y, specifies that the content of the guard bands of a packed region represents actual image content at a picture quality higher than that of the spherically adjacent packed region, i.e., the original region.
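Read together, these values suggest a coarse client-side policy per gb_type, as in the following minimal sketch; the policy wording and the numeric stand-in for the value "Y" are illustrative assumptions.

```python
# Minimal sketch: mapping the gb_type values listed above to a coarse
# client-side rendering policy. The policy wording and the numeric stand-in
# for the value "Y" are illustrative assumptions.

GB_TYPE_POLICY = {
    0: "unspecified relation: do not rely on guard band content",
    1: "sub-pixel interpolation support only: do not render guard band",
    2: "gradual quality transition: blend guard band and duplicated content",
    3: "same quality as packed region: either source may be rendered",
    4: "higher quality than neighbor: prefer guard band when rendering",
    #  ^ stand-in for the value "Y" mentioned in the text
}

def rendering_policy(gb_type: int) -> str:
    return GB_TYPE_POLICY.get(gb_type, "reserved: ignore guard band")

for t in range(6):
    print(t, "->", rendering_policy(t))
```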
In accordance with further embodiments of the third aspect of the present invention, the use_gb_for_rendering_preferably_flag[i] may be added to the data stream, having a syntax as shown in Fig. 10. The following semantics may apply. The use_gb_for_rendering_preferably_flag[i], when being set to "0", specifies that the guard bands may or may not be used in the rendering process, whereas setting the flag to "1" specifies that the sample values of the guard bands are to be used preferably in the rendering process.
In accordance with yet another embodiment of the third aspect of the present invention, a region-wise quality ranking is facilitated, for example a spherical region-wise quality ranking "srqr" or a 2D region-wise quality ranking "2dqr" as is reproduced in Fig. 11 including the data element "quality_ranking". The regions defined in Fig. 11 in the 2D region quality ranking box (num_regions) and the associated quality ranking (quality_ranking) may be separate from the regions defined in the RectRegionPacking in the RegionWisePackingStruct, meaning that the quality ranking has the freedom to treat samples of the guard bands as separate regions with their own separate quality ranking values.
The signaling of the quality ranking, as suggested in Fig. 11, may be mandated/recommended so as to enforce a quality ranking of CMP faces and guard band regions on the basis of a common quality scale, and, for example, a further gb_type syntax element value may be included so as to trigger the following:
• gb_type[i][j], when being set to X, specifies that the client's side rendering decisions should regard the respective quality ranking values, quality_ranking as shown in Fig. 11, for example, for weighting or a transition zone size within a blending function between the guard band region and the original region in the rendering process.
In other words, in accordance with embodiments of the third aspect a data stream 510, see Fig. 6, may be provided which has encoded thereinto a picture 512. The data stream may be in file format and have an extractor track which copies the plurality of regions from other tracks of the data stream. The copying may result in a composition of the picture in units of tiles of the data stream, each tile comprising one or more regions along with a guard band region flanking same.
The data stream 510 comprises first data, like RectRegionPacking(i), which indicates a composition 528 of the picture 512 out of a plurality of regions 526 (or 402 in Fig. 3(a) and Fig. 3(b)), e.g., the regions "right", "left", "top", "bottom", "front", "back" in Fig. 7, of a projected plane 524, e.g. section 500 in Fig. 4, which is projected 522 onto a full view sphere 516 (or 400 in Fig. 3(a) and Fig. 3(b)) according to a predetermined spherical projection scheme 522.
The data stream further includes second data, like packing_type[i] of Fig. 9, indicating for one or more of the plurality of regions 526 that same are flanked in the picture 512, not necessarily at all sides, by a guard band region 530, e.g. the region or area above region "left" in Fig. 7. The guard band region 530 shows a flanking portion of the full view sphere 516 flanking the one or more of the plurality of regions 526 in the full sphere view and being also shown in a corresponding subportion 532, like the upper subportion of region "top" in Fig. 7, of one or more further regions 526 of the projected plane 524 which the picture 512 is composed of. The flanking portion is the projection of the guard band region onto the sphere. The flanking portion is the portion coded by the guard band region. Thus, the flanking portion may be the region within the projected frame, like the frame 500 depicted in Fig. 4, which the guard band region codes. The flanking portion may be obtained in the sphere depicted in Fig. 3, for instance, when the projected plane is mapped onto the sphere 400.
The data stream further comprises third data indicating for the guard band region which of the guard band region 530 and the corresponding subportion 532 is to be used preferably in rendering an output picture 534 with respect to the flanking portion. For example, the third data may indicate whether one is to be used exclusively, or whether both are used with a respective weighting more in rendering or the like. The third data may indicate a weight at which the guard band region and the corresponding subportion should be blended in rendering the output picture with respect to the flanking portion. In accordance with embodiments, the first data may indicate qualities, like e.g. the quality_ranking of Fig. 11 for the flanked regions, for the regions at which the regions are represented by the picture. The third data may indicate a quality, e.g. the quality_ranking of Fig. 11 for the guard band region, for the guard band region such that the qualities for the regions at which the regions are represented by the picture and the quality for the guard band region are defined on a common ordinal scale. Thus, a comparison of the qualities directly hints which of the guard band region and the corresponding subportion is better in quality, i.e., in general including all flavors of quality such as SNR and spatial resolution. In accordance with further embodiments, the third data may additionally indicate, e.g. by gb_type[i][j] equal to X in Fig. 9, whether or not the quality, like the quality_ranking of Fig. 11, for the guard band region is to be used, by comparison of the quality for the guard band region with the qualities for the regions, like the quality_ranking of Fig. 11, as the indication as to which of the predetermined guard band region 530 and the corresponding subportion 532 is to be used preferably in rendering the output picture 534 with respect to the flanking portion.
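A hedged sketch of such a weighted blending follows; the array and parameter names are illustrative, and the weight would, under this embodiment, be derived from the third data.

```python
import numpy as np

# Illustrative sketch of blending the two codings of the flanking portion;
# 'weight' stands in for the weighting the third data may signal, and the
# array names are invented, not part of the described syntax.
def blend_flanking_portion(guard_band_px: np.ndarray,
                           subportion_px: np.ndarray,
                           weight: float) -> np.ndarray:
    """weight = 1.0 renders the flanking portion from the guard band region
    exclusively, weight = 0.0 from the corresponding subportion."""
    return weight * guard_band_px.astype(np.float32) \
        + (1.0 - weight) * subportion_px.astype(np.float32)
```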
The first data may indicate for the regions at which quality the regions are represented by the picture. The quality may pertain to the spatial resolution, e.g., by the ratio packed_reg_width/proj_reg_width which, when being larger, indicates larger quality, and is present per region i. In accordance with embodiments, the guard band region may be represented by the picture at a reduced quality, like a reduced SNR. The quality may be reduced compared to the one or more regions flanked by the guard band region, e.g., inherently indicated by the second data on the basis of the pure existence of the guard band region. In other words, a guard band region may, per definition, be a region coded with reduced quality compared to the one or more regions it flanks.
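The spatial-resolution cue just mentioned reduces to a simple per-region ratio, sketched below; the function name is invented for illustration.

```python
# Sketch of the per-region spatial-resolution cue described above: the
# ratio packed_reg_width / proj_reg_width for region i grows with the
# sampling density at which the region is carried in the packed picture.
def spatial_quality_ratio(packed_reg_width: int, proj_reg_width: int) -> float:
    return packed_reg_width / proj_reg_width  # larger => higher spatial quality
```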
In accordance with embodiments, the third data may indicate, e.g., using the use_gb_for_rendering_preferably_flag in Fig. 10, one of a group of preference options (see the sketch after this list) including two or more of
• the guard band region is not to be used in rendering the output picture with respect to the flanking portion;
• the guard band region may optionally be used in rendering the output picture with respect to the flanking portion;
• the corresponding subportion is not to be used in rendering the output picture with respect to the flanking portion;
• the guard band region and the corresponding subportion should be used in rendering the output picture with respect to the flanking portion.
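These preference options can be modelled as an enumeration, as in the following sketch; the names are invented for illustration and do not occur in the described syntax.

```python
from enum import Enum

# Sketch of the group of preference options listed above; names invented.
class GuardBandPreference(Enum):
    GB_NOT_USED = 0          # guard band not to be used for the flanking portion
    GB_OPTIONAL = 1          # guard band may optionally be used
    SUBPORTION_NOT_USED = 2  # corresponding subportion not to be used
    USE_BOTH = 3             # guard band and subportion should both be used
```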
The third data may comprise an indicator, e.g. gb_type[i][j] of Fig. 9, for the guard band region assuming one of a group of possible states, the group comprising
• one or more states, e.g. when gb_type == 1, which when assumed by the indicator, indicate that a quality at which the one or more region flanked by the guard band region is represented by the picture is greater than a quality at which the guard band region is represented by the picture or that the guard band region may merely be used for rendering up to a predetermined maximum distance from the one or more regions flanked thereby; and/or
• one or more states, e.g. when gb_type == 0, which when assumed by the indicator, indicate an unspecified quality relationship between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture, and/or
• one or more states, e.g. when gb_type == 3, which when assumed by the indicator, indicate an equality between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture; and/or
• one or more states, e.g. when gb_type == 2, which when assumed by the indicator, indicate a gradual transition of a quality at which the guard band region is represented by the picture from the one or more region flanked by the guard band region to the one or more further regions; and
• further comprising one or more states, e.g. when gb_type == Y, which when assumed by the indicator, indicate that a quality at which the guard band region is represented by the picture is higher than a quality at which the one or more region flanked by the guard band region is represented by the picture.

In accordance with embodiments, the third data performs the indication for each guard band region individually, e.g. for each i, or even for each j, i.e., each side of the guard band.
Further embodiments provide a collection of the above described data streams. The data streams may differ in the qualities of tiles that correspond to one another in terms of full view sphere coverage and in the third data with respect to guard band regions that correspond to one another in terms of full view sphere coverage.
Further embodiments provide an apparatus, e.g., a network entity, processing the data stream described above and exploiting the data in the data stream, like a client as depicted in Fig. 1 or in Fig. 2. The apparatus derives from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme. Further, the apparatus derives from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region. The guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and is also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of. The apparatus derives preference data from the data stream for the predetermined guard band and forwards the picture, at least partially, to a renderer for rendition of an output picture, informing the renderer, depending on the preference data, which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
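A high-level sketch of such a processing apparatus follows; every parser, decoder and renderer call is a hypothetical placeholder, since the description above does not prescribe an API.

```python
# Hedged sketch of the client-side processing described above; the object
# model and all method names (parse_*, decode_picture, render) are assumed
# placeholders, not an API defined by this description.
def process_data_stream(stream, renderer):
    composition = stream.parse_region_wise_packing()   # first data
    guard_bands = stream.parse_guard_band_info()       # second data
    preference = stream.parse_guard_band_preference()  # third data
    picture = stream.decode_picture()
    # Inform the renderer, per guard band, which coding of the flanking
    # portion (guard band region vs. corresponding subportion) to prefer.
    renderer.render(picture, composition, guard_bands, preference)
```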
Further embodiments provide an apparatus for forming, not necessarily including the actual encoding/compression, the above described data stream, e.g., a server as depicted in Fig. 1 or in Fig. 2. The apparatus inserts into the data stream the first, second and third data described above in more detail. The apparatus may also encode the guard band region at a reduced SNR compared to the one or more regions flanked by the guard band region.
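Correspondingly, the forming apparatus can be sketched as below; again the writer and encoder interfaces are assumptions, and the QP offset merely illustrates one way a reduced-SNR guard band might be realized.

```python
# Hedged sketch of the forming apparatus; writer/encoder interfaces and the
# concrete QP offset are illustrative assumptions.
def form_data_stream(writer, regions, guard_bands, preferences, encoder):
    writer.insert_region_wise_packing(regions)        # first data
    writer.insert_guard_band_info(guard_bands)        # second data
    writer.insert_guard_band_preference(preferences)  # third data
    for region in regions:
        encoder.encode_region(region)
    for gb in guard_bands:
        # Optionally code the guard band at reduced SNR, e.g. via a
        # coarser quantization (the +6 QP offset is purely illustrative).
        encoder.encode_region(gb, qp_offset=6)
```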
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Various elements and features of the present invention may be implemented in hardware using analog and/or digital circuits, in software, through the execution of instructions by one or more general purpose or special-purpose processors, or as a combination of hardware and software. For example, embodiments of the present invention may be implemented in the environment of a computer system or another processing system. Fig. 12 illustrates an example of a computer system 550. The units or modules as well as the steps of the methods performed by these units may execute on one or more computer systems 550. The computer system 550 includes one or more processors 552, like a special purpose or a general purpose digital signal processor. The processor 552 is connected to a communication infrastructure 554, like a bus or a network. The computer system 550 includes a main memory 556, e.g., a random access memory (RAM), and a secondary memory 558, e.g., a hard disk drive and/or a removable storage drive. The secondary memory 558 may allow computer programs or other instructions to be loaded into the computer system 550. The computer system 550 may further include a communications interface 560 to allow software and data to be transferred between computer system 550 and external devices. The communication may be in the form of electronic, electromagnetic, optical, or other signals capable of being handled by a communications interface. The communication may use a wire or a cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels 562.
The terms "computer program medium" and "computer readable medium" are used to generally refer to tangible storage media such as removable storage units or a hard disk installed in a hard disk drive. These computer program products are means for providing software to the computer system 550. The computer programs, also referred to as computer control logic, are stored in main memory 556 and/or secondary memory 558. Computer programs may also be received via the communications interface 560. The computer program, when executed, enables the computer system 550 to implement the present invention. In particular, the computer program, when executed, enables processor 552 to implement the processes of the present invention, such as any of the methods described herein. Accordingly, such a computer program may represent a controller of the comouter svstero 550 Where the disclosure s imolemen ed using software, the software may be stored in a computer program product and ioaded into computer system 550 using a removable storage drive, an interface, like communications interface 560. The implementation in hardware or in software may be performed using a digital storage medium, for example cloud storage, a floppy disk, a DVD, a Blue-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention may be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier. In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet. A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein. A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus. The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein are apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims

1. Data stream (510) having encoded thereinto a picture (512), the data stream (510) comprising first data indicating a portion (500) of a full view sphere (516) which is captured by the picture (512); and second data indicating whether the first data indicates the portion (500) so that a section (504) of the full view sphere (516) sampled by the picture's samples (519) completely resides within the portion (500), or whether the first data indicates the portion (500) so that the portion completely resides within the section (504); or the portion (500) of the full view sphere (516) which is captured by the picture (512), wherein the first data indicates the portion (500) so that the section (504) of the full view sphere (516) sampled by the picture's samples (519) completely resides within the portion (500), and the second data indicates the portion (500) so that the portion (500) completely resides within the section (504).
2. Data stream according to claim 1, wherein the indication of the portion indicates the portion so that the portion is confined by two lines of constant pitch, and two lines of constant yaw.
3. Data stream according to claim 1 or 2, wherein the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch, a third syntax element indicating a yaw angle for a third line of constant yaw, a fourth syntax element indicating a yaw angle for a fourth line of constant yaw, wherein, if the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, the first to fourth lines confine all samples of the picture, and, if the first data indicates the portion so that the portion completely resides within the section, the first to fourth lines completely extend within the section.
4. Data stream according to claim 3, wherein the second data indicates the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section, and the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, wherein the first to fourth lines confine all samples of the picture, and the fifth and sixth lines along with seventh and eighth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the third and fourth syntax element completely extend within the section.
5. Data stream according to claim 4, wherein the second data further comprises a seventh syntax element indicating the yaw angle for the seventh line of constant yaw, and an eighth syntax element indicating the yaw angle for the eighth line of constant yaw.
6. Data stream according to claim 1 or 2, wherein the second data indicates the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section, wherein the first data comprises a first syntax element indicating a pitch angle for a first line of constant pitch, a second syntax element indicating a pitch angle for a second line of constant pitch, wherein the second data comprises a fifth syntax element indicating a pitch angle for a fifth line of constant pitch, a sixth syntax element indicating a pitch angle for a sixth line of constant pitch, a seventh syntax element indicating a yaw angle for a seventh line of constant yaw, and an eighth syntax element indicating a yaw angle for an eighth line of constant yaw, wherein the first and second lines along with third and fourth lines being of constant yaw and pitch, respectively, equal to yaw and pitch angles indicated by the seventh and eighth syntax elements confine all samples of the picture, and the fifth to eighth lines completely extend within the section.
7. Data stream according to any of claims 4 to 6, wherein the fifth and sixth syntax elements and the first and second syntax elements are coded differentially to each other.
8. Data stream according to any of claims 1 to 7, wherein the data stream additionally comprises third data indicating one or more rectangular regions out of a projected plane onto which the full view sphere is projected according to a predetermined projection scheme, which one or more rectangular regions the picture is composed of.
9. Data stream according to claim 8, wherein the data stream additionally comprises fourth data indicating the predetermined projection scheme.

10. Data stream according to claim 9, wherein the fourth data comprises an identifier indexing one of a plurality of spherical projections.

11. Data stream according to claim 9 or 10, wherein the fourth data comprises a rotation of the full view sphere relative to a global coordinate system.
12. Data stream according to claim 11, wherein the indication of the portion relates to the full view sphere in a situation rotated or in a situation not-rotated relative to the global coordinate system according to the fourth data.
13. Data stream according to any of claims 1 to 12, wherein the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein a scope of the second data is larger than a scope of the first data.
14. Data stream according to claim 13, wherein the scope of the first data is a picture wise scope.
15. Data stream according to any of claims 1 to 14, wherein the second data has a further option alternatively indicating that the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples is exactly circumferenced by the portion.
16. Data stream according to any of claims 1 to 15, wherein the data stream has encoded thereinto a content of the picture along with additional picture content, and has encoded thereinto the picture in a subportion of the data stream in a motion constrained manner along with extraction information which indicates how to derive a data stream specifically having encoded thereinto the picture from the subportion of the data stream.
17. Data stream according to claim 16, wherein the data stream has encoded thereinto a further picture in a further subportion of the data stream in a motion constrained manner along with further extraction information which indicates how to derive a further data stream specifically having encoded thereinto the further picture from the further subportion of the data stream, wherein the data stream comprises further first data indicating a further portion of the full view sphere which is captured by the further picture, wherein the second data indicates whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section, and wherein the second data is also valid for the further first data.

18. Apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and insert into the data stream second data indicating whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.
19. Network entity for processing a data stream having encoded thereinto a picture, the network entity being configured to derive from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; and derive from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; process the data stream depending on a match between the portion and a wanted portion of the full view sphere.
20. Network entity according to claim 19, wherein the processing the data stream depending on the comparison comprises deciding on performing motion constrained tile set extraction on the data stream so as to obtain a data stream specifically having encoded thereinto the picture from a subportion of the data stream depending on the match; forming an MPD including one or more representations offering the data stream or MCTS extracted versions thereof, depending on the match.
21. Data stream (510) having encoded thereinto a video (513), the data stream comprising first data indicating a projection of pictures (512) of the video (513) onto a full view sphere (516'); and second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
22. Data stream according to claim 21, wherein the first data is updated intermittently in the data stream between consecutive pictures and remains constant at each update if the second data indicates that the projection persists until an end of the video, and is free to vary at each update if the second data indicates that the projection is allowed to change during the video.
23. Data stream according to claim 22, wherein the first data is updated in the data stream on a per picture basis.
24. Data stream according to any of claims 21 to 23, wherein the second data indicates whether the projection persists until an end of the video, or whether the projection persists until a next update of the first data.
25. Data stream according to any of claims 21 to 23, wherein the second data indicates whether the projection persists until an end of the video, or whether the projection is validly indicated merely for the current picture.
26. Data stream according to claim 24, wherein the second data has a further option alternatively indicating that the projection persists until a next update of the first data.
27. Data stream according to any of claims 21 to 26, wherein the first data indicates one or more of a predetermined projection scheme mapping (522) between a projected plane (524), one or more rectangular regions (526) of which are contained in each of the pictures of the video, and the full view sphere (516); a composition (528) of the pictures (512) of the video out of one or more rectangular regions (526) of the projected plane (524); a rotation (530) of the full view sphere (516, 516').
28. Data stream according to any of claims 21 to 27, further comprising coverage information data indicating a portion (500) of the full view sphere (516') which is captured by the projection of the picture onto the full view sphere.
29. Data stream according to claim 28, wherein the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or so that the portion completely resides within the section.
30. Data stream according to claim 28 or 29, wherein the second data further indicates whether the portion persists until an end of the video, or whether the portion is allowed to change during the video.

31. Network entity for processing a data stream having encoded thereinto a video, the network entity configured to derive from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; and derive from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and process the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
32. Network entity according to claim 31, wherein the processing the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video comprises deciding on transcoding, modifying or forwarding the data stream, or not transcoding, modifying or forwarding the data stream, depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
33. Apparatus for forming a data stream having encoded thereinto a video, the apparatus configured to insert into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and insert into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
34. Data stream (510) having encoded thereinto a picture (512), the data stream comprising first data indicating a composition (528) of the picture (512) out of a plurality of regions (526) of a projected plane (524) which is projected (522) onto a full view sphere (516) according to a predetermined spherical projection scheme;
second data indicating for one or more of the plurality of regions (526) that same are flanked in the picture (512) by a guard band region (530), wherein the guard band region (530) shows a flanking portion of the full view sphere (516) flanking the one or more of the plurality of regions (526) in the full sphere view and being also shown in a corresponding subportion (532) of one or more further regions (526) of the projected plane (524) which the picture (512) is composed of; and third data indicating for the guard band region as to which of the guard band region
(530) and the corresponding subportion (532) is to be used preferably in rendering an output picture (534) with respect to the flanking portion.
35. Data stream according to claim 34, wherein the first data indicates qualities for the regions at which the regions are represented by the picture and the third data indicates a quality for the guard band region such that the qualities for the regions at which the regions are represented by the picture and the quality for the guard band region are defined on a common ordinal scale.
36. Data stream according to claim 35, wherein the third data additionally indicates whether or not the quality for the guard band region is to be used, by comparison of the quality for the guard band region with the qualities for the regions, as the indication as to which of the predetermined guard band region (530) and the corresponding subportion (532) is to be used preferably in rendering the output picture (534) with respect to the flanking portion.
37. Data stream according to any one of claims 34 to 36, wherein the first data indicates for the regions at which quality the regions are represented by the picture.

38. Data stream according to claim 37, wherein the quality indicated by the first data for the regions pertains to spatial resolution.

39. Data stream according to any of claims 34 to 38, wherein the guard band region is represented by the picture at a reduced quality, reduced compared to the one or more regions flanked by the guard band region.

40. Data stream according to any of claims 34 to 39, wherein the third data indicates one of a group of preference options including two or more of the guard band region is not to be used in rendering the output picture with respect to the flanking portion; the guard band region may optionally be used in rendering the output picture with respect to the flanking portion; the corresponding subportion is not to be used in rendering the output picture with respect to the flanking portion; the guard band region and the corresponding subportion should be used in rendering the output picture with respect to the flanking portion.
41. Data stream according to any of claims 34 to 40, wherein the third data indicates a weight at which the guard band region and the corresponding subportion should be blended in rendering the output picture with respect to the flanking portion.
42. Data stream according to any of claims 34 to 41, wherein the third data comprises an indicator for the guard band region assuming one of a group of possible states, the group comprising one or more states which when assumed by the indicator, indicate that a quality at which the one or more region flanked by the guard band region is represented by the picture is greater than a quality at which the guard band region is represented by the picture or that the guard band region may merely be used for rendering up to a predetermined maximum distance from the one or more regions flanked thereby; and/or one or more states which when assumed by the indicator, indicate an unspecified quality relationship between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture, and/or one or more states which when assumed by the indicator, indicate an equality between a quality at which the one or more region flanked by the guard band region is represented by the picture and a quality at which the guard band region is represented by the picture; and/or one or more states which when assumed by the indicator, indicate a gradual transition of a quality at which the guard band region is represented by the picture from the one or more region flanked by the guard band region to the one or more further regions; and further comprising one or more states which when assumed by the indicator, indicate that a quality at which the guard band region is represented by the picture is higher than a quality at which the one or more region flanked by the guard band region is represented by the picture.
43. Data stream according to any of claims 34 to 42, wherein the data stream is in file format and comprises an extractor track which copies the plurality of regions from other tracks of the data stream.

44. Data stream according to claim 43, wherein the copying results in a composition of the picture in units of tiles of the data stream, each tile comprising one or more regions along with a guard band region flanking same.
45. Data stream according to any of claims 34 to 44, wherein the third data performs the indication for each guard band region individually.
46. Collection of data streams according to any one of claims 34 to 45, wherein the data streams differ in the qualities of tiles that correspond to one another in terms of full view sphere coverage and in the third data with respect to guard band regions that correspond to one another in terms of full view sphere coverage.

47. Apparatus for processing a data stream having encoded thereinto a picture, the apparatus configured to derive from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, derive from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and derive preference data from the data stream for the predetermined guard band and forward the picture, at least partially, to a renderer for rendition of an output picture with informing the renderer depending on the preference data on which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
48. Apparatus for forming a data stream having encoded thereinto a picture, the apparatus configured to insert first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, insert second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and insert third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
49. Apparatus according to claim 48, the apparatus configured to encode the guard band region at reduced SNR compared to the one or more regions flanked by the guard band region.
50. Method for forming a data stream having encoded thereinto a picture, the method comprising: inserting into the data stream first data indicating a portion of a full view sphere which is captured by the picture; and inserting into the data stream second data indicating
whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section.

51. Method for processing a data stream having encoded thereinto a picture, the method comprising: deriving from first data in the data stream an indication of a portion of a full view sphere which is captured by the picture; deriving from second data in the data stream whether the first data indicates the portion so that a section of the full view sphere sampled by the picture's samples completely resides within the portion, or whether the first data indicates the portion so that the portion completely resides within the section; or a further indication of the portion of the full view sphere which is captured by the picture, wherein the first data indicates the portion so that the section of the full view sphere sampled by the picture's samples completely resides within the portion, and the second data indicates the portion so that the portion completely resides within the section; and processing the data stream depending on a match between the portion and a wanted portion of the full view sphere.
52. Method for processing a data stream having encoded thereinto a video, the method comprising: deriving from first data of the data stream an indication of a projection of pictures of the video onto a full view sphere; deriving from second data of the data stream an indication whether the projection persists until an end of the video, or whether the projection is allowed to change during the video; and processing the data stream depending on whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.

53. Method for forming a data stream having encoded thereinto a video, the method comprising: inserting into the data stream first data indicating a projection of pictures of the video onto a full view sphere; and inserting into the data stream second data indicating whether the projection persists until an end of the video, or whether the projection is allowed to change during the video.
54. Method for processing a data stream having encoded thereinto a picture, the method comprising: deriving from the data stream a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, deriving from the data stream an indication of one or more of the plurality of regions which are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and deriving preference data from the data stream for the predetermined guard band and forwarding the picture, at least partially, to a renderer for rendition of an output picture with informing the renderer depending on the preference data on which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering the output picture with respect to the flanking portion.
55. Method for forming a data stream having encoded thereinto a picture, the method comprising: inserting first data into the data stream indicating a composition of the picture out of a plurality of regions of a projected plane which is projected onto a full view sphere according to a predetermined spherical projection scheme, inserting second data into the data stream indicating for one or more of the plurality of regions that same are flanked in the picture by a guard band region, wherein the guard band region shows a flanking portion of the projected plane flanking the one or more of the plurality of regions in the projected plane and being also shown in a corresponding subportion of one or more further regions of the projected plane which the picture is composed of; and inserting third data into the data stream indicating for the predetermined guard band as to which of the predetermined guard band and the corresponding subportion is to be used preferably in rendering an output picture with respect to the flanking portion.
56. A non-transitory computer program product comprising a computer readable medium storing instructions which, when executed on a computer, perform the method of any one of claims 50 to 55.
PCT/EP2018/072910 2017-08-24 2018-08-24 Characteristics signaling for omnidirectional content WO2019038433A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP18756252.5A EP3673659A1 (en) 2017-08-24 2018-08-24 Characteristics signaling for omnidirectional content

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP17187740 2017-08-24
EP17187740.0 2017-08-24

Publications (1)

Publication Number Publication Date
WO2019038433A1 true WO2019038433A1 (en) 2019-02-28

Family

ID=59799209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/072910 WO2019038433A1 (en) 2017-08-24 2018-08-24 Characteristics signaling for omnidirectional content

Country Status (2)

Country Link
EP (1) EP3673659A1 (en)
WO (1) WO2019038433A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021074005A1 (en) * 2019-10-14 2021-04-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Immersive viewport dependent multiparty video communication

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170104927A1 (en) * 2015-10-07 2017-04-13 Little Star Media, Inc. Systems, methods and software programs for 360 degree video distribution platforms
WO2017093611A1 (en) * 2015-12-02 2017-06-08 Nokia Technologies Oy A method for video encoding/decoding and an apparatus and a computer program product for implementing the method
WO2017127816A1 (en) * 2016-01-22 2017-07-27 Ziyu Wen Omnidirectional video encoding and streaming



Also Published As

Publication number Publication date
EP3673659A1 (en) 2020-07-01

Similar Documents

Publication Publication Date Title
JP6640373B2 (en) Geometric shape and frame packing structure of truncated pyramid for representing virtual reality video content
US11405643B2 (en) Sequential encoding and decoding of volumetric video
US11778171B2 (en) Apparatus, a method and a computer program for video coding and decoding
US11523135B2 (en) Apparatus, a method and a computer program for volumetric video
US10841532B2 (en) Rectilinear viewport extraction from a region of a wide field of view using messaging in video transmission
US20220239949A1 (en) An apparatus, a method and a computer program for video encoding and decoding
CN110168600B (en) Adjusting field of view of truncated square pyramid projection of 360 degree video
US10523980B2 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US11651523B2 (en) Apparatus, a method and a computer program for volumetric video
WO2019162567A1 (en) Encoding and decoding of volumetric video
WO2019115867A1 (en) An apparatus, a method and a computer program for volumetric video
KR20190126424A (en) How to send area based 360 degree video, How to receive area based 360 degree video, Area based 360 degree video transmitting device, Area based 360 degree video receiving device
WO2019038433A1 (en) Characteristics signaling for omnidirectional content
CN115567756A (en) View-angle-based VR video system and processing method
JP7416820B2 (en) Null tile coding in video coding
EP4207764A1 (en) An apparatus, a method and a computer program for volumetric video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18756252

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018756252

Country of ref document: EP

Effective date: 20200324