EP3821608A1 - Method and apparatus for storage and signaling of compressed point clouds - Google Patents

Method and apparatus for storage and signaling of compressed point clouds

Info

Publication number
EP3821608A1
Authority
EP
European Patent Office
Prior art keywords
bitstream
geometry
texture
track
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP19834318.8A
Other languages
English (en)
French (fr)
Other versions
EP3821608A4 (de)
Inventor
Emre Aksu
Igor Curcio
Miska Hannuksela
Vinod Kumar Malamal Vadakital
Lauri Ilola
Sebastian Schwarz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of EP3821608A1
Publication of EP3821608A4

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 - Monomedia components thereof
    • H04N21/816 - Monomedia components thereof involving special video data, e.g. 3D video
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 - Image coding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • An example embodiment relates generally to volumetric video encoding and decoding.
  • Volumetric video data represents a three-dimensional scene or object and can be used as input for augmented reality (AR), virtual reality (VR) and mixed reality (MR) applications.
  • Such data describes geometry (shape, size, position in three dimensional (3D) space) and respective attributes (e.g. color, opacity, reflectance, etc.), plus temporal changes of the geometry and attributes at given time instances (like frames in two dimensional (2D) video).
  • Volumetric video is either generated from 3D models, e.g., computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g., multiple cameras, a laser scan, a combination of video and dedicated depth sensors, and more.
  • One example representation format for such volumetric data is point clouds.
  • Temporal information about the scene can be included in the form of individual capture instances, e.g., “frames” in two dimensional (2D) video.
  • Point clouds are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
  • Standard point clouds suffer from poor temporal compression performance when a reconstructed 3D scene contains tens or even hundreds of millions of points. Identifying correspondences for motion-compensation in 3D space is an ill-defined problem because geometry and other respective attributes may change. For example, temporally successive frames do not necessarily have the same number of points.
  • Consequently, compression of dynamic 3D scenes is inefficient.
  • 2D-video-based approaches for compressing volumetric data, e.g., multiview and depth, have much better compression efficiency, but rarely cover the full scene.
  • These 2D-based approaches provide only limited six-degrees-of-freedom (6DOF) capabilities. Therefore, an approach that has good compression efficiency and a high degree of freedom is desirable.
  • a compressed point cloud includes an encoded video bitstream corresponding to the texture of the 3D object, an encoded video bitstream corresponding to the geometry of the 3D object, and an encoded metadata bitstream corresponding to all other auxiliary information which is necessary during the composition of a 3D object.
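  • For illustration only, the three sub-bitstreams can be modeled as a simple container. The following Python sketch is not from the patent; the class and field names are invented:

        from dataclasses import dataclass

        @dataclass
        class PointCloudBitstream:
            # Hypothetical container mirroring the three sub-bitstreams above.
            texture: bytes    # encoded video bitstream for the texture of the 3D object
            geometry: bytes   # encoded video bitstream for the geometry of the 3D object
            auxiliary: bytes  # encoded metadata bitstream with auxiliary composition information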
  • a method, apparatus and computer program product are provided in accordance with an example embodiment to signal and store compressed point clouds in video encoding.
  • the method, apparatus and computer program product may be utilized in conjunction with a variety of video formats.
  • a method, in one example embodiment, includes accessing a point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the method further includes causing storage of the point cloud compression coded bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the method further includes referencing the timed metadata track to the video media track.
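  • For illustration, the grouped multi-track layout described in the embodiments above can be sketched in Python as follows; all names, including the 'pccg' grouping type, are invented for this sketch and are not defined by the patent or by ISOBMFF:

        from typing import Dict

        def store_as_grouped_tracks(texture: bytes, geometry: bytes, auxiliary: bytes) -> Dict:
            # One track per sub-bitstream: texture video media track, geometry
            # video media track, and timed metadata track.
            tracks = {
                1: {"handler": "vide", "role": "texture video media track", "data": texture},
                2: {"handler": "vide", "role": "geometry video media track", "data": geometry},
                3: {"handler": "meta", "role": "timed metadata track", "data": auxiliary},
            }
            # A single group listing all three tracks, in the spirit of an
            # entity-to-group box; 'pccg' is a made-up grouping type.
            group = {"grouping_type": "pccg", "entity_ids": [1, 2, 3]}
            return {"tracks": tracks, "groups": [group]}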
  • an apparatus, in another example embodiment, includes at least one processor and at least one memory including computer program code for one or more programs with the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to access a point cloud compression coded bitstream and cause storage of the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the computer program code is also configured to, with the at least one processor, cause the apparatus at least to reference the timed metadata track to the video media track.
  • an apparatus, in another example embodiment, includes means for accessing a point cloud compression coded bitstream and means for causing storage of the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for storing the texture information bitstream in a texture video media track; means for storing the geometry information bitstream in a geometry video media track; means for storing the auxiliary metadata information bitstream in a timed metadata track; and means for storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for packing the texture information bitstream and the geometry information bitstream into one video media track; means for storing the auxiliary metadata information in a timed metadata track; and means for storing the video media track and the timed metadata track in one group.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and means for storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • the means for causing storage of the point cloud compression coded bitstream includes means for storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • the means for causing storage of the point cloud compression coded bitstream includes means for storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • the means for storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes means for storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • the means for packing the texture information bitstream and the geometry information bitstream into one video media track includes means for temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • the means for packing the texture information bitstream and the geometry information bitstream into one video media track includes means for spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the apparatus further includes means for referencing the timed metadata track to the video media track.
  • a computer program product includes at least one non-transitory computer-readable storage medium having computer executable program code instructions stored therein with the computer executable program code instructions comprising program code instructions configured, upon execution, to access a point cloud compression coded bitstream and cause storage of the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the program code instructions are further configured, upon execution, to reference the timed metadata track to the video media track.
  • a method, in another example embodiment, includes receiving a point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the method further includes decoding the point cloud compression coded bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the method further includes decoding a reference of the timed metadata track to the video media track.
  • an apparatus, in another example embodiment, includes at least one processor and at least one memory including computer program code for one or more programs with the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to receive a point cloud compression coded bitstream and decode the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the computer program code is also configured to, with the at least one processor, cause the apparatus at least to decode a reference of the timed metadata track to the video media track.
  • an apparatus, in another example embodiment, includes means for receiving a point cloud compression coded bitstream and means for decoding the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for storing the texture information bitstream in a texture video media track; means for storing the geometry information bitstream in a geometry video media track; means for storing the auxiliary metadata information bitstream in a timed metadata track; and means for storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for packing the texture information bitstream and the geometry information bitstream into one video media track; means for storing the auxiliary metadata information in a timed metadata track; and means for storing the video media track and the timed metadata track in one group.
  • the means for causing storage of the point cloud compression coded bitstream includes means for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; means for extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and means for storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • the means for causing storage of the point cloud compression coded bitstream includes means for storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • the means for causing storage of the point cloud compression coded bitstream includes means for storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • the means for storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes means for storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • the means for packing the texture information bitstream and the geometry information bitstream into one video media track includes means for temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • the means for packing the texture information bitstream and the geometry information bitstream into one video media track includes means for spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the apparatus further includes means for decoding a reference of the timed metadata track to the video media track.
  • a computer program product includes at least one non-transitory computer-readable storage medium having computer executable program code instructions stored therein with the computer executable program code instructions comprising program code instructions configured, upon execution, to receive a point cloud compression coded bitstream and decode the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; storing the texture information bitstream in a texture video media track; storing the geometry information bitstream in a geometry video media track; storing the auxiliary metadata information bitstream in a timed metadata track; and storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; packing the texture information bitstream and the geometry information bitstream into one video media track; storing the auxiliary metadata information in a timed metadata track; and storing the video media track and the timed metadata track in one group.
  • causing storage of the point cloud compression coded bitstream includes accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream; and storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more supplemental enhancement information messages.
  • causing storage of the point cloud compression coded bitstream includes storing the auxiliary metadata bitstream as one or more network abstraction layer units.
  • storing the texture video media track, the geometry video media track, and the timed metadata track in one group includes storing an indication of the texture video media track, the geometry video media track, and the timed metadata track in an entity to group box.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes temporally interleaving one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream to generate one or more video frames.
  • packing the texture information bitstream and the geometry information bitstream into one video media track includes spatially packing one or more texture frames of the texture information bitstream and one or more geometry frames of the geometry information stream into one or more video frames.
  • the program code instructions are further configured, upon execution, to decode a reference of the timed metadata track to the video media track.
  • Figure 1 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment of the present disclosure
  • Figures 2A and 2B are graphical representations illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure
  • Figure 3 illustrates an example of terms in the present disclosure
  • Figure 4 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure
  • Figure 5 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure
  • Figure 6 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure
  • Figure 7 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure.
  • Figure 8 is a flowchart illustrating a set of operations performed, such as by the apparatus of Figure 1, in accordance with an example embodiment of the present disclosure.
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims.
  • the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment to signal and store compressed point clouds in video encoding.
  • the method, apparatus and computer program product may be utilized in conjunction with a variety of video formats including the High Efficiency Video Coding standard (HEVC or H.265/HEVC), the Advanced Video Coding standard (AVC or H.264/AVC), the upcoming Versatile Video Coding standard (VVC or H.266/VVC), and/or with a variety of video and multimedia file formats including the International Standards Organization (ISO) base media file format (ISO/IEC 14496-12, which may be abbreviated as ISOBMFF), the Moving Picture Experts Group (MPEG)-4 file format (ISO/IEC 14496-14, also known as the MP4 format), file formats for NAL (Network Abstraction Layer) unit structured video (ISO/IEC 14496-15), and the 3rd Generation Partnership Project (3GPP) file format (3GPP Technical Specification 26.244, also known as the 3GP format).
  • ISOBMFF is the base for derivation of all the above-mentioned file formats.
  • An example embodiment is described in conjunction with HEVC; however, the present disclosure is not limited to HEVC. Rather, the description is given for one possible basis on top of which an example embodiment of the present disclosure may be partly or fully realized.
  • Some aspects of the disclosure relate to container file formats, such as the ISO base media file format.
  • A NAL unit is an elementary unit for the output of an H.264/Advanced Video Coding (AVC) or HEVC encoder and the input of an H.264/AVC or HEVC decoder, respectively.
  • NAL units may be encapsulated into packets or similar structures.
  • NAL units of an access unit form a sample, the size of which is provided within file format metadata.
  • a bytestream format has been specified in H.264/AVC and HEVC for transmission or storage environments that do not provide framing structures.
  • the bytestream format separates NAL units from each other by attaching a start code in front of each NAL unit.
  • encoders run a byte-oriented start code emulation prevention algorithm, which adds an emulation prevention byte to the NAL unit payload if a start code would have occurred otherwise.
  • start code emulation prevention may always be performed regardless of whether the bytestream format is in use or not.
  • a NAL unit may be defined as a syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of a raw byte sequence payload (RBSP) interspersed as necessary with emulation prevention bytes.
  • RBSP may be defined as a syntax structure containing an integer number of bytes that is encapsulated in a NAL unit.
  • An RBSP is either empty or has the form of a string of data bits containing syntax elements followed by an RBSP stop bit and followed by zero or more subsequent bits equal to 0.
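  • As an illustration of the start code emulation prevention rule described above (a sketch, not code from any standard), an encoder can insert an emulation prevention byte whenever two zero bytes would otherwise be followed by a byte value of 0x03 or less:

        def rbsp_to_nal_payload(rbsp: bytes) -> bytes:
            # Insert 0x03 after any two consecutive zero bytes that would
            # otherwise be followed by 0x00..0x03, so that no start-code
            # prefix (0x000001) can occur inside the NAL unit payload.
            out = bytearray()
            zeros = 0
            for b in rbsp:
                if zeros >= 2 and b <= 0x03:
                    out.append(0x03)  # emulation prevention byte
                    zeros = 0
                out.append(b)
                zeros = zeros + 1 if b == 0x00 else 0
            return bytes(out)

        # Example: rbsp_to_nal_payload(b"\x00\x00\x01") == b"\x00\x00\x03\x01"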
  • NAL units can be categorized into Video Coding Layer (VCL) NAL units and non-VCL NAL units.
  • VCL NAL units are typically coded slice NAL units.
  • a non-VCL NAL unit may be, for example, one of the following types: a sequence parameter set, a picture parameter set, a supplemental enhancement information (SEI) NAL unit, an access unit delimiter, an end of sequence NAL unit, an end of bitstream NAL unit, or a filler data NAL unit.
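  • For reference, a small Python sketch of reading the two-byte HEVC NAL unit header and classifying the unit; the field layout follows the HEVC specification, while the helper names are ours:

        def parse_hevc_nal_header(nal: bytes):
            # Header layout: forbidden_zero_bit(1) | nal_unit_type(6) |
            # nuh_layer_id(6) | nuh_temporal_id_plus1(3)
            nal_unit_type = (nal[0] >> 1) & 0x3F
            nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
            nuh_temporal_id_plus1 = nal[1] & 0x07
            return nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1

        def is_vcl(nal_unit_type: int) -> bool:
            # In HEVC, nal_unit_type values 0..31 are VCL; 32..63 are non-VCL.
            return nal_unit_type < 32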
  • Parameter sets may be needed for the reconstruction of decoded pictures, whereas many of the other non-VCL NAL units are not necessary for the reconstruction of decoded sample values.
  • An SEI NAL unit may contain one or more SEI messages, which are not required for the decoding of output pictures but may assist in related processes, such as picture output timing, rendering, error detection, error concealment, and resource reservation.
  • SEI messages are specified in H.264/AVC and HEVC, and the user data SEI messages enable organizations and companies to specify SEI messages for their own use.
  • H.264/AVC and HEVC contain the syntax and semantics for the specified SEI messages but no process for handling the messages in the recipient is defined.
  • encoders are required to follow the H.264/AVC standard or the HEVC standard when they create SEI messages, and decoders conforming to the H.264/AVC standard or the HEVC standard, respectively, are not required to process SEI messages for output order conformance.
  • One of the reasons to include the syntax and semantics of SEI messages in H.264/AVC and HEVC is to allow different system specifications to interpret the supplemental information identically and hence interoperate. It is intended that system specifications can require the use of particular SEI messages both in the encoding end and in the decoding end, and additionally the process for handling particular SEI messages in the recipient can be specified.
  • Video coding specifications may contain a set of constraints for associating data units (e.g., NAL units in H.264/AVC or HEVC) into access units. These constraints may be used to conclude access unit boundaries from a sequence of NAL units. For example, the following is specified in the HEVC standard:
  • An access unit consists of one coded picture with nuh_layer_id equal to 0, zero or more video coding layer (VCL) NAL units with nuh_layer_id greater than 0 and zero or more non-VCL NAL units.
  • Let firstBlPicNalUnit be the first VCL NAL unit of a coded picture with nuh_layer_id equal to 0.
  • an access unit delimiter NAL unit, a video parameter set (VPS) NAL unit, a sequence parameter set (SPS) NAL unit, a picture parameter set (PPS) NAL unit, or a prefix SEI NAL unit with nuh_layer_id equal to 0 (when present),
  • NAL units with nal_unit_type in the range of RSV_NVCL41..RSV_NVCL44 (which represent reserved NAL units) with nuh_layer_id equal to 0 (when present), or
  • NAL units with nal_unit_type in the range of UNSPEC48..UNSPEC55 (which represent unspecified NAL units) with nuh_layer_id equal to 0 (when present).
  • the first NAL unit preceding firstBlPicNalUnit and succeeding the last VCL NAL unit preceding firstBlPicNalUnit, if any, can only be one of the above-listed NAL units.
  • firstBlPicNalUnit starts a new access unit.
  • Frame packing may be defined to comprise arranging more than one input picture, which may be referred to as (input) constituent frames, into an output picture.
  • frame packing is not limited to any particular type of constituent frames, and the constituent frames need not have a particular relation to each other.
  • frame packing is used for arranging constituent frames of a stereoscopic video clip into a single picture sequence.
  • the arranging may include placing the input pictures in spatially non-overlapping areas within the output picture. For example, in a side-by-side arrangement, two input pictures are placed within an output picture horizontally adjacently to each other.
  • the arranging may also include partitioning of one or more input pictures into two or more constituent frame partitions and placing the constituent frame partitions in spatially non-overlapping areas within the output picture.
  • the output picture or a sequence of frame-packed output pictures may be encoded into a bitstream, e.g., by a video encoder.
  • the bitstream may be decoded, e.g., by a video decoder.
  • the decoder or a post-processing operation after decoding may extract the decoded constituent frames from the decoded picture(s), e.g., for displaying.
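  • A minimal sketch of the two arrangements discussed in this disclosure, spatial side-by-side packing and temporal interleaving, with frames modeled naively as 2D lists of samples (an illustration, not a normative packing procedure):

        from typing import List, Sequence

        Frame = List[List[int]]  # a constituent frame as rows of samples

        def pack_side_by_side(left: Frame, right: Frame) -> Frame:
            # Place two constituent frames horizontally adjacent in one output picture.
            assert len(left) == len(right), "constituent frames must share the same height"
            return [l_row + r_row for l_row, r_row in zip(left, right)]

        def interleave_temporally(a: Sequence[Frame], b: Sequence[Frame]) -> List[Frame]:
            # Alternate frames of two sequences into a single output sequence.
            out: List[Frame] = []
            for fa, fb in zip(a, b):
                out.extend((fa, fb))
            return out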
  • a basic building block in the ISO base media file format is called a box.
  • Each box has a header and a payload.
  • the box header indicates the type of the box and the size of the box in terms of bytes.
  • a box may enclose other boxes, and the ISO file format specifies which box types are allowed within a box of a certain type. Furthermore, the presence of some boxes may be mandatory in each file, while the presence of other boxes may be optional. Additionally, for some box types, it may be allowable to have more than one box present in a file. Thus, the ISO base media file format may be considered to specify a hierarchical structure of boxes.
  • a file includes media data and metadata that are encapsulated into boxes. Each box is identified by a four-character code (4CC) and starts with a header which informs about the type and size of the box.
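  • The box header lends itself to a direct parse. In this sketch, the 32-bit size, 4CC type, 64-bit largesize, and size-zero cases follow ISOBMFF; the function name is ours:

        import struct

        def read_box_header(buf: bytes, offset: int = 0):
            # A box starts with a 32-bit big-endian size and a 4CC type; size == 1
            # means a 64-bit largesize follows, and size == 0 means the box
            # extends to the end of the file.
            size, = struct.unpack_from(">I", buf, offset)
            box_type = buf[offset + 4:offset + 8].decode("ascii")
            header_len = 8
            if size == 1:
                size, = struct.unpack_from(">Q", buf, offset + 8)
                header_len = 16
            elif size == 0:
                size = len(buf) - offset
            return box_type, size, header_len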
  • the FileTypeBox ('ftyp') contains information about the brands labeling the file.
  • the ftyp box includes one major brand indication and a list of compatible brands.
  • the major brand identifies the most suitable file format specification to be used for parsing the file.
  • the compatible brands indicate to which file format specifications and/or conformance points the file conforms. It is possible that a file is conformant to multiple specifications. All brands indicating compatibility to these specifications should be listed, so that a reader only understanding a subset of the compatible brands can get an indication that the file can be parsed.
  • Compatible brands also give a permission for a file parser of a particular file format specification to process a file containing the same particular file format brand in the ftyp box.
  • a file player may check if the ftyp box of a file comprises brands it supports, and may parse and play the file only if any file format specification supported by the file player is listed among the compatible brands.
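  • The player-side check described above reduces to testing for a supported brand; a Python sketch with hypothetical names:

        def player_can_parse(compatible_brands: list, supported_brands: set) -> bool:
            # Parse and play only if at least one compatible brand is supported.
            return any(brand in supported_brands for brand in compatible_brands)

        # Example: player_can_parse(["isom", "mp42"], {"mp42"}) returns True.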
  • the media data may be provided in one or more instances of MediaDataBox ('mdat') and the MovieBox ('moov') may be used to enclose the metadata for timed media.
  • the 'moov' box may include one or more tracks, and each track may reside in one corresponding TrackBox ('trak').
  • Each track is associated with a handler, identified by a four-character code, specifying the track type.
  • Video, audio, and image sequence tracks can be collectively called media tracks, and they contain an elementary media stream.
  • Other track types comprise hint tracks and timed metadata tracks.
  • Tracks comprise samples, such as audio or video frames.
  • a media sample may correspond to a coded picture or an access unit.
  • a media track refers to samples (which may also be referred to as media samples) formatted according to a media compression format (and its encapsulation to the ISO base media file format).
  • a hint track refers to hint samples, containing cookbook instructions for constructing packets for transmission over an indicated communication protocol.
  • a timed metadata track may refer to samples describing referred media and/or hint samples.
  • the 'trak' box includes in its hierarchy of boxes the SampleTableBox (also known as the sample table or the sample table box).
  • the SampleTableBox contains the SampleDescriptionBox, which gives detailed information about the coding type used, and any initialization information needed for that coding.
  • the SampleDescriptionBox contains an entry-count and as many sample entries as the entry-count indicates.
  • the format of sample entries is track-type specific but derived from generic classes (e.g., VisualSampleEntry, AudioSampleEntry).
  • the type of sample entry form used for deriving the track-type-specific sample entry format is determined by the media handler of the track.
  • a TrackTypeBox may be contained in a TrackBox.
  • the payload of TrackTypeBox has the same syntax as the payload of FileTypeBox.
  • the content of TrackTypeBox shall be such that it would apply as the content of FileTypeBox, if all other tracks of the file were removed and only the track containing this box remained in the file.
  • Movie fragments may be used, for example, when recording content to ISO files, for example, in order to avoid losing data if a recording application crashes, runs out of memory space, or some other incident occurs. Without movie fragments, data loss may occur because the file format may require that all metadata, for example, a movie box, be written in one contiguous area of the file. Furthermore, when recording a file, there may not be sufficient amount of memory space to buffer a movie box for the size of the storage available, and re-computing the contents of a movie box when the movie is closed may be too slow. Moreover, movie fragments may enable simultaneous recording and playback of a file using a regular ISO file parser. Furthermore, a smaller duration of initial buffering may be required for progressive downloading, e.g., simultaneous reception and playback of a file when movie fragments are used and the initial movie box is smaller compared to a file with the same media content but structured without movie fragments.
  • the movie fragment feature may enable splitting the metadata that otherwise might reside in the movie box into multiple pieces. Each piece may correspond to a certain period of time of a track.
  • the movie fragment feature may enable interleaving file metadata and media data. Consequently, the size of the movie box may be limited and the use cases mentioned above be realized.
  • the media samples for the movie fragments may reside in an mdat box.
  • a moof box may be provided.
  • the moof box may include the information for a certain duration of playback time that would previously have been in the moov box.
  • the moov box may still represent a valid movie on its own, but in addition, it may include an mvex box indicating that movie fragments will follow in the same file.
  • the movie fragments may extend the presentation that is associated to the moov box in time.
  • within the movie fragment there may be a set of track fragments, including anywhere from zero to a plurality of track fragments per track.
  • the track fragments may in turn include anywhere from zero to a plurality of track runs, each of which documents a contiguous run of samples for that track (and hence they are similar to chunks).
  • many fields are optional and can be defaulted.
  • the metadata that may be included in the moof box may be limited to a subset of the metadata that may be included in a moov box and may be coded differently in some cases. Details regarding the boxes that can be included in a moof box may be found in the ISOBMFF specification.
  • a self-contained movie fragment may be defined to consist of a moof box and an mdat box that are consecutive in the file order, where the mdat box contains the samples of the movie fragment (for which the moof box provides the metadata) and does not contain samples of any other movie fragment (that is, of any other moof box).
  • a media segment may comprise one or more self-contained movie fragments.
  • a media segment may be used for delivery, such as streaming, e.g., in MPEG-Dynamic Adaptive Streaming over Hypertext Transfer Protocol (HTTP) (MPEG-DASH).
  • the track reference mechanism can be used to associate tracks with each other.
  • the TrackReferenceBox includes box(es), each of which provides a reference from the containing track to a set of other tracks. These references are labelled through the box type (e.g., the four-character code of the box) of the contained box(es).
  • TrackGroupBox, which is contained in TrackBox, enables indication of groups of tracks where each group shares a particular characteristic or where the tracks within a group have a particular relationship.
  • the box contains zero or more boxes, and the particular characteristic or the relationship is indicated by the box type of the contained boxes.
  • the contained boxes include an identifier, which can be used to conclude the tracks belonging to the same track group.
  • the tracks that contain the same type of a contained box within the TrackGroupBox and have the same identifier value within these contained boxes belong to the same track group.
  • the syntax of the contained boxes may be defined through TrackGroupTypeBox as follows (box syntax reproduced as specified in ISO/IEC 14496-12):
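        aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type)
            extends FullBox(track_group_type, version = 0, flags = 0)
        {
            unsigned int(32) track_group_id;
            // the remaining data may be specified for a particular track_group_type
        }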
  • the ISO Base Media File Format contains three mechanisms for timed metadata that can be associated with particular samples: sample groups, timed metadata tracks, and sample auxiliary information.
  • a sample grouping in the ISO base media file format and its derivatives may be defined as an assignment of each sample in a track to be a member of one sample group, based on a grouping criterion.
  • a sample group in a sample grouping is not limited to being contiguous samples and may contain non-adjacent samples. As there may be more than one sample grouping for the samples in a track, each sample grouping may have a type field to indicate the type of grouping.
  • Sample groupings may be represented by two linked data structures: (1) a SampleToGroupBox (sbgp box) represents the assignment of samples to sample groups; and (2) a SampleGroupDescriptionBox (sgpd box) contains a sample group entry for each sample group describing the properties of the group. There may be multiple instances of the SampleToGroupBox and SampleGroupDescriptionBox based on different grouping criteria. These may be distinguished by a type field used to indicate the type of grouping.
  • SampleToGroupBox may comprise a grouping_type_parameter field that can be used e.g., to indicate a sub-type of the grouping.
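  • Resolving which group a given sample belongs to walks the run-length entries of the SampleToGroupBox; a Python sketch (the entry layout follows ISOBMFF, the function name is ours):

        from typing import Iterable, Tuple

        def group_description_index(entries: Iterable[Tuple[int, int]], sample_number: int) -> int:
            # entries are (sample_count, group_description_index) runs from a
            # SampleToGroupBox; sample_number is 1-based. An index of 0 means
            # the sample is not a member of any group of this grouping type.
            remaining = sample_number
            for sample_count, gdi in entries:
                if remaining <= sample_count:
                    return gdi
                remaining -= sample_count
            return 0  # samples beyond the documented runs belong to no group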
  • Per-sample auxiliary information may be stored anywhere in the same file as the sample data itself. For self-contained media files, this is typically in a MediaDataBox or a box from a derived specification. It is stored either (a) in multiple chunks, with the number of samples per chunk, as well as the number of chunks, matching the chunking of the primary sample data or (b) in a single chunk for all the samples in a movie sample table (or a movie fragment).
  • the Sample Auxiliary Information for all samples contained within a single chunk (or track run) is stored contiguously (similarly to sample data).
  • Sample Auxiliary Information, when present, is stored in the same file as the samples to which it relates, as they share the same data reference ('dref') structure. However, this data may be located anywhere within this file, using auxiliary information offsets ('saio') to indicate the location of the data.
  • the restricted video ('resv') sample entry and mechanism has been specified for the ISOBMFF in order to handle situations where the file author requires certain actions on the player or renderer after decoding of a visual track. Players not recognizing or not capable of processing the required actions are stopped from decoding or rendering the restricted video tracks.
  • the 'resv' sample entry mechanism applies to any type of video codec.
  • a RestrictedSchemeInfoBox is present in the sample entry of 'resv' tracks and comprises an OriginalFormatBox, a SchemeTypeBox, and a SchemeInformationBox.
  • the original sample entry type that would have been used, had the 'resv' sample entry type not been used, is contained in the OriginalFormatBox.
  • the SchemeTypeBox provides an indication of which type of processing is required in the player to process the video.
  • the SchemeInformationBox comprises further information of the required processing.
  • the scheme type may impose requirements on the contents of the SchemeInformationBox.
  • the stereo video scheme indicated in the SchemeTypeBox indicates that decoded frames either contain a representation of two spatially packed constituent frames that form a stereo pair (frame packing) or contain only one view of a stereo pair (left and right views in different tracks).
  • a StereoVideoBox may be contained in a SchemeInformationBox to provide further information about the stereo arrangement used.
  • SAP Type 1 corresponds to what is known in some coding schemes as a "Closed group of pictures (GOP) random access point" (in which all pictures, in decoding order, can be correctly decoded, resulting in a continuous time sequence of correctly decoded pictures with no gaps) and, in addition, the first picture in decoding order is also the first picture in presentation order.
  • SAP Type 2 corresponds to what is known in some coding schemes as a "Closed GOP random access point" (in which all pictures, in decoding order, can be correctly decoded, resulting in a continuous time sequence of correctly decoded pictures with no gaps), for which the first picture in decoding order may not be the first picture in presentation order.
  • SAP Type 3 corresponds to what is known in some coding schemes as an "Open GOP random access point", in which there may be some pictures in decoding order that cannot be correctly decoded and have presentation times less than that of the intra-coded picture associated with the SAP.
  • a stream access point (SAP) sample group as specified in ISOBMFF identifies samples as being of the indicated SAP type.
  • a sync sample may be defined as a sample corresponding to SAP type 1 or 2.
  • a sync sample can be regarded as a media sample that starts a new independent sequence of samples; if decoding starts at the sync sample, it and succeeding samples in decoding order can all be correctly decoded, and the resulting set of decoded samples forms the correct presentation of the media starting at the decoded sample that has the earliest composition time.
  • Sync samples can be indicated with the SyncSampleBox (for those samples whose metadata is present in a TrackBox) or within sample flags indicated or inferred for track fragment runs.
  • Files conforming to the ISOBMFF may contain any non-timed objects, referred to as items, meta items, or metadata items, in a meta box (fourCC:‘meta’), which may also be called MetaBox. While the name of the meta box refers to metadata, items can generally contain metadata or media data.
  • the meta box may reside at the top level of the file, within a movie box (fourCC:‘moov’), and within a track box (fourCC:‘trak’), but at most one meta box may occur at each of the file level, movie level, or track level.
  • the meta box may be required to contain a‘hdlr’ box indicating the structure or format of the‘meta’ box contents.
  • the meta box may list and characterize any number of items that can be referred to, and each of them can be associated with a file name and uniquely identified within the file by an item identifier (item_id), which is an integer value.
  • the metadata items may be, for example, stored in the 'idat' box of the meta box or in an 'mdat' box, or reside in a separate file. If the metadata is located external to the file, then its location may be declared by the DataInformationBox (fourCC: 'dinf').
  • the metadata may be encapsulated into either the XMLBox (fourCC: 'xml ') or the BinaryXMLBox (fourCC: 'bxml').
  • An item may be stored as a contiguous byte range, or it may be stored in several extents, each being a contiguous byte range. In other words, items may be stored fragmented into extents, e.g., to enable interleaving.
  • An extent is a contiguous subset of the bytes of the resource, and the resource can be formed by concatenating the extents.
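  • As a reference sketch, the extent structure is declared in the ItemLocationBox ('iloc'); its version 0 form, slightly simplified, is:

    aligned(8) class ItemLocationBox extends FullBox('iloc', 0, 0) {
        unsigned int(4)  offset_size;       // byte size of extent_offset fields
        unsigned int(4)  length_size;       // byte size of extent_length fields
        unsigned int(4)  base_offset_size;
        unsigned int(4)  reserved;
        unsigned int(16) item_count;
        for (i = 0; i < item_count; i++) {
            unsigned int(16) item_ID;
            unsigned int(16) data_reference_index;    // 0: data is in this file
            unsigned int(base_offset_size*8) base_offset;
            unsigned int(16) extent_count;            // number of extents forming the item
            for (j = 0; j < extent_count; j++) {
                unsigned int(offset_size*8) extent_offset;
                unsigned int(length_size*8) extent_length;
            }
        }
    }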
  • High Efficiency Image File Format (HEIF) is a standard developed by the Moving Picture Experts Group (MPEG) for storage of images and image sequences.
  • the standard facilitates file encapsulation of data coded according to the High Efficiency Video Coding (HEVC) standard.
  • HEIF includes features building on top of the ISO Base Media File Format (ISOBMFF).
  • the ISOBMFF structures and features are used to a large extent in the design of HEIF.
  • the basic design for HEIF comprises that still images are stored as items and image sequences are stored as tracks.
  • the following boxes may be contained within the root-level 'meta' box and may be used as described hereinafter.
  • the handler value of the Handler box of the 'meta' box is 'pict'.
  • the resource (whether within the same file, or in an external file identified by a uniform resource identifier) containing the coded media data is resolved through the Data Information ('dinf') box, whereas the Item Location ('iloc') box stores the position and sizes of every item within the referenced file.
  • the Item Reference ('iref') box documents relationships between items using typed referencing.
  • the 'meta' box is also flexible to include other boxes that may be necessary to describe items.
  • Any number of image items can be included in the same file. Given a collection of images stored by using the 'meta' box approach, certain relationships may be qualified between images. Examples of such relationships include indicating a cover image for a collection, providing thumbnail images for some or all of the images in the collection, and associating some or all of the images in a collection with an auxiliary image such as an alpha plane. A cover image among the collection of images is indicated using the 'pitm' box. A thumbnail image or an auxiliary image is linked to the primary image item using an item reference of type 'thmb' or 'auxl', respectively.
  • the ItemPropertiesBox enables the association of any item with an ordered set of item properties.
  • Item properties are small data records.
  • the ItemPropertiesBox consists of two parts: an ItemPropertyContainerBox that contains an implicitly indexed list of item properties, and one or more ItemPropertyAssociationBox(es) that associate items with item properties.
  • An item property is formatted as a box.
  • a descriptive item property may be defined as an item property that describes rather than transforms the associated item.
  • a transformative item property may be defined as an item property that transforms the reconstructed representation of the image item content.
  • An entity may be defined as a collective term of a track or an item.
  • An entity group is a grouping of items, which may also group tracks.
  • An entity group can be used instead of item references, when the grouped entities do not have clear dependency or a directional reference relation.
  • the entities in an entity group share a particular characteristic or have a particular relationship, as indicated by the grouping type.
  • Entity groups are indicated in GroupsListBox.
  • Entity groups specified in a GroupsListBox of a file-level MetaBox refer to tracks or file-level items.
  • Entity groups specified in a GroupsListBox of a movie-level MetaBox refer to movie-level items.
  • Entity groups specified in a GroupsListBox of a track-level MetaBox refer to track-level items of that track.
  • a GroupsListBox contains EntityToGroupBoxes, each specifying one entity group.
  • the syntax of EntityToGroupBox may be specified as follows:
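  • As specified in ISOBMFF, that syntax is:

    aligned(8) class EntityToGroupBox(grouping_type, version, flags)
            extends FullBox(grouping_type, version, flags) {
        unsigned int(32) group_id;
        unsigned int(32) num_entities_in_group;
        for (i = 0; i < num_entities_in_group; i++)
            unsigned int(32) entity_id;
    }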
  • entity_id is resolved to an item, when an item with item_ID equal to entity_id is present in the hierarchy level (file, movie or track) that contains the GroupsListBox, or to a track, when a track with track_ID equal to entity_id is present and the GroupsListBox is contained in the file level.
  • Region-wise packing information may be encoded as metadata in or along the video bitstream, e.g., in a RegionWisePackingBox of OMAF and/or in a region-wise packing SEI message of HEVC or H.264/AVC.
  • the packing information may comprise a region-wise mapping from a pre-defined or indicated source format to the packed frame format, e.g., from a projected picture to a packed picture, as described earlier.
  • Region-wise packing may for example be rectangular region-wise packing.
  • An example rectangular region-wise packing metadata is described next in the context of associating regions of an omnidirectional projected picture (e.g., of equirectangular or cubemap projection) with regions of a packed picture (e.g., being subject to encoding or decoding): For each region, the metadata defines a rectangle in a projected picture, the respective rectangle in the packed picture, and an optional transformation of rotation by 90, 180, or 270 degrees and/or horizontal and/or vertical mirroring. Rectangles may for example be indicated by the locations of the top-left corner and the bottom-right corner.
  • the mapping may comprise resampling. As the sizes of the respective rectangles can differ in the projected and packed pictures, the mechanism infers region-wise resampling.
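  • A sketch of rectangular region-wise packing metadata, along the lines of the RectRegionPacking structure of OMAF (field widths as in OMAF), is:

    aligned(8) class RectRegionPacking(i) {
        unsigned int(32) proj_reg_width[i];    // rectangle in the projected picture
        unsigned int(32) proj_reg_height[i];
        unsigned int(32) proj_reg_top[i];
        unsigned int(32) proj_reg_left[i];
        unsigned int(3)  transform_type[i];    // rotation by 90/180/270 and/or mirroring
        bit(5) reserved = 0;
        unsigned int(16) packed_reg_width[i];  // rectangle in the packed picture
        unsigned int(16) packed_reg_height[i];
        unsigned int(16) packed_reg_top[i];
        unsigned int(16) packed_reg_left[i];
    }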
  • a Multipurpose Internet Mail Extension (MIME) is an extension to an email protocol which makes it possible to transmit and receive different kinds of data files on the Internet, for example video and audio, images, software, etc.
  • An internet media type is an identifier used on the Internet to indicate the type of data that a file contains. Such internet media types may also be called content types.
  • MIME type/subtype combinations exist that can contain different media formats.
  • Content type information may be included by a transmitting entity in a MIME header at the beginning of a media transmission. A receiving entity thus may need to examine the details of such media content to determine if the specific elements can be rendered given an available set of codecs. Especially when the end system has limited resources, or the connection to the end system has limited bandwidth, it may be helpful to know from the content type alone if the content can be rendered.
  • receiving systems can determine if the codecs are supported by the end system, and if not, can take appropriate action (such as rejecting the content, sending notification of the situation, transcoding the content to a supported type, fetching and installing the required codecs, further inspection to determine if it will be sufficient to support a subset of the indicated codecs, etc.).
  • the codecs parameter specified in RFC 6381 is derived from the sample entry type(s) of the track(s) of the file described by the MIME type.
  • the profiles parameter specified in RFC 6381 can provide an overall indication, to the receiver, of the specifications with which the content complies. This is an indication of the compatibility of the container format and its contents to some specification. The receiver may be able to work out the extent to which it can handle and render the content by examining to see which of the declared profiles it supports, and what they mean.
  • the profiles MIME parameter contains a list of four-character codes that are indicated as compatible brands in the FileTypeBox of the file described by the MIME type.
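  • As a hypothetical example, a file carrying HEVC-coded PCC component tracks and declaring the 'pccf' brand discussed further below might be described as: video/mp4; codecs="hvc1.1.6.L93.B0"; profiles="pccf" (the codecs and profiles values here are illustrative, not normative).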
  • the apparatus of an example embodiment may be provided by any of a wide variety of computing devices including, for example, a video encoder, a video decoder, a computer workstation, a server or the like, or by any of various mobile computing devices, such as a mobile terminal, e.g., a smartphone, a tablet computer, a video game player, or the like.
  • the apparatus 10 of an example embodiment includes, is associated with or is otherwise in communication with processing circuitry 12, a memory 14, a communication interface 16 and optionally, a user interface 18 as shown in Figure 1.
  • the processing circuitry 12 may be in communication with the memory device 14 via a bus for passing information among components of the apparatus 10.
  • the memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processing circuitry).
  • the memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure.
  • the memory device could be configured to buffer input data for processing by the processing circuitry. Additionally or alternatively, the memory device could be configured to store instructions for execution by the processing circuitry.
  • the apparatus 10 may, in some embodiments, be embodied in various computing devices as described above. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single "system on a chip." As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processing circuitry 12 may be embodied in a number of different ways.
  • the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processing circuitry may include one or more processing cores configured to perform independently.
  • a multi-core processing circuitry may enable multiprocessing within a single physical package.
  • the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processing circuitry 12 may be configured to execute instructions stored in the memory device 14 or otherwise accessible to the processing circuitry. Alternatively or additionally, the processing circuitry may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly. Thus, for example, when the processing circuitry is embodied as an ASIC, FPGA or the like, the processing circuitry may be specifically configured hardware for conducting the operations described herein.
  • the processing circuitry when the processing circuitry is embodied as an executor of instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ an embodiment of the present invention by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein.
  • the processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
  • the communication interface 16 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including video bitstreams.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s).
  • the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the apparatus 10 may optionally include a user interface 18 that may, in turn, be in communication with the processing circuitry 12 to provide output to a user, such as by outputting an encoded video bitstream and, in some embodiments, to receive an indication of a user input.
  • the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • the processing circuitry may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like.
  • the processing circuitry and/or user interface circuitry comprising the processing circuitry may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processing circuitry (e.g., memory device 14, and/or the like).
  • the term file is sometimes used as a synonym of syntax structure or an instance of a syntax structure.
  • the term file may be used to mean a computer file, that is a resource forming a standalone unit in storage.
  • a syntax structure may be specified as described below.
  • a group of statements enclosed in curly brackets is a compound statement and is treated functionally as a single statement.
  • a "while" structure specifies a test of whether a condition is true, and if true, specifies evaluation of a statement (or compound statement) repeatedly until the condition is no longer true.
  • a "do ... while” structure specifies evaluation of a statement once, followed by a test of whether a condition is true, and if true, specifies repeated evaluation of the statement until the condition is no longer true.
  • else" structure specifies a test of whether a condition is true, and if the condition is true, specifies evaluation of a primary statement, otherwise, specifies evaluation of an alternative statement. The "else" part of the structure and the associated alternative statement is omitted if no alternative statement evaluation is needed.
  • a "for" structure specifies evaluation of an initial statement, followed by a test of a condition, and if the condition is true, specifies repeated evaluation of a primary statement followed by a subsequent statement until the condition is no longer true.
  • Volumetric video data represents a three-dimensional scene or object and can be used as input for AR, VR and MR applications. Such data describes geometry (shape, size and position in 3D-space) and respective attributes (e.g. color, opacity, reflectance, etc.), plus any possible temporal changes of the geometry and attributes at given time instances (like frames in 2D video).
  • Volumetric video is either generated from 3D models, i.e. CGI, or captured from real-world scenes using a variety of capture solutions, e.g., multi-camera, laser scan, a combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible.
  • Typical representation formats for such volumetric data are triangle meshes, point clouds, or voxels.
  • Temporal information about the scene can be included in the form of individual capture instances, e.g.,“frames” in 2D video, or other mechanisms, e.g., position of an object as a function of time.
  • As volumetric video represents a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR, or MR applications, especially for providing 6DOF viewing capabilities.
  • Advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes.
  • Infrared, lasers, time-of-flight and structured light are all examples of technologies that can be used to construct 3D video data.
  • Representation of the 3D data depends on how the 3D data is used.
  • Dense voxel arrays have been used to represent volumetric medical data.
  • In 3D graphics, polygonal meshes are extensively used.
  • Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold.
  • Another way to represent 3D data is coding the 3D data as a set of texture and depth maps, as is the case in multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps and multi-level surface maps.
  • the reconstructed 3D scene may contain tens or even hundreds of millions of points. If such representations need to be stored or interchanged between entities, then efficient compression becomes essential.
  • a 3D scene represented as meshes, points, and/or voxels, can be projected onto one, or more, geometries. These geometries are "unfolded" onto 2D planes (two planes per geometry: one for texture, one for depth), which are then encoded using standard 2D video compression technologies. Relevant projection geometry information is transmitted alongside the encoded video files to the decoder. The decoder decodes the video and performs the inverse projection to regenerate the 3D scene in any desired representation format (not necessarily the encoding format).
  • FIGS 2A and 2B provide an overview of the compression and decompression processes, respectively, of point clouds.
  • an apparatus, such as apparatus 10 of Figure 1, may include means, such as the processing circuitry 12, for patch generation.
  • the patch generation process aims at decomposing the point cloud into a minimum number of patches with smooth boundaries, while also minimizing the reconstruction error.
  • One example way of patch generation is specified in MPEG document Point Cloud Coding Test Model Category 2 version 0.0 (TMC2V0). The approach is described as follows.
  • Each point is associated with the plane that has the closest normal (e.g., maximizes the dot product of the point normal and the plane normal).
  • the initial clustering is then refined by iteratively updating the cluster index associated with each point based on its normal and the cluster indices of its nearest neighbours.
  • the final step consists of extracting patches by applying a connected component extraction procedure.
  • the apparatus includes means, such as the processing circuitry 12, for packing.
  • the packing process aims at mapping the extracted patches onto a 2D grid while trying to minimize the unused space and guaranteeing that every TxT (e.g., 16x16) block of the grid is associated with a unique patch.
  • T is a user-defined parameter that is encoded in a bitstream and sent to a decoder.
  • One example packing strategy is specified in TMC2v0.
  • the packing strategy specified in TMC2v0 is a simple packing strategy that iteratively tries to insert patches into a WxH grid.
  • W and H are user defined parameters, which correspond to the resolution of the geometry/texture images that will be encoded.
  • the patch location is determined through an exhaustive search that is performed in raster scan order. The first location that can guarantee an overlapping-free insertion of the patch is selected and the grid cells covered by the patch are marked as used. If no empty space in the current resolution image can fit a patch, then the height H of the grid is temporarily doubled and the search is conducted again. At the end of the process, H is clipped so as to fit the used grid cells.
  • the apparatus includes means, such as the processing circuitry 12, for image generation.
  • the image generation process utilizes the 3D to 2D mapping computed during the packing process to store the geometry and texture of the point cloud as images.
  • each patch is projected onto two images, referred to as layers. More precisely, let H(u,v) be the set of points of the current patch that get projected to the same pixel (u, v).
  • the first layer, also called the near layer, stores the point of H(u,v) with the lowest depth D0.
  • the second layer captures the point of H(u,v) with the highest depth within the interval [D0, D0+D], where D is a user-defined parameter that describes the surface thickness.
  • the generated videos may have a geometry video of WxH YUV420-8bit and texture of WxH YUV420-8bit.
  • YUV is a color encoding system where Y stands for the luminance component and U and V stand for the two chrominance components.
  • the apparatus includes means, such as the processing circuitry 12, for image padding.
  • the apparatus is configured to pad the geometry and texture images.
  • the padding process aims at filling the empty space between patches in order to generate a piecewise smooth image suited for video compression.
  • One example way of padding is provided in TMC2v0.
  • the padding strategy in TMC2v0 is specified as follows:
  • Each block of TxT (e.g., 16x16) pixels is processed independently.
  • If the block is empty (i.e., all its pixels belong to the empty space between patches), then the pixels of the block are filled by copying either the last row or column of the previous TxT block in raster order.
  • the apparatus further includes means, such as the processing circuitry 12, for video compression.
  • the apparatus is configured to compress the padded texture images and padded geometry images into geometry video and texture video.
  • the generated images/layers are stored as video frames and compressed using a video codec.
  • One example video codec used is the HEVC Test Model 16 (HM16) video codec according to the HM16 configurations provided as parameters.
  • the apparatus includes means, such as the processing circuitry 12, for auxiliary patch- info compression.
  • the following metadata may be encoded for every patch:
  • mapping information providing for each TxT block its associated patch index may be encoded as follows: for each TxT block, let L be the ordered list of the indexes of the patches such that their 2D bounding box contains that block. The order in the list is the same as the order used to encode the 2D bounding boxes. L is called the list of candidate patches.
  • the apparatus may further include means, such as the processing circuitry 12, for compressing the occupancy map.
  • the apparatus further includes means, such as the processing circuitry 12, for multiplexing the compressed geometry video, compressed texture video, compressed auxiliary patch information, and compressed occupancy map into a point cloud compression (PCC) coded bitstream.
  • PCC point cloud compression
  • the multiplexing representation is at bitstream level and some features need to be implemented, such as: 1) time-wise random access and calculation of exact byte location for such random access; 2) compact representation of metadata when it is repeated for group of frames; 3) file storage mechanism and delivery via mechanisms such as ISOBMFF and MPEG-DASH; and 4) Alternative representations of video and auxiliary data streams for delivery channel adaptation (quality, resolution, frame-rate, bitrate, etc.). Details regarding the multiplexing, storage and signaling mechanisms of the PCC coded bitstream are later described in conjunction with reference to Figures 4 to 7.
  • the apparatus includes means, such as the processing circuitry 12, for accessing the point cloud compression coded bitstream.
  • the point cloud compression coded bitstream includes a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the apparatus includes means, such as the processing circuitry 12, for causing storage of the point cloud compression coded bitstream. Details regarding block 42 are later described in conjunction with Figures 5 to 7.
  • each sub-component of the PCC encoded bitstream (e.g., texture, geometry and auxiliary metadata) is stored as a separate track (two video tracks and one timed metadata track).
  • These tracks are grouped together in order to indicate a logical relationship between them. This grouping may be done using the track grouping mechanism by defining a new group type or entity grouping mechanism of ISOBMFF at a movie or file level.
  • a track referencing mechanism may be utilized by the track reference definition in ISOBMFF in order to indicate the type of relationship between the tracks which comprise a PCC coded 3D representation.
  • Representation-level global metadata may be stored at the track level, movie level or file level using either relevant track containers such as sample entries, sample auxiliary information or metadata containers.
  • Each group of frames (GoF) as defined in the PCC specification working draft may be considered as a random access granularity structure.
  • each PCC metadata track’s first sample may be a sync sample indicating a random access location.
  • the apparatus includes means, such as the processing circuitry 12, for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream.
  • the apparatus includes means, such as the processing circuitry 12, for storing the texture information bitstream in a texture video media track.
  • the apparatus includes means, such as the processing circuitry 12, for storing the geometry information bitstream in a geometry video media track.
  • the apparatus includes means, such as the processing circuitry 12, for storing the auxiliary metadata information bitstream in a timed metadata track.
  • the apparatus includes means, such as the processing circuitry 12, the memory 14 or the like, for associating the texture video media track, the geometry video media track, and the timed metadata track as one group and for storing the texture video media track, the geometry video media track, and the timed metadata track in one group.
  • the timed metadata track may be referred to as a PCC metadata track.
  • the term PCC track may refer to the texture video media track, the geometry video media track, or the PCC metadata track.
  • a brand may be defined to indicate that the file contains PCC encoded bitstreams.
  • a possible four-character code for the brand could be, but is not limited to, 'pccf'.
  • the brand may be included in a FileTypeBox and/or may be used in signalling, such as within the value for the optional profiles MIME parameter.
  • the brand may be present in a major, minor or compatible brands list.
  • the brand may be present in the track level, e.g., in a TrackTypeBox, if a track level brand storage mechanism is present.
  • the brand may be present in the group level, e.g., in the EntityToGroupBox that groups the texture, geometry, and PCC metadata tracks, if a group level brand storage mechanism is present.
  • a new media handler type for the PCC metadata track may be defined as "pcch" and stored in the handler_type field of the HandlerBox ("hdlr") in the MediaBox of the PCC metadata track.
  • the structure as defined in ISOBMFF could be as follows
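  • That structure, reproduced from ISOBMFF for reference (only the handler_type value is specific to this embodiment), is:

    aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0) {
        unsigned int(32) pre_defined = 0;
        unsigned int(32) handler_type;          // set to 'pcch' for a PCC metadata track
        const unsigned int(32)[3] reserved = 0;
        string name;                            // human-readable handler name
    }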
  • handler_type defines the media handler type for the PCC metadata track.
  • This field may contain the 4-character code“pcch”.
  • syntax elements may be assigned to have a pre-defined value that the file author is required to obey. For example, in the syntax structure above, a file writer is required to set the value of the syntax element pre_defined to 0. Similarly, variables or input parameters for functions, such as version in the syntax above, may have predefined values.
  • the value of the version parameter is required to be equal to 0 in the syntax above.
  • assignments to predefined values may not be necessary for other embodiments, and certain embodiments can be similarly realized with other predefined values or without the syntax elements, variables, or input parameters assigned to predefined values.
  • PCC global bitstream information such as but not limited to global scaling, translation and rotation, could be stored in a separate box/full box.
  • a box may be called a PCCInfoBox with a 4-character code“pcci”.
  • global scaling and global translation are defined as independent values matching each global axis x, y, and z.
  • global rotation is represented as a quaternion (w,x,y,z).
  • the benefit of using quaternions instead of Euler angles is that quaternions are more univocal as they do not depend on the order of Euler rotations. Also, quaternions are not vulnerable to "gimbal lock".
  • a transformation matrix can be generated to arbitrarily transform point cloud data in space.
  • the benefit of providing these values separately instead of a holistic transformation matrix is developer convenience and the possibility to ignore certain components of transformation.
  • the number of layers indicates the total number of layers.
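  • A possible sketch of such a box, in which the field names and bit widths are assumptions derived from the description above, is:

    aligned(8) class PCCInfoBox extends FullBox('pcci', 0, 0) {
        signed int(32)  global_scale_x;        // per-axis scaling
        signed int(32)  global_scale_y;
        signed int(32)  global_scale_z;
        signed int(32)  global_translation_x;  // per-axis translation
        signed int(32)  global_translation_y;
        signed int(32)  global_translation_z;
        signed int(32)  global_rotation_w;     // rotation quaternion (w, x, y, z)
        signed int(32)  global_rotation_x;
        signed int(32)  global_rotation_y;
        signed int(32)  global_rotation_z;
        unsigned int(8) num_layers;            // total number of layers
    }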
  • a PCCInfoBox may be present in the HandlerBox of the PCC metadata track or the MediaHeaderBox of the PCC metadata track. These boxes may be extended as a new box, or extended with a new full box version, or extended with a flag definition.
  • example implementations of this new data structure could be as follows:
  • class HandlerBox extends FullBox('hdlr', version, 0) {
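        // Continuation sketched from the standard HandlerBox fields; the
        // PCCInfoBox placement here is one of the alternatives described below.
        unsigned int(32) pre_defined = 0;
        unsigned int(32) handler_type;   // 'pcch' for a PCC metadata track
        const unsigned int(32)[3] reserved = 0;
        string name;
        PCCInfoBox();                    // optionally present, see conditions below
    }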
  • the presence of PCCInfoBox is conditioned by the handler type being "pcch”.
  • the presence of PCCInfoBox is conditioned by the version field of the box header being equal to 1.
  • the presence of PCCInfoBox is conditioned by the flags field of the box header having a particular bit equal to 1.
  • "&" may be defined as a bit-wise "and” operation.
  • the bit-wise "and" operation operates on a two's complement representation of the integer value.
  • the shorter argument is extended by adding more significant bits equal to 0.
  • the PCCInfoBox contents may be stored in the PCC metadata sample entry.
  • a new sample entry type of“pcse” may be defined. An example could be as follows:
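  • A minimal sketch, assuming the sample entry simply carries the PCC global information, is:

    aligned(8) class PCCMetadataSampleEntry() extends MetaDataSampleEntry('pcse') {
        PCCInfoBox();
    }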
  • the PCC bitstream global information may be stored in the file/movie/track level MetaBox, utilizing the ItemInfoBox.
  • the item which contains the PCC global bitstream information may utilize its own MIME type, such as but not limited to 'pcci', which is stored in the item_type field of the ItemInfoEntry() box.
  • a new entity grouping may be defined in EntityToGroupBox() contained in a MetaBox.
  • a new Entity grouping may be defined such as‘pcgr’.
  • PCC metadata tracks, video and texture media tracks and related metadata items may be grouped together inside the defined‘pcgr’ grouping.
  • a new Entity Grouping may be defined with the following syntax: aligned(8) class EntityToGroupBox(grouping_type, version, flags)
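            extends FullBox(grouping_type, version, flags) {
        unsigned int(32) group_id;
        unsigned int(32) num_entities_in_group;
        for (i = 0; i < num_entities_in_group; i++)
            unsigned int(32) entity_id;
    }   // body completed from the ISOBMFF EntityToGroupBox definition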
  • grouping_type identifies the new entity grouping type. A new grouping_type four-character code such as but not limited to 'pcgr' may be defined for the entity grouping.
  • group_id represents an identifier for the group in the form of a numerical value.
  • num_entities_in_group represents the number of entities in the group, which is set to 4 in this example (texture track, geometry video track, PCC metadata track, and PCC bitstream global information stored as a metadata item).
  • entity_ids is a list of the entity_ids of the grouped entities, which in this example is [1, 2, 3, 101].
  • the order of these entities may be predefined, such as {texture, geometry, PCC metadata, PCC global information}.
  • the order may be any defined order.
  • the handler types of the tracks and item types may be used to identify the types of the grouped entities.
  • an EntityToGroupBox may be extended to store the PCCInfoBox.
  • the syntax may be as follows:
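  • A possible sketch (the derived box name is an assumption) is:

    aligned(8) class PCCGroupBox extends EntityToGroupBox('pcgr', 0, 0) {
        PCCInfoBox();   // PCC global bitstream information for the grouped entities
    }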
  • a track grouping may be defined. Such a track grouping information may be stored in each related media and timed metadata track or a subset of such tracks. Track Grouping may be defined using track grouping mechanism as follows:
  • class TrackGroupBox extends Box('trgr') {
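    }   // container box only; completed from the ISOBMFF definition

    aligned(8) class TrackGroupTypeBox(unsigned int(32) track_group_type)
            extends FullBox(track_group_type, 0, 0) {
        unsigned int(32) track_group_id;
        // any remaining data is specified per track_group_type
    }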
  • a new track group type such as but not limited to‘pctg’ may be defined for the PCC track grouping.
  • PCC global information may be embedded inside the TrackGroupTypeBox as follows:
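  • A possible sketch of such an embedding (the box name is an assumption) is:

    aligned(8) class PCCTrackGroupBox extends TrackGroupTypeBox('pctg') {
        PCCInfoBox();   // shared by all tracks carrying this track group
    }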
  • a track referencing mechanism may be utilized to indicate the PCC track relationships.
  • the apparatus includes means, such as the processing circuitry 12, for associating the auxiliary timed metadata track with the track group defined before by using track referencing.
  • TrackReferenceBox with the following ISOBMFF-defined syntax may be utilized as follows: aligned(8) class TrackReferenceBox extends Box('tref') {
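    }   // container box only; completed from the ISOBMFF definition

    aligned(8) class TrackReferenceTypeBox(unsigned int(32) reference_type)
            extends Box(reference_type) {
        unsigned int(32) track_IDs[];   // e.g., reference_type 'pcct', 'pccg' or 'pccm'
    }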
  • a PCC texture track may reference the PCC metadata track with a reference type of ‘pcct’.
  • a PCC geometry track may reference the PCC metadata track or a PCC texture track with a reference_type of 'pccg'.
  • a PCC metadata track may reference the PCC texture and geometry tracks with a reference type of‘pccm’. In such a case, both the texture and the geometry track ids could be listed in the TrackReferenceBox with type‘pccm’ of the PCC metadata track.
  • a texture or geometry track may reference multiple metadata tracks if such tracks are represented by the same PCC metadata track but differ in certain characteristics such as bitrate, resolution, video quality, etc.
  • Track alternatives may also be derived using the ISOBMFF-provided alternative track mechanisms. For example, the same alternate_group value of TrackHeaderBox could be used for tracks that represent the same data with different characteristics.
  • a PCC metadata track may have two different track references pointing to the texture and geometry video tracks, such as a TrackReferenceTypeBox with 'pcct' reference type to point to the texture video track and a TrackReferenceTypeBox with 'pccg' reference type to point to the geometry video track.
  • texture and geometry video frames may be packed into a single video frame.
  • the packed video frames which are stored in a video track may have a TrackReferenceTypeBox with 'pccp' reference type to point to the PCC metadata track or vice-versa (e.g., a PCC metadata track containing a TrackReferenceTypeBox with 'pccp' reference type and pointing to the PCC packed video track).
  • the texture and geometry tracks may be grouped using track grouping type of (but not limited to)“pctg” with a particular track group id.
  • the auxiliary metadata of PCC may be stored as a timed metadata track with a dedicated handler type such as (but not limited to) "pccm" and reference the track_group_id by utilizing the "cdtg" track reference type. This may indicate that the PCC auxiliary timed metadata describes the texture and geometry tracks collectively.
  • a PCC metadata track may contain PCC auxiliary information samples arranged in the following ways.
  • the first sample of each PCC Group of Frames (GOF) may contain the GOF header information and the auxiliary metadata bitstream corresponding to the texture and geometry samples of the same composition time.
  • such samples may be marked as“sync” samples, e.g., by utilizing the sync sample box (a.k.a. SyncSampleBox). In such a case, random access could be possible at each GOF boundary.
  • All other PCC metadata samples may contain the corresponding PCC auxiliary metadata of a texture or geometry sample in their corresponding texture and geometry tracks.
  • each PCC metadata sample may have a corresponding sample auxiliary information in the sample table box which may store the GOF header bitstream.
  • This auxiliary sample information may be defined only for the sync samples of PCC metadata track or every sample of the PCC metadata track.
  • GOF Header information may be stored in the PCC metadata track sample entries. Each GOF may then refer to the sample entry index which contains the related GOF Header bitstream information.
  • a PCCInfoBox() may optionally be present in such a sample entry.
  • GOFHeaderBox may have the following syntax:
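  • A sketch of such a box, built from the field semantics listed below (the four-character code and the bit widths are assumptions), is:

    aligned(8) class GOFHeaderBox extends FullBox('gofh', 0, 0) {
        unsigned int(8)  groupOfFramesSize;
        unsigned int(16) frameWidth;
        unsigned int(16) frameHeight;
        unsigned int(8)  occupancyResolution;
        unsigned int(8)  radiusToSmoothing;
        unsigned int(8)  neighborCountSmoothing;
        unsigned int(8)  radius2BoundaryDetection;
        unsigned int(8)  thresholdSmoothing;
    }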
  • groupOfFramesSize indicates the number of frames in the current group of frames.
  • frameWidth indicates the geometry or texture picture width.
  • frameHeight indicates the geometry or texture picture height.
  • occupancyResolution indicates the occupancy map resolution.
  • radiusToSmoothing indicates the radius to detect neighbours for smoothing.
  • neighborCountSmoothing indicates the maximum number of neighbours used for smoothing.
  • radius2BoundaryDetection indicates the radius for boundary point detection.
  • thresholdSmoothing indicates the smoothing threshold.
  • a separate timed metadata track may be defined to store the GOF header samples.
  • a metadata track may have a separate handler type (e.g.,‘pcgh’) and a track reference to the PCC metadata track (e.g., of reference type ‘pcgr’). If such a track exists and track grouping is used to group texture, geometry and metadata tracks, then such a metadata track may also be listed in the track grouping.
  • Metadata samples that belong to a GOF may be grouped together in order to form sample groups and their related GOF header information may be stored in the SampleGroupDescriptionEntry in the SampleGroupDescriptionBox.
  • a PCCMetadata sample may contain GOF Header information, GOF Auxiliary information as well as GOF occupancy map information.
  • the GOF Header information, GOF Auxiliary information, and GOF occupancy map information are defined as their corresponding parts in the PCC Category 2 coding Working Draft.
  • the GOF Header information was provided above.
  • Example GOF auxiliary information includes the following:
  • Referring now to Figure 6, the operations performed, such as by the apparatus 10 of Figure 1, in order to cause storage of the point cloud compression coded bitstream in accordance with an example embodiment are depicted.
  • In this embodiment, to accommodate the situation where only a single video decoder instance is available at a given time instance, only a single video bitstream is fed to the video decoder.
  • a spatial frame packing approach may be utilized to put texture and geometry video frames into a single video frame.
  • a frame packing arrangement SEI message or another SEI message with similar content as the frame packing arrangement SEI message may be utilized to signal such packing in a video bitstream level.
  • a restricted video scheme may be utilized to signal such packing at the file format level.
  • An alternative may be to perform a region-wise packing and create a“packed PCC frame”. Such an approach may enable different geometric transformations such as rotation, scaling, etc. to be applied on the texture and geometry frames.
  • a new type of "region-wise packing" restricted scheme type may be defined and signaled in the SchemeInformationBox in ISOBMFF.
  • Another alternative may be utilizing a temporal interleaving mechanism for texture and geometry frames and making sure that the prediction dependencies of the frames are only from the same type of frames (e.g., either texture or geometry).
  • in some embodiments, such a prediction restriction may be relaxed.
  • Such an encoding configuration may be signaled in the video bitstream and/or at the file format level.
  • composition times for such interleaved frames would be the composition time of the second frame of the {texture, geometry} pair.
  • an indicator field may be defined to signal the type of the first video frame.
  • a group of video frames of the same type may be consecutive to each other.
  • the sample composition times may be signaled on the consecutive frames of the same type and then applied on the same number of frames adjacent to the group of frames in decoding order (e.g., sample timings of N texture frames are applied to the next N geometry frames, which are then followed by N texture frames and so on).
  • a new temporal packing type indicator may be defined to signal such an interleaving scheme.
  • the apparatus includes means, such as the processing circuitry 12, for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream.
  • the apparatus includes means, such as the processing circuitry 12, for packing the texture information bitstream and the geometry information bitstream into one video media track.
  • the apparatus includes means, such as the processing circuitry 12, for storing the auxiliary metadata information in a timed metadata track.
  • the apparatus includes means, such as the processing circuitry 12, the memory 14 or the like, for associating the video media track and the timed metadata track as one group and for storing the video media track and the timed metadata track as one group.
  • texture and geometry video frames in the bitstream may be spatially packed to a single video frame.
  • a new restricted video scheme may be defined for PCC encoded packed texture and geometry video frames. Such a scheme may be identified by a scheme_type 4-character code of 'ppvs' inside the RestrictedSchemeInfoBox.
  • a new uniform resource identifier (URI) may be defined as the scheme_uri and may be conditionally present depending on box flags.
  • the packed PCC video frame information may then be contained in the SchemeInformationBox of the related video track.
  • Such a box may be called a PCCPackedVideoBox with a 4-char code of‘ppvi’.
  • An example of such a box may be as follows:
  • class SchemeTypeBox extends FullBox('schm', 0, flags) {
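        unsigned int(32) scheme_type;       // four-character code, e.g., 'ppvs'
        unsigned int(32) scheme_version;    // version of the scheme
        if (flags & 0x000001)
            unsigned int(8) scheme_uri[];   // null-terminated URI, present per box flags
    }   // body completed from the ISOBMFF SchemeTypeBox definition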
  • the following box may reside inside the Scheme Information Box (‘schi’) and contain PCC packed video specific information:
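  • A minimal sketch of such a box, assuming it simply wraps the packing structure described below, is:

    aligned(8) class PCCPackedVideoBox extends FullBox('ppvi', 0, 0) {
        PCCSpatialPackingBox();   // spatial layout of texture and geometry regions
    }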
  • PCCSpatialPackingBox() may define how texture and geometry patches could be spatially packed in a video frame, an example is provided below:
  • class PCCSpatialPackingBox extends FullBox('rwpk', 0, 0) {
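        // Completion sketched under the assumption that the box wraps the
        // packing structure defined next.
        PCCSpatialPackingStruct();
    }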
  • the PCCSpatialPackingStruct() may contain the data structure needed to identify each packed region's type, dimensions and spatial location. Such a box may have the structure sketched below.
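  • A reconstructed sketch of such a structure, assembled from the field semantics listed below (the layout, loop structure and bit widths are assumptions), is:

    aligned(8) class PCCSpatialPackingStruct() {
        unsigned int(8)  num_regions;
        unsigned int(32) proj_texture_picture_width;
        unsigned int(32) proj_texture_picture_height;
        unsigned int(32) proj_geometry_picture_width;
        unsigned int(32) proj_geometry_picture_height;
        unsigned int(16) packed_picture_width;
        unsigned int(16) packed_picture_height;
        for (i = 0; i < num_regions; i++) {
            bit(2) reserved = 0;
            unsigned int(1) region_type[i];     // texture or geometry region (assumption)
            unsigned int(1) guard_band_flag[i];
            unsigned int(4) packing_type[i];    // 0: rectangular region-wise packing
            if (packing_type[i] == 0)
                RectRegionPacking(i);
        }
    }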
  • proj_texture_picture_width defines the width of the input video texture frame.
  • proj_texture_picture_height defines the height of the input video texture frame.
  • proj_geometry_picture_width defines the width of the input geometry frame.
  • proj_geometry_picture_height defines the height of the input geometry frame.
  • packed_picture_width defines the width of the output packed video frame.
  • packed_picture_height defines the height of the output packed video frame.
  • guard_band_flag indicates whether the region has guard bands or not.
  • RectRegionPacking specifies the region-wise packing between a packed region and the projected region.
  • a new projection type such as "orthonormal projection" may be added to the ProjectionFormatBox as defined in the MPEG OMAF specification, which resides inside the Projected Omnidirectional Video ('podv') scheme's SchemeInformationBox.
  • a PCCPackedVideoBox may reside inside the ProjectedOmniVideoBox and its presence may be signaled by the projection type, which will indicate an orthonormal projection.
  • texture and geometry video frames are temporally interleaved to form a single video bitstream.
  • a video prediction scheme may be defined so that texture frames are only dependent on the other texture frames and geometry frames are also only dependent on the other geometry frames.
  • the frame rate may be doubled.
  • composition time of the latter frame may be utilized to indicate the composition time of the temporally interleaved frames.
  • a new box called PCCVideoBox may be defined to indicate the temporal packing information.
  • such a box may reside inside the SchemeInformationBox when the schemeType is defined as 'pcci' in order to indicate temporally interleaved PCC texture and geometry frames.
  • packing_indication_type indicates the type of temporal or spatial packing.
  • Example packing types and values associated are provided below:
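  • One possible assignment of values (illustrative only; the actual values are not specified here) could be:

    value 0: no packing (a single component per track)
    value 1: temporal interleaving, texture frame first
    value 2: temporal interleaving, geometry frame first
    value 3: spatial (region-wise) packing of texture and geometry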
  • the track referencing mechanism between the auxiliary metadata track and the texture and geometry track described in conjunction with Figure 5 may apply.
  • a single track reference type which may be called‘pctg’ may be used from/to a PCC auxiliary metadata track to/from a packed/interleaved PCC texture and geometry video track.
  • Referring now to Figure 7, the operations performed, such as by the apparatus 10 of Figure 1, in order to cause storage of the point cloud compression coded bitstream in accordance with an example embodiment are depicted.
  • This embodiment is applied to still scenes by only encoding a single frame (e.g. a single time instance) of texture, geometry and auxiliary metadata.
  • single video frames can be encoded as image items as defined in the HEIF standard and be stored in the MetaBox of the file.
  • the apparatus includes means, such as the processing circuitry 12, for accessing the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream.
  • the apparatus includes means, such as the processing circuitry 12, for extracting one or more texture information frames, one or more geometry information frames, and one or more auxiliary metadata frames from the texture information bitstream, the geometry information bitstream, and the auxiliary metadata bitstream.
  • the apparatus includes means, such as the processing circuitry 12, the memory 14 or the like, for storing the one or more texture information frames, the one or more geometry information frames, and the one or more auxiliary metadata frames as one or more image items.
  • A descriptive item property called PCCPackedFrameProperty may be defined.
  • This property may contain PCCSpatialPackingBox() as defined above. Such a property is then linked to the image item which contains the packed texture and geometry image item.
  • PCCAuxiliaryProperty(‘pcap’) may be defined to include the PCC global parameters and GOF header information.
  • One example syntax is provided below:
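  • A possible sketch, assuming the property simply aggregates the two structures defined earlier, is:

    aligned(8) class PCCAuxiliaryProperty extends ItemFullProperty('pcap', 0, 0) {
        PCCInfoBox();     // PCC global parameters
        GOFHeaderBox();   // group-of-frames header information
    }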
  • A PCC metadata item may be defined to contain the PCC auxiliary metadata information, which is referred to as a PCC auxiliary metadata sample in the sections above. This item may be defined as type 'pcca'.
  • PCCTextureItemProperty(‘pcti’) may be defined to indicate the PCC encoded texture image item. This property may then be linked to the relevant image item.
  • PCCGeometryItemProperty(‘pcgi’) may be defined to indicate the PCC encoded geometry image item. This property may then be linked to the relevant image item.
  • PCCAuxiliaryProperty('pcap') may be used to store the global header and GOF header information.
  • the above-mentioned PCC metadata item and the two image items with respective texture and geometry item properties may be grouped together using an entity grouping mechanism to indicate that they form a PCC encoded 3D representation.
  • the entity grouping may have a type of 'pcce' in the EntityToGroupBox(), where all the relevant item_ids of the texture and geometry images and the PCC auxiliary metadata item may be listed to indicate a group.
  • An example of such a data structure is as follows:
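  • A possible sketch (the derived box name is an assumption) is:

    aligned(8) class PCCEntityToGroupBox extends EntityToGroupBox('pcce', 0, 0) {
        // group_id, num_entities_in_group and the entity_id list are inherited from
        // EntityToGroupBox; the entity_ids list the texture image item, the geometry
        // image item and the PCC auxiliary metadata item of type 'pcca'.
    }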
  • PCC metadata may be stored in and signalled as SEI messages and carried as part of the video bitstream.
  • SEI messages may contain the PCC auxiliary sample fields mapped to different or single SEI message and transmitted in conjunction with the texture and geometry elementary bitstreams.
  • PCC auxiliary metadata SEI messages may reside in one of the texture or geometry elementary streams and need not be repeated in both.
  • new NAL units may be defined to contain the PCC auxiliary metadata information.
  • Such NAL units may then also be stored in the ISOBMFF compliant format as part of the video samples. Again, in such a case, such NAL units may be stored as part of either texture or geometry video tracks or as part of the packed or interleaved texture and geometry track.
  • each PCC sample comprises one or more texture, geometry and auxiliary information NAL units.
  • a single media track or image item may be present in the PCC media file.
  • a decoder may comprise means, such as the processing circuitry 12, for receiving a point cloud compression coded bitstream, wherein the point cloud compression coded bitstream comprises a texture information bitstream, a geometry information bitstream, and an auxiliary metadata bitstream.
  • the decoder may further comprise means, such as the processing circuitry 12, for decoding the point cloud compression coded bitstream.
  • certain embodiments likewise cover an encoder that outputs a bitstream portion according to the syntax and semantics.
  • certain embodiments correspondingly cover a decoder that decodes a bitstream portion according to the syntax and semantics.
  • An example embodiment of the invention described above describes the codec in terms of separate encoder and decoder apparatus in order to assist the understanding of the processes involved.
  • the apparatus, structures and operations may be implemented as a single encoder-decoder apparatus/structure/operation.
  • the coder and decoder may share some or all common elements.
  • Figures 4-8 are flowcharts of an apparatus 10, method, and computer program product according to certain example embodiments. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device 14 of an apparatus employing an embodiment of the present invention and executed by processing circuitry 12 of the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
  • These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
  • a computer program product is therefore defined in those instances in which the computer program instructions, such as computer-readable program code portions, are stored by at least one non-transitory computer-readable storage medium with the computer program instructions, such as the computer-readable program code portions, being configured, upon execution, to perform the functions described above, such as in conjunction with the flowcharts of Figures 4-8.
  • the computer program instructions, such as the computer-readable program code portions need not be stored or otherwise embodied by a non-transitory computer-readable storage medium, but may, instead, be embodied by a transitory medium with the computer program instructions, such as the computer-readable program code portions, still being configured, upon execution, to perform the functions described above.
  • blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
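As a concrete illustration of the bitstream structure referenced in the first bullet of the list above, the following is a minimal, self-contained sketch of a point cloud compression coded bitstream carrying separate texture, geometry, and auxiliary metadata sub-bitstreams. It is not the patent's implementation: all names (PCCBitstream, decode_video, parse_auxiliary_metadata, decode_point_cloud) are hypothetical, and the per-component decoders are trivial stand-ins for real 2D video decoders (e.g., AVC/HEVC) and a patch metadata parser.

```python
# Hypothetical sketch only: illustrates a PCC coded bitstream composed of
# separate texture, geometry, and auxiliary metadata sub-bitstreams.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PCCBitstream:
    texture: bytes             # compressed texture (colour) video bitstream
    geometry: bytes            # compressed geometry (depth) video bitstream
    auxiliary_metadata: bytes  # patch / occupancy auxiliary information


def decode_video(sub_bitstream: bytes) -> List[bytes]:
    """Stand-in for a real 2D video decoder; each 4-byte chunk here
    represents one decoded frame."""
    return [sub_bitstream[i:i + 4] for i in range(0, len(sub_bitstream), 4)]


def parse_auxiliary_metadata(sub_bitstream: bytes) -> List[int]:
    """Stand-in parser for per-patch auxiliary information."""
    return list(sub_bitstream)


def decode_point_cloud(
    bitstream: PCCBitstream,
) -> Tuple[List[bytes], List[bytes], List[int]]:
    """Decode each component independently; a real decoder would then
    re-project the geometry and texture frames into 3D points using
    the patch metadata."""
    geometry_frames = decode_video(bitstream.geometry)
    texture_frames = decode_video(bitstream.texture)
    patches = parse_auxiliary_metadata(bitstream.auxiliary_metadata)
    return geometry_frames, texture_frames, patches


if __name__ == "__main__":
    pcc = PCCBitstream(texture=b"\x01" * 8, geometry=b"\x02" * 8,
                       auxiliary_metadata=b"\x03\x04")
    print(decode_point_cloud(pcc))
```

The point of the sketch is only the separation of concerns described above: each component bitstream is carried and decoded independently, and the auxiliary metadata steers the re-projection of the decoded 2D frames back into a 3D point cloud.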

EP19834318.8A 2018-07-11 2019-07-10 Method and apparatus for storage and signaling of compressed point clouds Pending EP3821608A4 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862696614P 2018-07-11 2018-07-11
PCT/FI2019/050539 WO2020012073A1 (en) 2018-07-11 2019-07-10 Method and apparatus for storage and signaling of compressed point clouds

Publications (2)

Publication Number Publication Date
EP3821608A1 (de) 2021-05-19
EP3821608A4 EP3821608A4 (de) 2022-04-13

Family

ID=69143215

Family Applications (1)

Application Number Priority Date Filing Date Title
EP19834318.8A Pending EP3821608A4 (de) 2018-07-11 2019-07-10 Verfahren und vorrichtung zur speicherung und signalisierung von komprimierten punktwolken

Country Status (2)

Country Link
EP (1) EP3821608A4 (de)
WO (1) WO2020012073A1 (de)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10897269B2 (en) 2017-09-14 2021-01-19 Apple Inc. Hierarchical point cloud compression
US11818401B2 (en) 2017-09-14 2023-11-14 Apple Inc. Point cloud geometry compression using octrees and binary arithmetic encoding with adaptive look-up tables
US10861196B2 (en) 2017-09-14 2020-12-08 Apple Inc. Point cloud compression
US10909725B2 (en) 2017-09-18 2021-02-02 Apple Inc. Point cloud compression
US11113845B2 (en) 2017-09-18 2021-09-07 Apple Inc. Point cloud compression using non-cubic projections and masks
US10699444B2 (en) 2017-11-22 2020-06-30 Apple Inc. Point cloud occupancy map compression
US10607373B2 (en) 2017-11-22 2020-03-31 Apple Inc. Point cloud compression with closed-loop color conversion
US11010928B2 (en) 2018-04-10 2021-05-18 Apple Inc. Adaptive distance based point cloud compression
US10939129B2 (en) 2018-04-10 2021-03-02 Apple Inc. Point cloud compression
US10909727B2 (en) 2018-04-10 2021-02-02 Apple Inc. Hierarchical point cloud compression with smoothing
US10909726B2 (en) 2018-04-10 2021-02-02 Apple Inc. Point cloud compression
US11017566B1 (en) 2018-07-02 2021-05-25 Apple Inc. Point cloud compression with adaptive filtering
US11202098B2 (en) 2018-07-05 2021-12-14 Apple Inc. Point cloud compression with multi-resolution video encoding
US11012713B2 (en) 2018-07-12 2021-05-18 Apple Inc. Bit stream structure for compressed point cloud data
US11386524B2 (en) 2018-09-28 2022-07-12 Apple Inc. Point cloud compression image padding
US11367224B2 (en) 2018-10-02 2022-06-21 Apple Inc. Occupancy map block-to-patch information compression
US11430155B2 (en) 2018-10-05 2022-08-30 Apple Inc. Quantized depths for projection point cloud compression
US11348284B2 (en) 2019-01-08 2022-05-31 Apple Inc. Auxiliary information signaling and reference management for projection-based point cloud compression
US11200701B2 (en) 2019-03-19 2021-12-14 Nokia Technologies Oy Method and apparatus for storage and signaling of static point cloud data
US11057564B2 (en) 2019-03-28 2021-07-06 Apple Inc. Multiple layer flexure for supporting a moving image sensor
US11562507B2 (en) 2019-09-27 2023-01-24 Apple Inc. Point cloud compression using video encoding with time consistent patches
US11627314B2 (en) 2019-09-27 2023-04-11 Apple Inc. Video-based point cloud compression with non-normative smoothing
US11538196B2 (en) 2019-10-02 2022-12-27 Apple Inc. Predictive coding for point cloud compression
US11895307B2 (en) 2019-10-04 2024-02-06 Apple Inc. Block-based predictive coding for point cloud compression
US11798196B2 (en) 2020-01-08 2023-10-24 Apple Inc. Video-based point cloud compression with predicted patches
US11475605B2 (en) 2020-01-09 2022-10-18 Apple Inc. Geometry encoding of duplicate points
CN115398890B (zh) * 2020-04-12 2023-09-15 LG Electronics Inc. Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
US11615557B2 (en) 2020-06-24 2023-03-28 Apple Inc. Point cloud compression using octrees with slicing
US11620768B2 (en) 2020-06-24 2023-04-04 Apple Inc. Point cloud geometry compression using octrees with multiple scan orders
US11758195B2 (en) * 2020-09-17 2023-09-12 Lemon Inc. Dependency information signaling in coded video
US11948338B1 (en) 2021-03-29 2024-04-02 Apple Inc. 3D volumetric content encoding using 2D videos and simplified 3D meshes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150264404A1 (en) * 2014-03-17 2015-09-17 Nokia Technologies Oy Method and apparatus for video coding and decoding

Also Published As

Publication number Publication date
EP3821608A4 (de) 2022-04-13
WO2020012073A1 (en) 2020-01-16

Similar Documents

Publication Publication Date Title
WO2020012073A1 (en) Method and apparatus for storage and signaling of compressed point clouds
US11581022B2 (en) Method and apparatus for storage and signaling of compressed point clouds
US11595670B2 (en) Method and apparatus for storage and signaling of sub-sample entry descriptions
WO2020070379A1 (en) Method and apparatus for storage and signaling of compressed point clouds
US11200701B2 (en) Method and apparatus for storage and signaling of static point cloud data
KR102450781B1 (ko) Method and apparatus for encoding media data comprising generated content
KR102406887B1 (ko) Method, device, and computer program for generating timed media data
CN109257624B (zh) Method and apparatus for generating and processing media files, and storage medium
JP7472220B2 (ja) Method, program, and device
CN110800311B (zh) Method, apparatus and computer program for transmitting media content
KR102559862B1 (ko) Method, device, and computer program for media content transmission
US11638066B2 (en) Method, device and computer program for encapsulating media data into a media file
KR102655630B1 (ko) Method and apparatus for generating a media file comprising three-dimensional video content, and method and apparatus for playing back three-dimensional video content
WO2021176133A1 (en) An apparatus, a method and a computer program for volumetric video
EP3873095A1 (de) Vorrichtung, verfahren und computerprogramm für omnidirektionales video
WO2023144439A1 (en) A method, an apparatus and a computer program product for video coding
WO2023175243A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
CN117581551A (zh) Method, apparatus and computer program for dynamically encapsulating media content data

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20210211

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20220314

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 21/81 20110101ALI20220307BHEP

Ipc: G06F 3/01 20060101ALI20220307BHEP

Ipc: G06T 9/00 20060101ALI20220307BHEP

Ipc: H04N 21/854 20110101ALI20220307BHEP

Ipc: H04N 21/234 20110101ALI20220307BHEP

Ipc: H04N 19/70 20140101ALI20220307BHEP

Ipc: H04N 19/597 20140101AFI20220307BHEP