WO2023001623A1 - V3C patch connectivity signaling for mesh compression - Google Patents

V3C patch connectivity signaling for mesh compression

Info

Publication number
WO2023001623A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
patch
mesh
vertices
information
Application number
PCT/EP2022/069371
Other languages
French (fr)
Inventor
Sebastian Schwarz
Lukasz Kondrad
Lauri Aleksi ILOLA
Christoph BACHHUBER
Original Assignee
Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2023001623A1 (en)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • NAL units in an atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units, the former dedicated to carrying patch data and the latter to carrying data necessary to properly parse the ACL units or any additional auxiliary data.
  • nal_unit_header( ) syntax: nal_unit_type specifies the type of the RBSP data structure contained in the NAL unit, as specified in Table 4 of ISO/IEC 23090-5.
  • nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies.
  • nal_layer_id shall be in the range of 0 to 62, inclusive.
  • the value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
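  • As an illustration, the two-byte nal_unit_header( ) can be parsed as in the following minimal Python sketch; the field widths follow ISO/IEC 23090-5, while the function name and return structure are purely illustrative.

      def parse_nal_unit_header(data: bytes) -> dict:
          """Parse the two-byte V3C nal_unit_header() of ISO/IEC 23090-5.

          Bit layout: nal_forbidden_zero_bit (1), nal_unit_type (6),
          nal_layer_id (6), nal_temporal_id_plus1 (3).
          """
          hdr = int.from_bytes(data[:2], "big")
          assert (hdr >> 15) & 0x1 == 0           # nal_forbidden_zero_bit
          return {
              "nal_unit_type": (hdr >> 9) & 0x3F,
              "nal_layer_id": (hdr >> 3) & 0x3F,  # nonzero values are ignored
              "nal_temporal_id_plus1": hdr & 0x7, # by profile-conforming decoders
          }

      # Example: nal_unit_type 32, nal_layer_id 0, nal_temporal_id_plus1 1.
      assert parse_nal_unit_header(bytes([0x40, 0x01]))["nal_unit_type"] == 32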
  • rbsp_byte[ i ] is the i-th byte of an RBSP.
  • An RBSP is specified as an ordered sequence of bytes as follows.
  • the RBSP contains a string of data bits (SODB) as follows: if the SODB is empty (i.e., zero bits in length), the RBSP is also empty; otherwise, the RBSP contains the SODB as follows: i) the first byte of the RBSP contains the first (most significant, left-most) eight bits of the SODB; the next byte of the RBSP contains the next eight bits of the SODB, and so on, until fewer than eight bits of the SODB remain; ii) the rbsp_trailing_bits( ) syntax structure is present after the SODB, as described by the three items below.
  • the first (most significant, left-most) bits of the final RBSP byte contain the remaining bits of the SODB (if any).
  • the next bit consists of a single bit equal to 1 (i.e., rbsp_stop_one_bit).
  • When the rbsp_stop_one_bit is not the last bit of a byte-aligned byte, one or more bits equal to 0 (i.e. instances of rbsp_alignment_zero_bit) are present to result in byte alignment.
  • One or more cabac_zero_word 16-bit syntax elements equal to 0x0000 may be present in some RBSPs after the rbsp_trailing_bits( ) at the end of the RBSP.
  • Syntax structures having these RBSP properties are denoted in the syntax tables using an "_rbsp" suffix. These structures are carried within NAL units as the content of the rbsp_byte[ i ] data bytes.
  • Examples include: i) atlas_sequence_parameter_set_rbsp( ), used to carry parameters related to the atlas on a sequence level; ii) atlas_frame_parameter_set_rbsp( ), used to carry parameters related to the atlas on a frame level, valid for one or more atlas frames; iii) sei_rbsp( ), used to carry SEI messages in NAL units; and iv) atlas_tile_group_layer_rbsp( ), used to carry patch layout information for tile groups.
  • the decoder can extract the SODB from the RBSP by concatenating the bits of the bytes of the RBSP, discarding the rbsp_stop_one_bit, which is the last (least significant, right-most) bit equal to 1, and discarding any subsequent (less significant, farther to the right) bits, which are equal to 0.
  • the data necessary for the decoding process is contained in the SODB part of the RBSP.
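  • As a minimal sketch of this encapsulation (ignoring cabac_zero_words and any emulation prevention; the function names are illustrative):

      def sodb_to_rbsp(bits: list[int]) -> bytes:
          """Append rbsp_stop_one_bit and alignment zero bits, pack into bytes."""
          bits = bits + [1]                 # rbsp_stop_one_bit
          while len(bits) % 8:
              bits.append(0)                # rbsp_alignment_zero_bit
          return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                       for i in range(0, len(bits), 8))

      def rbsp_to_sodb(rbsp: bytes) -> list[int]:
          """Recover the SODB: drop trailing zero bits, then the stop bit."""
          bits = [(b >> (7 - i)) & 1 for b in rbsp for i in range(8)]
          while bits and bits[-1] == 0:     # discard alignment zero bits
              bits.pop()
          bits.pop()                        # discard rbsp_stop_one_bit (== 1)
          return bits

      assert rbsp_to_sodb(sodb_to_rbsp([1, 0, 1, 1, 0])) == [1, 0, 1, 1, 0]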
  • atlas_tile_group_layer_rbsp( ) contains metadata information for a list of tile groups, which represent sections of a frame. Each tile group may contain several patches, for which the metadata syntax is described below in Table 1.
  • Annex F of ISO/IEC 23090-5 describes different SEI messages that have been defined for V3C purposes. SEI messages assist in processes related to decoding, reconstruction, display, or other purposes. Annex F of ISO/IEC 23090-5 defines two types of SEI messages: essential and non-essential. SEI messages are signaled in sei_rbsp( ), which is described below in Table 2.
  • Non-essential SEI messages are not required by the decoding process. Conforming decoders are not required to process this information for output order conformance. The specification for presence of non-essential SEI messages is also satisfied when those messages (or some subset of them) are conveyed to decoders (or to the HRD) by other means not specified in ISO/IEC 23090-5. When present in the bitstream, non-essential SEI messages shall obey the syntax and semantics as specified in Annex F of ISO/IEC 23090-5.
  • the representation of the content of the SEI message is not required to use the same syntax specified in Annex F of ISO/IEC 23090-5. For the purpose of counting bits, only the appropriate bits that are actually present in the bitstream are counted.
  • Essential SEI messages are an integral part of the V3C bitstream and should not be removed from the bitstream.
  • the essential SEI messages are categorized into two types: Type-A essential SEI messages and Type-B essential SEI messages.
  • Type-A essential SEI messages: these SEIs contain information required to check bitstream conformance and for output timing decoder conformance. Every V3C decoder conforming to point A should not discard any relevant Type-A essential SEI messages and shall consider them for bitstream conformance and for output timing decoder conformance.
  • Type-B essential SEI messages: V3C decoders that wish to conform to a particular reconstruction profile should not discard any relevant Type-B essential SEI messages and shall consider them for 3D point cloud reconstruction and conformance purposes.
  • a polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modelling.
  • the faces usually consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but may also be more generally composed of concave polygons, or even polygons with holes.
  • FIG. 4 illustrates elements of a mesh.
  • Polygon meshes are defined by the following elements:
  • Vertex (102): a position in 3D space, defined as (x, y, z), along with other information such as color (r, g, b), a normal vector, and texture coordinates.
  • Edge (104): a connection between two vertices.
  • Face (106): a closed set of edges, in which a triangle face has three edges and a quad face has four edges.
  • a polygon 108 is a coplanar set of faces 106. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically, a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.
  • Groups: some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.
  • Materials: defined to allow different portions of the mesh to use different shaders when rendered.
  • UV coordinates: most mesh formats also support some form of UV coordinates, which are a separate 2D representation of the mesh "unfolded" to show what portion of a 2-dimensional texture map to apply to different polygons of the mesh. It is also possible for meshes to contain other vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).
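  • The elements above map naturally onto a small data structure. The following sketch (illustrative, not from the specification) stores vertices and faces explicitly and derives the edge set from the faces:

      from dataclasses import dataclass, field

      @dataclass
      class Vertex:
          position: tuple[float, float, float]               # (x, y, z)
          color: tuple[float, float, float] | None = None    # optional (r, g, b)
          uv: tuple[float, float] | None = None              # optional texture coords

      @dataclass
      class Mesh:
          vertices: list[Vertex] = field(default_factory=list)
          faces: list[tuple[int, ...]] = field(default_factory=list)  # vertex indices

          def edges(self) -> set[frozenset[int]]:
              """Each face contributes one edge per pair of consecutive vertices."""
              e: set[frozenset[int]] = set()
              for face in self.faces:
                  for i in range(len(face)):
                      e.add(frozenset((face[i], face[(i + 1) % len(face)])))
              return e

      # A single triangle has three vertices, one face, and three edges.
      tri = Mesh(vertices=[Vertex((0, 0, 0)), Vertex((1, 0, 0)), Vertex((0, 1, 0))],
                 faces=[(0, 1, 2)])
      assert len(tri.edges()) == 3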
  • V-PCC mesh coding extension (MPEG M49588)
  • FIG. 5 and FIG. 6 show the extensions to the V-PCC encoder and decoder to support mesh encoding and mesh decoding, respectively.
  • the input mesh data 502 is demultiplexed 504 into vertex coordinate+attributes 506 and vertex connectivity 508.
  • the vertex coordinate+attributes data 506 is coded 510 using MPEG-I V-PCC
  • the vertex connectivity data 508 is coded (using vertex connectivity encoder 516) as auxiliary data 518. Both of these (encoded vertex coordinates and vertex attributes 517 and auxiliary data 518) are multiplexed 520 to create the final compressed output bitstream 522.
  • Vertex ordering 514 is carried out on the reconstructed vertex coordinates 512 at the output of MPEG-I V-PCC 510 to reorder the vertices for optimal vertex connectivity encoding 516.
  • the encoding process/apparatus 500 of FIG. 5 may be extended such that the encoding process/apparatus 500 signals V3C patch connectivity signaling 530 within the output bitstream 522.
  • V3C patch connectivity signaling 530 may be provided and signaled separately from the output bitstream 522.
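  • A compact sketch of the FIG. 5 flow follows; the two codec callables stand in for an MPEG-I V-PCC codec and a vertex connectivity codec (both assumptions of this sketch), and the lexicographic reordering criterion is one simple choice, since the ordering only needs to be reproducible from the reconstructed coordinates at the decoder:

      def encode_mesh_v3c(mesh, vpcc_encode, connectivity_encode) -> bytes:
          """Demultiplex the mesh, code the parts, and multiplex the payloads."""
          coords_attrs = [(v.position, v.color) for v in mesh.vertices]  # demux 504
          vpcc_payload, reconstructed = vpcc_encode(coords_attrs)        # coding 510
          # Vertex ordering 514: sort reconstructed coordinates so the decoder
          # can derive the identical order from its own reconstruction.
          order = sorted(range(len(reconstructed)), key=lambda i: reconstructed[i])
          remap = {old: new for new, old in enumerate(order)}
          faces = [tuple(remap[v] for v in f) for f in mesh.faces]
          aux_payload = connectivity_encode(faces)                       # encoder 516
          return vpcc_payload + aux_payload                              # mux 520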
  • the input bitstream 602 is demultiplexed 604 to generate the compressed bitstreams for vertex coordinates+attributes 605 and vertex connectivity 606.
  • the input/compressed bitstream 602 may comprise or may be the output from the encoder 500, namely the output bitstream 522 of FIG. 5.
  • the vertex coordinates+attributes 605 is decompressed using MPEG-I V-PCC decoder 608 to generate vertex attributes 612.
  • Vertex ordering 616 is carried out on the reconstructed vertex coordinates 614 at the output of MPEG-I V-PCC decoder 608 to match the vertex order at the encoder 500.
  • the vertex connectivity data 606 is also decompressed using vertex connectivity decoder 610 to generate vertex connectivity 618, and everything (including vertex attributes 612, the output of vertex reordering 616, and vertex connectivity 618) is multiplexed 620 to generate the reconstructed mesh 622.
  • the decoding process/apparatus 600 of FIG. 6 may be extended such that the decoding process/apparatus 600 receives and decodes V3C patch connectivity signaling 630, which may be part of the compressed bitstream 602.
  • V3C patch connectivity signaling 630 may be received and signaled separately from the compressed bitstream 602 or output bitstream 522.
  • the V3C patch connectivity signaling 630 of FIG. 6 may comprise or correspond to the V3C patch connectivity signaling 530 of FIG. 5
  • Alpha shapes generalize convex hulls. From an initial volume containing all vertices in the 3D space, sub volumes are iteratively removed. Intuitively, the size of the carved out subvolumes can be adjusted by a parameter.
  • Poisson surface reconstruction solves a regularized optimization problem to generate a smooth surface.
  • the surface's amount of detail can be adjusted with a parameter.
  • the input meshes may contain vertices with varying spatial density over different regions of the input model, which means that applying a surface reconstruction globally to the model with the same parameters might be sub-optimal.
  • the input sequences Mitch and Thomas demonstrate the differences in this vertex density well, with significantly higher vertex density around the facial region than on other parts of the model.
  • FIG. 8 illustrates gaps (left, 802) and patch edges (right, 804) on the mesh after initial reconstruction.
  • surface reconstruction tends to be slow, and several improvements to it can be made by introducing encoder-side pre-analysis and signaling of helpful information along the bitstream.
  • the main encoder embodiments include: receiving a volumetric frame consisting of 3D data; encoding the content using a V3C-based mesh compression approach, i.e. decomposing the 3D model into a collection of 2D patches; identifying 3D vertices on 2D patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
  • the main decoder embodiments include: receiving a V3C bitstream with connected boundary vertices information; extracting the connected boundary vertices information; generating a mesh representation from the decoded data; identifying 3D vertices on 2D patch boundaries per patch; connecting boundary vertices between neighboring patches, as signaled; optionally reconstructing texture for connected vertices; and rendering the reconstructed mesh.
  • the main signaling embodiments provide that parameters for connected boundary vertices information can be carried in the V3C bitstream using multiple mechanisms.
  • information on how to determine and order boundary vertices is signaled.
  • information about neighboring patches to a patch is signaled.
  • information about connected vertices is signaled.
  • information on mesh reconstruction of connected vertices is signaled.
  • the above information is signaled in one, or a combination, of: an atlas sequence parameter set extension, an atlas frame parameter set extension, an atlas patch data unit extension, or SEI messages.
  • An encoder receives one or more volumetric frames describing a 3D object or a scene and uses a V3C mesh encoder to compress the information in a mesh format, i.e. decomposing the 3D model into a collection of 2D patches. For each 2D patch, two lists are created: 1) a list of boundary vertices, e.g. 3D vertex points lying on the edge of the 2D patch, and 2) a list of all neighboring patches in 3D space.
  • FIG. 9 illustrates a patch 902 with 5 identified boundary vertices (white circles, A-E), and their x,y coordinates.
  • the list of boundary vertices can be sorted in various ways, e.g. starting with the vertex with the lowest value for the combined x/y-location (vertex A in the example above). Other starting vertices are also possible, e.g. the highest combined value: D; the lowest x, then lowest y value: A; the lowest y, then lowest x value: C.
  • the list of vertices is then sorted, starting with the selected starting vertex and following the patch boundary, either in clockwise or anti-clockwise fashion.
  • Other sorting is possible, e.g. first by x, then y axis, ascending: A-B-E-C-D; first by x, then y axis, descending: D-C-E-B-A; first by y, then x axis, ascending: C-B-A-D-E; first by y, then x axis, descending: E-D-A-B-C; etc.
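  • The orderings above can be reproduced with the following sketch; the coordinates are hypothetical (the actual values in FIG. 9 are not reproduced here) but chosen to be consistent with every ordering listed in the text:

      # Hypothetical (x, y) coordinates for boundary vertices A-E of FIG. 9.
      verts = {"A": (0, 3), "B": (2, 2), "C": (4, 0), "D": (5, 5), "E": (3, 6)}

      def sort_boundary(vertices, key, descending=False):
          """Order boundary vertex names by the selected sorting rule."""
          return "-".join(sorted(vertices, key=lambda n: key(vertices[n]),
                                 reverse=descending))

      # Candidate starting vertices:
      assert min(verts, key=lambda n: sum(verts[n])) == "A"   # lowest combined x+y
      assert max(verts, key=lambda n: sum(verts[n])) == "D"   # highest combined x+y
      assert min(verts, key=lambda n: verts[n][::-1]) == "C"  # lowest y, then x

      # The axis-based orderings:
      assert sort_boundary(verts, key=lambda p: p) == "A-B-E-C-D"
      assert sort_boundary(verts, key=lambda p: p, descending=True) == "D-C-E-B-A"
      assert sort_boundary(verts, key=lambda p: p[::-1]) == "C-B-A-D-E"
      assert sort_boundary(verts, key=lambda p: p[::-1], descending=True) == "E-D-A-B-C"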
  • the starting boundary vertex approach is signaled in the bitstream, as shown in the example below.
  • the sorting approach for the boundary vertices is signaled in the bitstream, as in the example shown below and in FIG. 10 (refer to item 1002).
  • pdu_vert_start indicates the starting boundary vertex. A value of 0 indicates to start sorting from the vertex with the lowest x, then y coordinate value; a value of 1 indicates to start sorting from the lowest combined x,y coordinate value.
  • pdu_vert_sort indicates the boundary vertex sorting approach; a value of 0 indicates to follow the patch boundary clockwise, a value of 1 indicates to follow the patch boundary anti-clockwise.
  • the starting boundary vertex approach and sorting approach for the boundary vertices can be signaled also on sequence (Atlas sequence parameter set) or frame level (Atlas frame parameter set).
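  • As a sketch of how such flags end up in the bitstream, the following illustrative bit writer packs the two elements as fixed-width u(n) fields; the one-bit widths and the placement in a patch data unit extension are assumptions for illustration, not normative syntax:

      class BitWriter:
          """Minimal MSB-first writer for fixed-width u(n) syntax elements."""
          def __init__(self):
              self.bits: list[int] = []

          def u(self, value: int, n: int) -> None:
              self.bits += [(value >> (n - 1 - i)) & 1 for i in range(n)]

      bw = BitWriter()
      bw.u(0, 1)  # pdu_vert_start: 0 -> start from the lowest x, then y
      bw.u(1, 1)  # pdu_vert_sort:  1 -> follow the patch boundary anti-clockwise
      assert bw.bits == [0, 1]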
  • the encoder assigns each generated 2D patch an index. Typically, the relation of this patch to other patches in 3D space is lost.
  • a V3C encoder stores for each patch the patch indices of neighboring patches, i.e. patches with shared boundary vertices, in a list.
  • This list of patch indices can be sorted, e.g. i) by patch index value, descending/ascending, or ii) starting from one neighboring patch and then following the current patch boundaries clockwise/anticlockwise.
  • the sorting approach for the neighboring patches is signaled in the bitstream, as shown in the example below and in FIG. 11 (refer to item 1102).
  • pdu_patch_sort indicates the neighboring patch sorting; a value of 0 indicates to follow the patch boundary clockwise, a value of 1 indicates to sort by patch indices in ascending order.
  • the approach for the neighboring patch sorting can be signaled also on sequence (Atlas sequence parameter set) or frame level (Atlas frame parameter set).
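  • A sketch of building and index-sorting the neighbor list follows (boundary vertices are compared by reconstructed 3D position; the boundary-following sort order is omitted for brevity, and all names are illustrative):

      def neighbor_patches(patch_boundaries: dict[int, set[tuple]],
                           ascending: bool = True) -> dict[int, list[int]]:
          """List, per patch, the patches sharing at least one boundary vertex."""
          neighbors = {}
          for idx, verts in patch_boundaries.items():
              shared = [other for other, ov in patch_boundaries.items()
                        if other != idx and verts & ov]
              neighbors[idx] = sorted(shared, reverse=not ascending)
          return neighbors

      # Patch 0 touches patches 1 and 2; patches 1 and 2 do not touch each other.
      bounds = {0: {(0, 0, 0), (1, 0, 0)}, 1: {(1, 0, 0)}, 2: {(0, 0, 0)}}
      assert neighbor_patches(bounds) == {0: [1, 2], 1: [0], 2: [0]}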
  • connected vertices are signaled explicitly as shown below and in FIG. 12 (refer to item 1202):
  • pdu_num_bound_vert indicates the total number of boundary vertices for the patch [patchIdx].
  • pdu_bound_vert_patch[patchIdx][i] indicates the index into the neighboring patch list to which the boundary vertex [i] of patch [patchIdx] should be connected.
  • pdu_bound_vert_idx[patchIdx][i] indicates the index into the boundary vertex list of the patch identified by pdu_bound_vert_patch to which the boundary vertex [i] of patch [patchIdx] should be connected.
  • connected vertices are signaled also taking into account sorted boundary vertex lists of the neighboring patches, as shown below and in FIG. 13 (refer to item 1302).
  • pdu_bound_vert_start_idx[patchIdx][i] indicates the first index into the boundary vertex list of the patch identified by pdu_bound_vert_patch to which the boundary vertex [i] of patch [patchIdx] should be connected.
  • pdu_num_bound_vert_idx[patchIdx][i] indicates the number of boundary vertices of that patch that shall be connected, in ascending order.
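  • One plausible reading of this run-based variant is sketched below: each boundary vertex i of the current patch connects to a run of consecutive entries in the sorted boundary vertex list of the signaled neighboring patch (the function and the per-vertex run interpretation are illustrative assumptions):

      def expand_connections(pdu_bound_vert_patch, pdu_bound_vert_start_idx,
                             pdu_num_bound_vert_idx):
          """Expand run-based signaling into (own vertex, patch, their vertex)."""
          connections = []
          for i, (nbr, start, run) in enumerate(zip(pdu_bound_vert_patch,
                                                    pdu_bound_vert_start_idx,
                                                    pdu_num_bound_vert_idx)):
              for k in range(run):
                  connections.append((i, nbr, start + k))
          return connections

      # Boundary vertex 0 connects to vertices 3 and 4 of neighbor-list entry 1;
      # boundary vertex 1 connects only to vertex 5 of neighbor-list entry 0.
      assert expand_connections([1, 0], [3, 5], [2, 1]) == [
          (0, 1, 3), (0, 1, 4), (1, 0, 5)]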
  • additional signaling is introduced to indicate to the decoder how to handle vertex connectivity information for reconstruction and rendering, for example as shown below and in FIG. 14 (refer to item 1402):
  • pdu_bound_vert_handling indicates to the decoder how to handle the reconstruction of connected vertices.
  • pdu_bound_vert_handling 0 indicates that connected vertices shall be merged into a single vertex, i.e. connecting neighboring patches at exactly this vertex.
  • pdu_bound_vert_handling equal to 1 indicates that a new face shall be drawn.
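  • A sketch of both handling modes follows, operating on a face list and a run of connected boundary-vertex pairs (a, b), with a in the current patch and b in its neighbor; the triangulation chosen for the new-face mode is one simple possibility, not the normative behavior:

      def stitch_boundaries(faces, pairs, handling: int):
          """Apply pdu_bound_vert_handling across one stitched patch seam."""
          if handling == 0:                  # merge each pair into one vertex
              for a, b in pairs:
                  faces = [tuple(a if v == b else v for v in f) for f in faces]
          else:                              # draw new faces across the seam
              for (a0, b0), (a1, b1) in zip(pairs, pairs[1:]):
                  faces = faces + [(a0, b0, b1), (a0, b1, a1)]
          return faces

      # Merging (1, 4) and (2, 5) rewrites every face that references 4 or 5.
      assert stitch_boundaries([(3, 4, 5)], [(1, 4), (2, 5)], 0) == [(3, 1, 2)]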
  • a decoder receives a V3C bitstream including boundary vertex connectivity information.
  • the decoder extracts connected boundary vertices information, such as how to generate and sort a list of boundary vertices for each patch, and how to generate and sort a list of neighboring patches for each patch.
  • For each 2D patch, the decoder generates two lists following this information: A. a list of boundary vertices, e.g. 3D vertex points lying on the edge of the 2D patch, and B. a list of all neighboring patches in 3D space.
  • For each vertex in list A, the decoder extracts information on connected vertices in neighboring patches and reconstructs the mesh topology accordingly.
  • the decoder reconstructs any present attribute, e.g. texture, by interpolating from the values of the connected vertices.
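  • For instance, a merged vertex can simply take the mean of the decoded colors of the vertices it replaces; the following sketch (with illustrative names) does exactly that:

      def interpolate_attribute(colors, merged_groups):
          """Give every vertex in a merged group the group's mean color."""
          out = dict(colors)
          for group in merged_groups:
              vals = [colors[v] for v in group]
              mean = tuple(sum(c) / len(vals) for c in zip(*vals))
              for v in group:
                  out[v] = mean
          return out

      # Two connected boundary vertices decoded with slightly different colors:
      cols = {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0)}
      assert interpolate_attribute(cols, [(0, 1)])[0] == (0.5, 0.5, 0.0)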
  • FIG. 15 is an apparatus 1500 which may be implemented in hardware, configured to implement V3C patch connectivity signaling for mesh compression, based on any of the examples described herein.
  • the apparatus comprises a processor 1502, at least one memory 1504 (memory 1504 may be transitory or non-transitory) including computer program code 1505, wherein the at least one memory 1504 and the computer program code 1505 are configured to, with the at least one processor 1502, cause the apparatus to implement circuitry, a process, component, module, function, coding, and/or decoding (collectively 1506).
  • the apparatus 1500 is further configured to provide or receive signaling 1507, based on the signaling embodiments described herein.
  • the apparatus 1500 optionally includes a display and/or I/O interface 1508 that may be used to display an output (e.g., an image or volumetric video) of a result of coding/decoding 1506.
  • the display and/or I/O interface 1508 may also be configured to receive input such as user input (e.g. with a keypad).
  • the apparatus 1500 also includes one or more network (NW) interfaces (I/F(s)) 1510.
  • NW I/F(s) 1510 may be wired and/or wireless and communicate over a channel or the Internet/other network(s) via any communication technique.
  • the NW I/F(s) 1510 may comprise one or more transmitters and one or more receivers.
  • the NW I/F(s) 1510 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitry(ies) and one or more antennas.
  • the processor 1502 is configured to implement coding/decoding 1506 and/or signaling 1507 without use of memory 1504.
  • the apparatus 1500 may be a remote, virtual or cloud apparatus.
  • the apparatus 1500 may be either a writer or a reader (e.g. parser), or both a writer and a reader (e.g. parser).
  • the apparatus 1500 may be either a coder or a decoder, or both a coder and a decoder.
  • the apparatus 1500 may be a user equipment (UE), a head mounted display (HMD), or any other fixed or mobile device.
  • UE user equipment
  • HMD head mounted display
  • the memory 1504 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the memory 1504 may comprise a database for storing data.
  • Interface 1512 enables data communication between the various items of apparatus 1500, as shown in FIG. 15.
  • Interface 1512 may be one or more buses, or interface 1512 may be one or more software interfaces configured to pass data within computer program code 1505.
  • the interface 1512 may be one or more buses such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • interface 1512 is an object-oriented software interface.
  • the apparatus 1500 need not comprise each of the features mentioned, or may comprise other features as well.
  • the apparatus 1500 may be an embodiment of and have the features of any of the apparatuses shown in FIG. 1A, FIG. 1B, FIG. 5, FIG. 6, or FIG. 7.
  • FIG. 16 is an encoder method 1600 to implement the examples described herein.
  • the method includes receiving a volumetric frame comprising three-dimensional data content.
  • the method includes decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches.
  • the method includes encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame.
  • the method includes identifying three-dimensional vertices on two-dimensional patch boundaries per patch.
  • the method includes signaling neighboring patches per patch.
  • the method includes signaling connected boundary vertices.
  • Method 1600 may be implemented with apparatus 1500 or with an encoder apparatus.
  • FIG. 17 is a decoder method 1700 to implement the examples described herein.
  • the method includes receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame.
  • the method includes extracting the connected boundary vertices information.
  • the method includes extracting the neighboring patches per patch information.
  • the method includes generating a mesh representation of the mesh from the extracted connected boundary vertices information.
  • the method includes identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation.
  • the method includes connecting boundary vertices between neighboring patches per patch, as signaled.
  • the method includes rendering a reconstruction of the mesh.
  • Method 1700 may be implemented with apparatus 1500 or with a decoder apparatus.
  • references to a 'computer', 'processor', etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
  • circuitry may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware.
  • the term 'circuitry' would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.
  • Circuitry may also be used to mean a function or a process, such as one implemented by an encoder or decoder, or a codec.
  • An example apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a volumetric frame comprising three-dimensional data content; decompose a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encode the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identify three-dimensional vertices on two-dimensional patch boundaries per patch; signal neighboring patches per patch; and signal connected boundary vertices.
  • the apparatus may further include wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is configured to be used to reconstruct texture for the connected boundary vertices.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to determining and ordering the neighboring patches per patch.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to determining and ordering the connected boundary vertices.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to mesh reconstruction of the connected boundary vertices.
  • the apparatus may further include wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is performed using at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
  • the apparatus may further include wherein the connected boundary vertices signaling comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.
  • An example apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extract the connected boundary vertices information; extract the neighboring patches per patch information; generate a mesh representation of the mesh from the extracted connected boundary vertices information; identify three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connect boundary vertices between neighboring patches per patch, as signaled; and render a reconstruction of the mesh.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: reconstruct texture for the connected boundary vertices using the neighboring patches per patch information and the connected boundary vertices information.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to determining and ordering the neighboring patches per patch.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to determining and ordering the connected boundary vertices.
  • the apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to mesh reconstruction of the connected boundary vertices.
  • the apparatus may further include wherein the neighboring patches per patch information and the connected boundary vertices information is received with at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
  • the apparatus may further include wherein the connected boundary vertices information comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.
  • An example apparatus includes means for receiving a volumetric frame comprising three-dimensional data content; means for decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; means for encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch; means for signaling neighboring patches per patch; and means for signaling connected boundary vertices.
  • An example apparatus includes means for receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; means for extracting the connected boundary vertices information; means for extracting the neighboring patches per patch information; means for generating a mesh representation of the mesh from the extracted connected boundary vertices information; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; means for connecting boundary vertices between neighboring patches per patch, as signaled; and means for rendering a reconstruction of the mesh.
  • An example method includes receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
  • An example method includes receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
  • An example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations comprising: receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
  • An example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations comprising: receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
  • ACL: atlas coding layer
  • afps: atlas frame parameter set
  • UE: user equipment
  • ue(v): unsigned integer exponential Golomb coded syntax element with the left bit first
  • u(n): unsigned integer using n bits, e.g. u(1), u(2)
  • UV or uv: coordinate texture, where "U" or "u" and "V" or "v" are axes of a 2D texture
  • u(v): unsigned integer, where the number of bits is determined by the value of other syntax elements

Abstract

The embodiments relate to a method for volumetric video coding, wherein the method comprises receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices. The embodiments also relate to a method for decoding and to apparatuses for implementing the methods.

Description

V3C Patch Connectivity Signaling For Mesh Compression
TECHNICAL FIELD
[0001] The examples and non-limiting embodiments relate generally to volumetric video coding, and more particularly, to V3C patch connectivity signaling for mesh compression.
BACKGROUND
[0002] It is known to perform coding and decoding of video and image data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
[0004] FIG. 1A shows an example process for encoding volumetric media.
[0005] FIG. 1B shows an example process for decoding volumetric media.
[0006] FIG. 2 shows an example of block to patch mapping.
[0007] FIG. 3A shows an example of an atlas coordinate system.
[0008] FIG. 3B shows an example of a local 3D patch coordinate system.
[0009] FIG. 3C shows an example of a final target 3D coordinate system.
[0010] FIG. 4 shows elements of a mesh.
[0011] FIG. 5 shows an example V-PCC extension for mesh encoding, based on the embodiments described herein.
[0012] FIG. 6 shows an example V-PCC extension for mesh decoding, based on the embodiments described herein.
[0013] FIG. 7 illustrates a general CfP approach.
[0014] FIG. 8 shows illustrations of issues with decoder surface reconstruction.
[0015] FIG. 9 illustrates a patch with five (5) identified boundary vertices (white circles, A-E) and their x,y coordinates.
[0016] FIG. 10 shows example signaling of the sorting approach for boundary vertices.
[0017] FIG. 11 shows example signaling of the sorting approach for neighboring patches.
[0018] FIG. 12 shows example signaling of connected vertices.
[0019] FIG. 13 shows example signaling of connected vertices, while taking into account sorted boundary vertex lists of the neighboring patches.
[0020] FIG. 14 shows example signaling to indicate to a decoder how to handle vertex connectivity information for reconstruction and rendering.
[0021] FIG. 15 is an example apparatus to implement V3C patch connectivity signaling for mesh compression, based on the examples described herein.
[0022] FIG. 16 is an example encoder method, based on the examples described herein.
[0023] FIG. 17 is an example decoder method, based on the examples described herein.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0024] The examples described herein relate to the encoding, signaling, and rendering of volumetric video based on mesh coding. The examples described herein focus on improving the industry standard for reconstructing mesh surfaces for volumetric video. Signaling related to the mesh reconstruction is at the core of this description.
[0025] Volumetric video data
[0026] Volumetric video data represents a three-dimensional scene or object and can be used as input for AR, VR and MR applications. Such data describes geometry (shape, size, position in 3D-space) and respective attributes (e.g. color, opacity, reflectance, ...), plus any possible temporal transformations of the geometry and attributes at given time instances (like frames in 2D video). Volumetric video is either generated from 3D models, i.e. CGI, or captured from real-world scenes using a variety of capture solutions, e.g. multi-camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible. Typical representation formats for such volumetric data are polygon meshes, point clouds, or voxels. Temporal information about the scene can be included in the form of individual capture instances, i.e. "frames" in 2D video, or other means, e.g. position of an object as a function of time.
[0027] Because volumetric video describes a 3D scene (or object), such data can be viewed from any viewpoint. Therefore, volumetric video is an important format for any AR, VR, or MR application, especially for providing 6DOF viewing capabilities.
[0028] Increasing computational resources and advances in 3D data acquisition devices have enabled reconstruction of highly detailed volumetric video representations of natural scenes. Infrared, lasers, time-of-flight and structured light are all examples of devices that can be used to construct 3D video data. Representation of the 3D data depends on how the 3D data is used. Dense voxel arrays have been used to represent volumetric medical data. In 3D graphics, polygonal meshes are extensively used. Point clouds on the other hand are well suited for applications such as capturing real world 3D scenes where the topology is not necessarily a 2D manifold. Another way to represent 3D data is coding this 3D data as a set of texture and depth map(s) as is the case in the multi-view plus depth. Closely related to the techniques used in multi-view plus depth is the use of elevation maps, and multi-level surface maps.
[0029] MPEG visual volumetric video-based coding (V3C)
[0030] The following described examples refer to excerpts of ISO/IEC 23090-5 Visual Volumetric Video-based Coding and Video-based Point Cloud Compression 2nd Edition.
[0031] Visual volumetric video, a sequence of visual volumetric frames, if uncompressed, may be represented by a large amount of data, which can be costly in terms of storage and transmission. This has led to the need for a high coding efficiency standard for the compression of visual volumetric data.
[0032] The V3C specification enables the encoding and decoding processes of a variety of volumetric media by using video and image coding technologies. This is achieved through first a conversion of such media from their corresponding 3D representation to multiple 2D representations, also referred to as V3C components, before coding such information. Such representations may include occupancy, geometry, and attribute components. The occupancy component can inform a V3C decoding and/or rendering system of which samples in the 2D components are associated with data in the final 3D representation. The geometry component contains information about the precise location of 3D data in space, while attribute components can provide additional properties, e.g. texture or material information, of such 3D data. An example is shown in FIG. 1A and FIG. 1B.
[0033] FIG. 1A shows volumetric media conversion at the encoder, and FIG. 1B shows volumetric media conversion at the decoder side. The 3D media 102 is converted to a series of 2D representations: occupancy 118, geometry 120, and attributes 122. Additional atlas information 108 is also included in the bitstream to enable inverse reconstruction. Refer to ISO/IEC 23090-5.
[0034] As further shown in FIG. 1A, a volumetric capture operation 104 generates a projection 106 from the input 3D media 102. In some examples, the projection 106 is a projection operation. From the projection 106, an occupancy operation 110 generates the occupancy 2D representation 118, a geometry operation 112 generates the geometry 2D representation 120, and an attribute operation 114 generates the attribute 2D representation 122. The additional atlas information 108 is included in the bitstream 116. The atlas information 108, the occupancy 2D representation 118, the geometry 2D representation 120, and the attribute 2D representation 122 are encoded into the V3C bitstream 124 to encode a compressed version of the 3D media 102. Based on the examples described herein, V3C patch connectivity signaling 130 may also be signaled in the V3C bitstream 124. The V3C patch connectivity signaling 130 may be used on the decoder side, as shown in FIG. 1B.
[0035] As shown in FIG. 1B, a decoder using the V3C bitstream 124 derives 2D representations using an atlas information operation 126, an occupancy operation 128, a geometry operation 130 and an attribute operation 132. The atlas information operation 126 provides atlas information into a bitstream 134. The occupancy operation 128 derives the occupancy 2D representation 136, the geometry operation 130 derives the geometry 2D representation 138, and the attribute operation 132 derives the attribute 2D representation 140. The 3D reconstruction operation 142 generates a decompressed reconstruction 144 of the 3D media 102, using the atlas information 126/134, the occupancy 2D representation 136, the geometry 2D representation 138, and the attribute 2D representation 140.
[ 0036 ] Additional information that allows associating all these subcomponents, and that enables the inverse reconstruction from a 2D representation back to a 3D representation, is also included in a special component, referred to in this document as the atlas. An atlas consists of multiple elements, called patches. Each patch identifies a region in all available 2D components and contains the information necessary to perform the appropriate inverse projection of this region back to 3D space. The shape of such regions is determined through a 2D bounding box associated with each patch as well as their coding order. The shape of these regions is further refined after consideration of the occupancy information.

[0037] Atlases are partitioned into patch packing blocks of equal size. Refer for example to block 202 in FIG. 2. The 2D bounding boxes of patches and their coding order determine the mapping between the blocks of the atlas image and the patch indices. FIG. 2 shows an example of block to patch mapping with 4 projected patches (204, 204-2, 204-3, 204-4) onto an atlas 201 when asps_patch_precedence_order_flag is equal to 0. Projected points are represented with dark grey. The area that does not contain any projected points is represented with light grey. Patch packing blocks are represented with dashed lines. The number inside each patch packing block 202 represents the patch index of the patch to which it is mapped. Refer to ISO/IEC 23090-5.
[0038] Axes orientations are specified for internal operations. For instance, the origin of the atlas coordinates is located on the top-left corner of the atlas frame. For the reconstruction step, an intermediate axes definition for a local 3D patch coordinate system is used. The 3D local patch coordinate system is then converted to the final target 3D coordinate system using appropriate transformation steps.
[0039] FIG. 3A shows an example of a single patch 302 packed onto an atlas image 304. This patch is then converted, with reference to FIG. 3B, to a local 3D patch coordinate system (U, V, D) defined by the projection plane with origin O', tangent (U), bi-tangent (V), and normal (D) axes. For an orthographic projection, the projection plane is equal to the sides of an axis-aligned 3D bounding box 306, as shown in FIG. 3B. The location of the bounding box 306 in the 3D model coordinate system, defined by a left-handed system with axes (X, Y, Z), can be obtained by adding offsets TilePatch3dOffsetU 308, TilePatch3dOffsetV 310, and TilePatch3dOffsetD 312, as illustrated in FIG. 3C.

[0040] Accordingly, FIG. 3A shows an example of an atlas coordinate system, FIG. 3B shows an example of a local 3D patch coordinate system, and FIG. 3C shows an example of a final target 3D coordinate system. Refer to ISO/IEC 23090-5.
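By way of a non-limiting illustration, the offset step of this conversion may be sketched in Python as follows. The function name and the fixed axis mapping (U to X, V to Y, D to Z) are illustrative assumptions; in practice the axis permutation depends on the signaled projection plane.

```python
# Illustrative sketch: map a sample from the local 3D patch coordinate
# system (U, V, D) back to the model coordinate system (X, Y, Z) by
# adding the signaled patch offsets. The U->X, V->Y, D->Z mapping shown
# here is an assumption made for this sketch only.
def patch_to_model(u, v, d, offset_u, offset_v, offset_d):
    x = offset_u + u  # TilePatch3dOffsetU
    y = offset_v + v  # TilePatch3dOffsetV
    z = offset_d + d  # TilePatch3dOffsetD
    return (x, y, z)
```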
[0041] V3C High Level Syntax
[0042] Coded V3C video components are referred to in this document as video bitstreams, while an atlas component is referred to as the atlas bitstream. Video bitstreams and atlas bitstreams may be further split into smaller units, referred to here as video and atlas sub-bitstreams, respectively, and may be interleaved together, after the addition of appropriate delimiters, to construct a V3C bitstream.
[0043] V3C patch information is contained in the atlas bitstream, atlas_sub_bitstream(), which contains a sequence of NAL units. A NAL unit is specified to format data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media. All data are contained in NAL units, each of which contains an integer number of bytes. A NAL unit specifies a generic format for use in both packet-oriented and bitstream systems. The format of NAL units for both packet-oriented transport and sample streams is identical, except that in the sample stream format specified in Annex D of ISO/IEC 23090-5 each NAL unit can be preceded by an additional element that specifies the size of the NAL unit.
[0044] NAL units in the atlas bitstream can be divided into atlas coding layer (ACL) and non-atlas coding layer (non-ACL) units. The former are dedicated to carrying patch data, while the latter carry data necessary to properly parse the ACL units or any additional auxiliary data.

[0045] In the nal_unit_header() syntax, nal_unit_type specifies the type of the RBSP data structure contained in the NAL unit as specified in Table 4 of ISO/IEC 23090-5. nal_layer_id specifies the identifier of the layer to which an ACL NAL unit belongs or the identifier of a layer to which a non-ACL NAL unit applies. The value of nal_layer_id shall be in the range of 0 to 62, inclusive. The value of 63 may be specified in the future by ISO/IEC. Decoders conforming to a profile specified in Annex A of ISO/IEC 23090-5 shall ignore (i.e., remove from the bitstream and discard) all NAL units with values of nal_layer_id not equal to 0.
[0046] rbsp_byte[ i ] is the i-th byte of an RBSP. An RBSP is specified as an ordered sequence of bytes as follows.
[0047] The RBSP contains a string of data bits (SODB) as follows. If the SODB is empty (i.e., zero bits in length), the RBSP is also empty. Otherwise, the RBSP contains the SODB as follows (i-ii). i) The first byte of the RBSP contains the first (most significant, left-most) eight bits of the SODB; the next byte of the RBSP contains the next eight bits of the SODB, etc., until fewer than eight bits of the SODB remain. ii) The rbsp_trailing_bits() syntax structure is present after the SODB as follows (a-c). a) The first (most significant, left-most) bits of the final RBSP byte contain the remaining bits of the SODB (if any). b) The next bit consists of a single bit equal to 1 (i.e., rbsp_stop_one_bit). c) When the rbsp_stop_one_bit is not the last bit of a byte-aligned byte, one or more bits equal to 0 (i.e. instances of rbsp_alignment_zero_bit) are present to result in byte alignment.
[0048] One or more cabac_zero_word 16-bit syntax elements equal to 0x0000 may be present in some RBSPs after the rbsp_trailing_bits() at the end of the RBSP.

[0049] Syntax structures having these RBSP properties are denoted in the syntax tables using an "_rbsp" suffix. These structures are carried within NAL units as the content of the rbsp_byte[ i ] data bytes. Examples of typical content include (i-iv): i) atlas_sequence_parameter_set_rbsp(), which is used to carry parameters related to the atlas on a sequence level; ii) atlas_frame_parameter_set_rbsp(), which is used to carry parameters related to the atlas on a frame level and which are valid for one or more atlas frames; iii) sei_rbsp(), used to carry SEI messages in NAL units; iv) atlas_tile_group_layer_rbsp(), used to carry patch layout information for tile groups.
[0050 ] When the boundaries of the RBSP are known, the decoder can extract the SODB from the RBSP by concatenating the bits of the bytes of the RBSP and discarding the rbsp_stop_one_bit, which is the last (least significant, right-most) bit equal to 1, and discarding any following (less significant, farther to the right) bits that follow it, which are equal to 0. The data necessary for the decoding process is contained in the SODB part of the RBSP.
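As a non-limiting sketch of this extraction, the following Python function (the function name is an illustrative assumption) recovers the SODB bits from an RBSP byte string by discarding the trailing zero bits and the rbsp_stop_one_bit:

```python
def extract_sodb(rbsp: bytes) -> list:
    """Recover the string of data bits (SODB) from an RBSP."""
    bits = []
    for byte in rbsp:
        for i in range(7, -1, -1):  # most significant bit first
            bits.append((byte >> i) & 1)
    # Scan from the end: drop alignment zeros (and any trailing
    # cabac_zero_word bytes, which are all zero), then the single
    # rbsp_stop_one_bit equal to 1.
    while bits and bits[-1] == 0:
        bits.pop()
    if bits:
        bits.pop()  # rbsp_stop_one_bit
    return bits
```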
[0051] atlas_tile_group_layer_rbsp() contains metadata information for a list of tile groups, which represent sections of a frame. Each tile group may contain several patches, for which the metadata syntax is described below in Table 1.
Table 1: Patch metadata syntax
(The patch metadata syntax table is rendered as an image in the original publication.)
[0052] Annex F of ISO/IEC 23090-5 describes different SEI messages that have been defined for V3C purposes. SEI messages assist in processes related to decoding, reconstruction, display, or other purposes. Annex F of ISO/IEC 23090-5 defines two types of SEI messages: essential and non-essential. SEI messages are signaled in sei_rbsp(), which is described below in Table 2.
Table 2: SEI message metadata syntax
(The SEI message syntax table is rendered as an image in the original publication.)
[0053] Non-essential SEI messages are not required by the decoding process. Conforming decoders are not required to process this information for output order conformance.

[0054] The specification for the presence of non-essential SEI messages is also satisfied when those messages (or some subset of them) are conveyed to decoders (or to the HRD) by other means not specified in ISO/IEC 23090-5. When present in the bitstream, non-essential SEI messages shall obey the syntax and semantics specified in Annex F of ISO/IEC 23090-5. When the content of a non-essential SEI message is conveyed for the application by some means other than presence within the bitstream, the representation of the content of the SEI message is not required to use the same syntax specified in Annex F of ISO/IEC 23090-5. For the purpose of counting bits, only the appropriate bits that are actually present in the bitstream are counted.
[0055] Essential SEI messages are an integral part of the V3C bitstream and should not be removed from the bitstream. The essential SEI messages are categorized into two types: Type-A essential SEI messages and Type-B essential SEI messages.
[0056] Type-A essential SEI messages: These SEIs contain information required to check bitstream conformance and for output timing decoder conformance. Every V3C decoder conforming to point A should not discard any relevant Type-A essential SEI messages and shall consider them for bitstream conformance and for output timing decoder conformance.
[0057] Type-B essential SEI messages: V3C decoders that wish to conform to a particular reconstruction profile should not discard any relevant Type-B essential SEI messages and shall consider them for 3D point cloud reconstruction and conformance purposes.
[0058] Rendering and meshes

[0059] A polygon mesh is a collection of vertices, edges and faces that defines the shape of a polyhedral object in 3D computer graphics and solid modelling. The faces usually consist of triangles (triangle mesh), quadrilaterals (quads), or other simple convex polygons (n-gons), since this simplifies rendering, but they may also be composed more generally of concave polygons, or even polygons with holes.
[0060] With reference to FIG. 4, objects 100 created with polygon meshes are represented by different types of elements. These include vertices 102, edges 104, faces 106, polygons 108 and surfaces 110 as shown in FIG. 4. Thus, FIG. 4 illustrates elements of a mesh.
[0061] Polygon meshes are defined by the following elements:
[0062] Vertex (102): A position in 3D space defined as (x,y,z) along with other information such as color (r,g,b), normal vector and texture coordinates.
[0063] Edge (104): A connection between two vertices.
[0064] Face (106): A closed set of edges, in which a triangle face has three edges, and a quad face has four edges. A polygon 108 is a coplanar set of faces 106. In systems that support multi-sided faces, polygons and faces are equivalent. Mathematically a polygonal mesh may be considered an unstructured grid, or undirected graph, with additional properties of geometry, shape and topology.
[0065] Surfaces (110): surfaces, also called smoothing groups, are useful for grouping smooth regions but are not required.
[0066] Groups: Some mesh formats contain groups, which define separate elements of the mesh, and are useful for determining separate sub-objects for skeletal animation or separate actors for non-skeletal animation.
[0067] Materials: materials are defined to allow different portions of the mesh to use different shaders when rendered.
[0068] UV coordinates: most mesh formats also support some form of UV coordinates, which are a separate 2D representation of the mesh "unfolded" to show what portion of a 2-dimensional texture map to apply to different polygons of the mesh. It is also possible for meshes to contain other vertex attribute information such as color, tangent vectors, weight maps to control animation, etc. (sometimes also called channels).
[0069] V-PCC mesh coding extension (MPEG M49588)
[0070] FIG. 5 and FIG. 6 show the extensions to the V-PCC encoder and decoder to support mesh encoding and mesh decoding, respectively.
[0071] In the encoder extension 500, the input mesh data 502 is demultiplexed 504 into vertex coordinate+attributes 506 and vertex connectivity 508. The vertex coordinate+attributes data 506 is coded 510 using MPEG-I V-PCC, whereas the vertex connectivity data 508 is coded (using vertex connectivity encoder 516) as auxiliary data 518. Both of these (encoded vertex coordinates and vertex attributes 517 and auxiliary data 518) are multiplexed 520 to create the final compressed output bitstream 522. Vertex ordering 514 is carried out on the reconstructed vertex coordinates 512 at the output of MPEG-I V-PCC 510 to reorder the vertices for optimal vertex connectivity encoding 516.
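By way of a minimal illustration only, the demultiplexing step 504 may be sketched as below; the data layout is an assumption, and the actual V-PCC and auxiliary coding of each stream is not shown.

```python
def demultiplex_mesh(vertices, attributes, faces):
    """Split an input mesh into the two streams of FIG. 5: per-vertex
    coordinates plus attributes (to be coded with MPEG-I V-PCC) and
    vertex connectivity (to be coded as auxiliary data)."""
    vertex_stream = [(v, a) for v, a in zip(vertices, attributes)]
    connectivity_stream = [tuple(face) for face in faces]
    return vertex_stream, connectivity_stream
```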
[0072] Based on the examples described herein, as shown in FIG. 5, the encoding process/apparatus 500 of FIG. 5 may be extended such that the encoding process/apparatus 500 signals V3C patch connectivity signaling 530 within the output bitstream 522. Alternatively, V3C patch connectivity signaling 530 may be provided and signaled separately from the output bitstream 522.
[0073] As shown in FIG. 6, in the decoder 600, the input bitstream 602 is demultiplexed 604 to generate the compressed bitstreams for vertex coordinates+attributes 605 and vertex connectivity 606. The input/compressed bitstream 602 may comprise or may be the output from the encoder 500, namely the output bitstream 522 of FIG. 5. The vertex coordinates+attributes 605 is decompressed using MPEG-I V-PCC decoder 608 to generate vertex attributes 612. Vertex ordering 616 is carried out on the reconstructed vertex coordinates 614 at the output of MPEG-I V-PCC decoder 608 to match the vertex order at the encoder 500. The vertex connectivity data 606 is also decompressed using vertex connectivity decoder 610 to generate vertex connectivity 618, and everything (including vertex attributes 612, the output of vertex reordering 616, and vertex connectivity 618) is multiplexed 620 to generate the reconstructed mesh 622.
[0074] Based on the examples described herein, as shown in FIG. 6, the decoding process/apparatus 600 of FIG. 6 may be extended such that the decoding process/apparatus 600 receives and decodes V3C patch connectivity signaling 630, which may be part of the compressed bitstream 602. Alternatively, V3C patch connectivity signaling 630 may be received and signaled separately from the compressed bitstream 602 or output bitstream 522. The V3C patch connectivity signaling 630 of FIG. 6 may comprise or correspond to the V3C patch connectivity signaling 530 of FIG. 5.
[0075] Surface reconstruction and hole filling algorithms

[0076] Considerable research has been carried out over the years on how to reconstruct a surface from a set of points. Most surface reconstruction algorithms are quite expensive and not suitable for real-time frame-by-frame reconstruction. The following three algorithms may be used for surface reconstruction (an illustrative usage sketch follows the list):
[0077] 1. Alpha shapes generalize convex hulls. From an initial volume containing all vertices in the 3D space, subvolumes are iteratively removed. Intuitively, the size of the carved-out subvolumes can be adjusted by a parameter.
[0078] 2. Ball pivoting rolls a ball over the point set's surface. As soon as the ball touches three points, it creates a triangle from them. The ball's radius is a vital parameter for this algorithm, as a ball that is too small 'falls through' the surface, while a ball that is too big conceals details from the point set.
[0079] 3. Poisson surface reconstruction solves a regularized optimization problem to generate a smooth surface. The surface's amount of detail (level of smoothing) can be adjusted with a parameter.
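For illustration, all three families are available in the open-source Open3D library; the following Python sketch assumes that library, and the input file name and parameter values are placeholders, since suitable values depend on the content, as discussed further below.

```python
import open3d as o3d

# Hypothetical input: points reconstructed from decoded patches.
pcd = o3d.io.read_point_cloud("reconstructed_points.ply")
pcd.estimate_normals()  # required by ball pivoting and Poisson

# 1. Alpha shapes: alpha controls the size of the carved-out subvolumes.
mesh_alpha = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(
    pcd, alpha=0.03)

# 2. Ball pivoting: the ball radii are the vital parameter.
radii = o3d.utility.DoubleVector([0.005, 0.01, 0.02])
mesh_bpa = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
    pcd, radii)

# 3. Poisson: depth trades the level of smoothing against detail.
mesh_poisson, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
```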
[0080] Hole filling algorithms tackle the simpler problem of closing holes in already partially connected meshes. This is an active field of research; the community has not yet agreed on the best solutions. The majority of the proposed algorithms require parametrization depending on the content type and vertex density, e.g. by defining the number of iterations for a refinement algorithm.
[0081] MPEG 3DG issued a call for proposals (CfP) on the integration of mesh compression into the V3C family of standards. With reference to FIG. 7, Nokia's CfP response focuses on defining patches of geometry, occupancy and attribute data with implicit connectivity and UV mapping. Refer to item 704. This conforms well with the existing V3C high level syntax and overall design direction. Nokia's CfP approach relies on mesh reconstruction at the decoder, which allows reducing the amount of information required for compressing traditional mesh information 702. The general Nokia CfP approach is described in FIG. 7.
[0082] The input meshes may contain vertices with varying spatial density over different regions of the input model, which means that applying a surface reconstruction globally to the model with the same parameters might be sub-optimal. For example, the input sequences Mitch and Thomas demonstrate the differences in this vertex density well, with significantly higher vertex density around the facial region than on other parts of the model.
[0083] Nokia's CfP approach relies on surface reconstruction at the decoder side to generate a mesh representation 706 of the content. Reconstruction of meshes seems to work well within patches, as the mesh 706 is constructed from the pixels in the geometry patch, which are expected to result in a solid mesh. However, connecting vertices from two or more patches to generate a surface for the mesh remains a problem. This problem of connecting vertices from two or more patches may have several origins. Firstly, as a consequence of quantization of 3D information, a shift of vertex positions is expected. Furthermore, video coding introduces additional artefacts when mesh information is compressed with a video codec. When re-projected back into 3D, the vertex positions for the same vertex between two patches no longer match, and as a result gaps in the model start to appear.
[0084] FIG. 8 illustrates gaps (left, 802) and patch edges (right, 804) on the mesh after initial reconstruction. By default, surface reconstruction tends to be slow, and several improvements can be made to it by introducing encoder-side pre-analysis and signaling of helpful information along with the bitstream.
[0085] Surface reconstruction algorithms such as the following may be used to alleviate the problem related to connecting regions of the mesh after initial per-patch reconstruction: i) alpha shapes, ball pivoting, and other methods mentioned in the section "Surface reconstruction and hole filling algorithms"; ii) simple triangulation; iii) others: http://www.cad.zju.edu.cn/hoitie/hwlin/pdf_files/A-robust-hole-filling-algorithm-for-triangular-mesh.pdf (last accessed July 15, 2021).
[0086] Different parameters for surface reconstruction algorithms should be considered based on the region of content, to improve the quality and performance of surface reconstruction. Currently there is no signaling in place that could be used to provide the decoder with this information.
[0087] The examples described herein disclose explicit mesh connectivity signaling between neighboring patches in a V3C mesh coding approach, as explained previously, thus eliminating or reducing the need for further post-processing steps, as disclosed in NC323186.
[0088] Benefits of this approach are lower reconstruction complexity and a more accurate representation of mesh connectivity between bordering patches.
[0089] The main encoder embodiments include receiving a volumetric frame consisting of 3D data; encoding the content using a V3C-based mesh compression approach, i.e. decomposing the 3D model into a collection of 2D patches; identifying 3D vertices on 2D patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
[0090] The main decoder embodiments include receiving a V3C bitstream with connected boundary vertices information; extracting the connected boundary vertices information; generating a mesh representation from the decoded data; identifying 3D vertices on 2D patch boundaries per patch; connecting boundary vertices between neighboring patches, as signaled; (optionally) reconstructing texture for connected vertices; and rendering the reconstructed mesh.
[0091] The main signaling embodiments include that parameters for connected boundary vertices information could be carried in the V3C bitstream using multiple mechanisms. In one embodiment, information on how to determine and order boundary vertices is signaled. In another embodiment, information about patches neighboring a patch is signaled. In another embodiment, information about connected vertices is signaled. In another embodiment, information on mesh reconstruction of connected vertices is signaled. In one set of embodiments, the above information is signaled in one of, or a combination of, the atlas sequence parameter set extension, the atlas frame parameter set extension, the atlas patch data unit extension, or SEI messages.
[0092] Encoder embodiments
[0093] An encoder receives one or more volumetric frames describing a 3D object or a scene and uses a V3C mesh encoder to compress the information in a mesh format, i.e. it decomposes the 3D model into a collection of 2D patches. For each 2D patch, two lists are created: 1) a list of boundary vertices, e.g. 3D vertex points lying on the edge of the 2D patch, and 2) a list of all neighboring patches in 3D space.
[0094] FIG. 9 illustrates a patch 902 with 5 identified boundary vertices (white circles, A-E), and their x,y coordinates.
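A minimal sketch of identifying boundary vertices is given below, assuming patch-local integer pixel positions and an occupancy map (both assumptions made for illustration): a vertex is classified as a boundary vertex when any of its 4-neighbours is unoccupied or falls outside the patch.

```python
def boundary_vertices(vertices, occupancy):
    """Return the subset of (x, y) vertices lying on the patch edge.
    'occupancy' is a 2D array where nonzero marks occupied samples."""
    h, w = len(occupancy), len(occupancy[0])
    result = []
    for (x, y) in vertices:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < w and 0 <= ny < h) or not occupancy[ny][nx]:
                result.append((x, y))
                break
    return result
```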
[0095] Signaling embodiments

[0096] Boundary vertex list sorting
[0097] The list of boundary vertices can be sorted in various ways, e.g. starting with the vertex with the lowest value for the combined x/y location (vertex A in the example above). Other starting vertices are also possible, e.g. the highest combined value (D); the lowest x, then lowest y value (A); or the lowest y, then lowest x value (C).
[0098] The list of vertices is then sorted, starting with the selected starting vertex and following the patch boundary, in either clockwise or anti-clockwise fashion. Clockwise sorting example: A-B-C-D-E. Anti-clockwise sorting example: A-E-D-C-B. Other sortings are possible, e.g. first by x, then y axis, ascending: A-B-E-C-D; first by x, then y axis, descending: D-C-E-B-A; first by y, then x axis, ascending: C-B-A-D-E; first by y, then x axis, descending: E-D-A-B-C; etc.
[0099] Typically, sorting along the patch boundary in either clockwise or anti-clockwise fashion seems the most appropriate.
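A sketch of these sorting options follows. Approximating the clockwise or anti-clockwise boundary walk by the polar angle around the patch centroid is a simplifying assumption that holds for star-shaped patches such as the one in FIG. 9; whether increasing angle reads as clockwise also depends on whether the y axis points up or down.

```python
import math

def sort_boundary_vertices(verts, start="lowest_x_then_y", order="clockwise"):
    """Sort a list of (x, y) boundary vertices per the options above."""
    if order in ("clockwise", "anticlockwise"):
        cx = sum(x for x, _ in verts) / len(verts)
        cy = sum(y for _, y in verts) / len(verts)
        ring = sorted(verts,
                      key=lambda p: math.atan2(p[1] - cy, p[0] - cx),
                      reverse=(order == "clockwise"))
        if start == "lowest_x_then_y":  # lowest x, then lowest y value
            first = min(verts)
        else:                           # lowest combined x + y value
            first = min(verts, key=lambda p: p[0] + p[1])
        i = ring.index(first)           # rotate to the starting vertex
        return ring[i:] + ring[:i]
    if order == "x_then_y_ascending":
        return sorted(verts)
    if order == "y_then_x_ascending":
        return sorted(verts, key=lambda p: (p[1], p[0]))
    raise ValueError(order)
```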
[00100] In one embodiment, the starting boundary vertex approach is signaled in the bitstream, as shown in the example below.

[00101] In one embodiment, the sorting approach for the boundary vertices is signaled in the bitstream, as in the example shown below and in FIG. 10 (refer to item 1002).
(Syntax table rendered as an image; see FIG. 10, item 1002.)
[ 00102 ] pdu_vert_start indicates the starting boundary vertex. A value of 0 indicates to start sorting from the vertex with the lowest x, then y coordinate value; a value of 1 indicates to start sorting from the lowest combined x,y coordinate value.
[ 00103 ] pdu_vert_sort indicates the boundary vertex sorting approach; a value of 0 indicates to follow the patch boundary clockwise, and a value of 1 indicates to follow the patch boundary anti-clockwise.
[00104 ] The starting boundary vertex approach and sorting approach for the boundary vertices can be signaled also on sequence (Atlas sequence parameter set) or frame level (Atlas frame parameter set).
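For illustration, parsing these two fields could look as sketched below; the assumption that each field is coded as u(1) is made for the sketch only, as the actual coding is defined by the syntax table.

```python
class BitReader:
    """Tiny MSB-first bit reader used by this parsing sketch."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def u(self, n: int) -> int:  # unsigned integer using n bits
        val = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            val = (val << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return val

def parse_boundary_vertex_sorting(reader: BitReader) -> dict:
    # Assumed u(1) coding of the two fields described above.
    return {"pdu_vert_start": reader.u(1),
            "pdu_vert_sort": reader.u(1)}
```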
[00105 ] Patch neighbor sorting
[00106] During the V3C 3D decomposition, the encoder assigns each generated 2D patch an index. Typically, the relation of this patch to other patches in 3D space is lost.
[00107 ] In one embodiment, a V3C encoder stores for each patch the patch indices of neighboring patches, i.e. patches with shared boundary vertices, in a list.
[00108] This list of patch indices can be sorted, e.g. i) by patch index value, ascending or descending, or ii) starting from one neighboring patch and then following the current patch boundaries clockwise or anti-clockwise.
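A minimal encoder-side sketch of option i) with ascending order follows; the set of shared-vertex patch pairs is an assumed data structure recorded during the 3D decomposition.

```python
def neighboring_patches(patch_idx, shared_vertex_pairs):
    """Collect the indices of patches sharing boundary vertices with
    'patch_idx'. 'shared_vertex_pairs' is an assumed set of
    (patch_a, patch_b) pairs recorded during 3D decomposition."""
    neighbors = {b for (a, b) in shared_vertex_pairs if a == patch_idx}
    neighbors |= {a for (a, b) in shared_vertex_pairs if b == patch_idx}
    return sorted(neighbors)  # option i): ascending patch index order
```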
[00109] In one embodiment, the sorting approach for the neighboring patches is signaled in the bitstream, as shown in the example below and in FIG. 11 (refer to item 1102).
(Syntax table rendered as an image; see FIG. 11, item 1102.)
[00110] pdu_patch_sort indicates the neighboring patch sorting; a value of 0 indicates to follow the patch boundary clockwise, a value of 1 indicates to sort by patch indices in ascending order.
[00111 ] The approach for the neighboring patch sorting can be signaled also on sequence (Atlas sequence parameter set) or frame level (Atlas frame parameter set).
[00112 ] Connectivity signaling
[00113] Following the boundary vertices identification and patch neighbor signaling, explicit vertex connectivity can be signaled, as shown in the examples below.
[00114 ] In one embodiment, connected vertices are signaled explicitly as shown below and in FIG. 12 (refer to item 1202):
(Syntax table rendered as an image; see FIG. 12, item 1202.)
[00115] pdu_num_bound_vert indicates the total number of boundary vertices for the patch [patchIdx].
[00116] pdu_bound_vert_patch[patchIdx][i] indicates the index of the neighboring patch list to which the boundary vertex [i] of patch [patchIdx] should be connected.
[00117] pdu_bound_vert_idx[patchIdx][i] indicates the index of the boundary vertex list of the patch identified by pdu_bound_vert_patch, to which the boundary vertex [i] of patch [patchIdx] should be connected.
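A decoder-side sketch of applying this explicit signaling is given below; the list structures are the per-patch boundary vertex and neighboring patch lists reconstructed as described above, and the function name is illustrative.

```python
def connect_boundary_vertices(patch_idx, bound_vert_patch, bound_vert_idx,
                              neighbor_list, boundary_lists):
    """Return (source, target) pairs of vertices to be connected.
    bound_vert_patch[i] indexes into this patch's neighboring patch
    list; bound_vert_idx[i] indexes into the boundary vertex list of
    that neighboring patch."""
    pairs = []
    for i in range(len(bound_vert_patch)):  # pdu_num_bound_vert entries
        neighbor = neighbor_list[bound_vert_patch[i]]
        source = boundary_lists[patch_idx][i]
        target = boundary_lists[neighbor][bound_vert_idx[i]]
        pairs.append((source, target))
    return pairs
```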
[00118] In another embodiment, connected vertices are signaled while also taking into account the sorted boundary vertex lists of the neighboring patches, as shown below and in FIG. 13 (refer to item 1302).
(Syntax table rendered as an image; see FIG. 13, item 1302.)
[00119] pdu_bound_vert_start_idx[patchIdx][i] indicates the first index of the boundary vertex list of the patch identified by pdu_bound_vert_patch, to which the boundary vertex [i] of patch [patchIdx] should be connected.

[00120] pdu_num_bound_vert_idx[patchIdx][i] indicates the number of boundary patch vertices that shall be connected in ascending fashion.
[00121 ] Example: If i=0, pdu_bound_vert_patch=0, pdu_bound_vert_start_idx = 5, and pdu_num_bound_vert_idx = 3, the first three boundary vertices of the current patch are connected to the following boundary vertices of patch 0 as follows:
Boundary vertex 0 -> patch 0, vertex 5;
Boundary vertex 1 -> patch 0, vertex 6;
Boundary vertex 2 -> patch 0, vertex 7.
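Continuing the example, the run-based signaling expands into explicit vertex pairs as in the following sketch:

```python
def expand_run(first_local_vertex, bound_vert_start_idx, num_bound_vert_idx):
    """Expand a run: consecutive boundary vertices of the current patch
    connect to consecutive boundary vertices of the neighboring patch."""
    return [(first_local_vertex + k, bound_vert_start_idx + k)
            for k in range(num_bound_vert_idx)]

# The example above: start index 5, run length 3.
print(expand_run(0, 5, 3))  # -> [(0, 5), (1, 6), (2, 7)]
```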
[00122 ] Connectivity handling / rendering signaling
[00123] In one embodiment, additional signaling is introduced to indicate to the decoder how to handle vertex connectivity information for reconstruction and rendering. For example, as shown below and in FIG. 14 (refer to item 1402):
(Syntax table rendered as an image; see FIG. 14, item 1402.)
[00124] pdu_bound_vert_handling indicates to the decoder how to handle the reconstruction of connected vertices. pdu_bound_vert_handling equal to 0 indicates that connected vertices shall be merged into a single vertex, i.e. connecting neighboring patches at exactly this vertex. pdu_bound_vert_handling equal to 1 indicates that a new face shall be drawn.
[00125] There exist various ways in which such new faces may be drawn, for example (1-3), with a combined sketch following the list:
[00126] 1. If boundary vertex i and i+1 are connected to the same vertex j of a neighboring patch, a triangle is drawn between vertices i, i+1, and j.
[00127] 2. If boundary vertex i is connected to two vertices j and k of a neighboring patch, a triangle is drawn between i, j and k.
[00128] 3. If boundary vertex i is connected to vertex j of a neighboring patch, a new triangle is drawn between i, j and the closest third vertex on either the neighboring patch or the current patch.
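The three rules can be sketched together as follows; the data layout and the closest_third helper for rule 3 are assumptions made for illustration.

```python
def new_faces(cur, nbr, connections, closest_third):
    """cur/nbr: boundary vertex lists of the current and neighboring
    patch; connections[i]: neighbor vertex indices that boundary
    vertex i connects to; closest_third(i, j): assumed helper
    returning the nearest third vertex for rule 3."""
    faces = []
    for i, targets in connections.items():
        if len(targets) == 2:                                  # rule 2
            j, k = targets
            faces.append((cur[i], nbr[j], nbr[k]))
        elif len(targets) == 1 and connections.get(i + 1) == targets:
            j = targets[0]                                     # rule 1
            faces.append((cur[i], cur[i + 1], nbr[j]))
        elif len(targets) == 1:                                # rule 3
            j = targets[0]
            faces.append((cur[i], nbr[j], closest_third(i, j)))
    return faces
```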
[00129] Decoder embodiments
[00130] A decoder receives a V3C bitstream including boundary vertex connectivity information. The decoder extracts connected boundary vertices information, such as how to generate and sort a list of boundary vertices for each patch, and how to generate and sort a list of neighboring patches for each patch.
[00131] For each 2D patch, the decoder generates two lists following this information: A. a list of boundary vertices, e.g. 3D vertex points lying on the edge of the 2D patch, and B. a list of all neighboring patches in 3D space.
[00132] For each vertex in list A, the decoder extracts information on connected vertices in neighboring patches, and reconstructs the mesh topology accordingly.
[00133] In the case of new faces being generated, the decoder reconstructs any present attribute, e.g. texture, by interpolating from the values of the connected vertices.
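A minimal sketch of such interpolation, here a plain average of the connected vertices' attribute values (the simplest choice; the embodiments do not mandate a particular interpolation), is:

```python
def interpolate_attribute(face_vertices, attributes):
    """Average an attribute, e.g. an (r, g, b) texture value, over the
    connected vertices forming a newly drawn face."""
    n = len(face_vertices)
    return tuple(sum(attributes[v][c] for v in face_vertices) / n
                 for c in range(3))
```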
[00134] FIG. 15 is an apparatus 1500 which may be implemented in hardware, configured to implement V3C patch connectivity signaling for mesh compression, based on any of the examples described herein. The apparatus comprises a processor 1502, at least one memory 1504 (memory 1504 may be transitory or non-transitory) including computer program code 1505, wherein the at least one memory 1504 and the computer program code 1505 are configured to, with the at least one processor 1502, cause the apparatus to implement circuitry, a process, component, module, function, coding, and/or decoding (collectively 1506). The apparatus 1500 is further configured to provide or receive signaling 1507, based on the signaling embodiments described herein. The apparatus 1500 optionally includes a display and/or I/O interface 1508 that may be used to display an output (e.g., an image or volumetric video) of a result of coding/decoding 1506. The display and/or I/O interface 1508 may also be configured to receive input such as user input (e.g. with a keypad). The apparatus 1500 also includes one or more network (NW) interfaces (I/F(s)) 1510. The NW I/F(s) 1510 may be wired and/or wireless and communicate over a channel or the Internet/other network(s) via any communication technique. The NW I/F(s) 1510 may comprise one or more transmitters and one or more receivers. The NW I/F(s) 1510 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitry(ies) and one or more antennas. In some examples, the processor 1502 is configured to implement coding/decoding 1506 and/or signaling 1507 without use of memory 1504.
[00135] The apparatus 1500 may be a remote, virtual or cloud apparatus. The apparatus 1500 may be either a writer or a reader (e.g. parser), or both a writer and a reader (e.g. parser). The apparatus 1500 may be either a coder or a decoder, or both a coder and a decoder. The apparatus 1500 may be a user equipment (UE), a head mounted display (HMD), or any other fixed or mobile device.
[00136] The memory 1504 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The memory 1504 may comprise a database for storing data. Interface 1512 enables data communication between the various items of apparatus 1500, as shown in FIG. 15. Interface 1512 may be one or more buses, or interface 1512 may be one or more software interfaces configured to pass data within computer program code 1505. For example, the interface 1512 may be one or more buses such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. In another example, interface 1512 is an object-oriented software interface. The apparatus 1500 need not comprise each of the features mentioned, or may comprise other features as well. The apparatus 1500 may be an embodiment of and have the features of any of the apparatuses shown in FIG. 1A, FIG. IB, FIG. 5, FIG. 6, or FIG. 7.
[00137] FIG. 16 is an encoder method 1600 to implement the examples described herein. At 1602, the method includes receiving a volumetric frame comprising three-dimensional data content. At 1604, the method includes decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches. At 1606, the method includes encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame. At 1608, the method includes identifying three-dimensional vertices on two-dimensional patch boundaries per patch. At 1610, the method includes signaling neighboring patches per patch. At 1612, the method includes signaling connected boundary vertices. Method 1600 may be implemented with apparatus 1500 or with an encoder apparatus.
[00138] FIG. 17 is a decoder method 1700 to implement the examples described herein. At 1702, the method includes receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame. At 1704, the method includes extracting the connected boundary vertices information. At 1706, the method includes extracting the neighboring patches per patch information. At 1708, the method includes generating a mesh representation of the mesh from the extracted connected boundary vertices information. At 1710, the method includes identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation. At 1712, the method includes connecting boundary vertices between neighboring patches per patch, as signaled. At 1714, the method includes rendering a reconstruction of the mesh. Method 1700 may be implemented with apparatus 1500 or with a decoder apparatus.
[ 00139 ] References to a 'computer', 'processor', etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, etc.
[ 00140 ] As used herein, the term 'circuitry' may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present. As a further example, as used herein, the term 'circuitry' would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term 'circuitry' would also cover, for example and if applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device. Circuitry may also be used to mean a function or a process, such as one implemented by an encoder or decoder, or a codec.
[ 00141 ] An example apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a volumetric frame comprising three-dimensional data content; decompose a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encode the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identify three-dimensional vertices on two-dimensional patch boundaries per patch; signal neighboring patches per patch; and signal connected boundary vertices.
[ 00142 ] The apparatus may further include wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is configured to be used to reconstruct texture for the connected boundary vertices.
[00143] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to determining and ordering the neighboring patches per patch.
[00144] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to determining and ordering the connected boundary vertices.
[00145] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: signal information related to mesh reconstruction of the connected boundary vertices.
[00146] The apparatus may further include wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is performed using at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
[00147] The apparatus may further include wherein the connected boundary vertices signaling comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.

[00148] An example apparatus includes at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extract the connected boundary vertices information; extract the neighboring patches per patch information; generate a mesh representation of the mesh from the extracted connected boundary vertices information; identify three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connect boundary vertices between neighboring patches per patch, as signaled; and render a reconstruction of the mesh.
[00149] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: reconstruct texture for the connected boundary vertices using the neighboring patches per patch information and the connected boundary vertices information.
[00150] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to determining and ordering the neighboring patches per patch.
[00151] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to determining and ordering the connected boundary vertices.
[00152] The apparatus may further include wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus at least to: receive signaling of information related to mesh reconstruction of the connected boundary vertices.
[00153] The apparatus may further include wherein the neighboring patches per patch information and the connected boundary vertices information is received with at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
[00154] The apparatus may further include wherein the connected boundary vertices information comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.
[00155] An example apparatus includes means for receiving a volumetric frame comprising three-dimensional data content; means for decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; means for encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch; means for signaling neighboring patches per patch; and means for signaling connected boundary vertices.

[00156] An example apparatus includes means for receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; means for extracting the connected boundary vertices information; means for extracting the neighboring patches per patch information; means for generating a mesh representation of the mesh from the extracted connected boundary vertices information; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; means for connecting boundary vertices between neighboring patches per patch, as signaled; and means for rendering a reconstruction of the mesh.
[00157] An example method includes receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
[00158] An example method includes receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
[00159] An example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is provided, the operations comprising: receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
[00160] An example non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations is provided, the operations comprising: receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
[00161] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination (s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
[00162] The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows:
2D or 2d two-dimensional
3D or 3d three-dimensional
6DOF six degrees of freedom
ACL atlas coding layer
afps atlas frame parameter set
AR augmented reality
ASIC application-specific integrated circuit
asps atlas sequence parameter set
CABAC context-adaptive binary arithmetic coding
CfP call for proposal
CGI computer-generated imagery
HMD head mounted display
HRD hypothetical reference decoder
id or ID identifier
Idx index
IEC International Electrotechnical Commission
I/F interface
I/O input/output
ISO International Organization for Standardization
LOD or lod level of detail
MPEG moving picture experts group
MPEG-I MPEG immersive
MR mixed reality
NAL or nal network abstraction layer
NW network
pos position
RBSP raw byte sequence payload
SEI supplemental enhancement information
se(v) syntax element coded by the signed integer exponential Golomb code, with the left bit first
SODB string of data bits
UE user equipment
ue(v) unsigned integer exponential Golomb coded syntax element with the left bit first
u(n) unsigned integer using n bits, e.g. u(1), u(2)
UV or uv coordinate texture, where "U" or "u" and "V" or "v" are axes of a 2D texture
u(v) unsigned integer, where the number of bits is determined by the value of other syntax elements
V3C visual volumetric video-based coding
V-PCC video-based point cloud coding/compression
VPS V3C parameter set
VR virtual reality
X horizontal axis
Y vertical axis


CLAIMS

What is claimed is:
1. An apparatus comprising: at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a volumetric frame comprising three-dimensional data content; decompose a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encode the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identify three-dimensional vertices on two-dimensional patch boundaries per patch; signal neighboring patches per patch; and signal connected boundary vertices.
2. An apparatus comprising: at least one processor; and at least one memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to: receive a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extract the connected boundary vertices information; extract the neighboring patches per patch information; generate a mesh representation of the mesh from the extracted connected boundary vertices information; identify three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connect boundary vertices between neighboring patches per patch, as signaled; and render a reconstruction of the mesh.
3. An apparatus comprising: means for receiving a volumetric frame comprising three-dimensional data content; means for decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; means for encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch; means for signaling neighboring patches per patch; and means for signaling connected boundary vertices.
4. The apparatus of claim 3, wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is configured to be used to reconstruct texture for the connected boundary vertices.
5. The apparatus of claim 3 or 4, further comprising means for signaling information related to determining and ordering the neighboring patches per patch.
6. The apparatus of claim 3 or 4 or 5, further comprising means for signaling information related to determining and ordering the connected boundary vertices.
7. The apparatus of any of the claims 3 to 6, further comprising means for signaling information related to mesh reconstruction of the connected boundary vertices.
8. The apparatus of any of the claims 3 to 7, wherein the signaling of the neighboring patches per patch and the signaling of the connected boundary vertices is performed using at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
9. The apparatus of any of the claims 3 to 8, wherein the connected boundary vertices signaling comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.
10. An apparatus comprising: means for receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; means for extracting the connected boundary vertices information; means for extracting the neighboring patches per patch information; means for generating a mesh representation of the mesh from the extracted connected boundary vertices information; means for identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; means for connecting boundary vertices between neighboring patches per patch, as signaled; and means for rendering a reconstruction of the mesh.
11. The apparatus of claim 10, further comprising means for reconstructing texture for the connected boundary vertices using the neighboring patches per patch information and the connected boundary vertices information.
12. The apparatus of claim 10 or 11, further comprising means for receiving signaling of information related to determining and ordering the neighboring patches per patch.
13. The apparatus of claim 10 or 11 or 12, further comprising: means for receiving signaling of information related to determining and ordering the connected boundary vertices.
14. The apparatus of any of the claims 10 to 13, further comprising means for receiving signaling of information related to mesh reconstruction of the connected boundary vertices.
15. The apparatus of any of the claims 10 to 14, wherein the neighboring patches per patch information and the connected boundary vertices information is received with at least one of: an atlas sequence parameter set extension; an atlas frame parameter set extension; an atlas patch data unit extension; or at least one supplemental enhancement information message.
16. The apparatus of any of the claims 10 to 15, wherein the connected boundary vertices information comprises information related to whether connected vertices are to be merged into a single vertex, or whether a new face is to be drawn.
17. A method comprising: receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
18. A method comprising: receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
19. A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: receiving a volumetric frame comprising three-dimensional data content; decomposing a three-dimensional model of the three-dimensional content into a collection of two-dimensional patches; encoding the three-dimensional data content, the encoding comprising a coding of a mesh related to the volumetric frame; identifying three-dimensional vertices on two-dimensional patch boundaries per patch; signaling neighboring patches per patch; and signaling connected boundary vertices.
20. A non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable with the machine for performing operations, the operations comprising: receiving a bitstream comprising connected boundary vertices information related to a volumetric frame comprising three-dimensional data content of a mesh, the bitstream further comprising neighboring patches per patch information related to the volumetric frame; extracting the connected boundary vertices information; extracting the neighboring patches per patch information; generating a mesh representation of the mesh from the extracted connected boundary vertices information; identifying three-dimensional vertices on two-dimensional patch boundaries per patch of the mesh representation; connecting boundary vertices between neighboring patches per patch, as signaled; and rendering a reconstruction of the mesh.
PCT/EP2022/069371 2021-07-21 2022-07-12 V3c patch connectivity signaling for mesh compression WO2023001623A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163224086P 2021-07-21 2021-07-21
US63/224,086 2021-07-21

Publications (1)

Publication Number Publication Date
WO2023001623A1 true WO2023001623A1 (en) 2023-01-26

Family

ID=82799810

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/069371 WO2023001623A1 (en) 2021-07-21 2022-07-12 V3c patch connectivity signaling for mesh compression

Country Status (1)

Country Link
WO (1) WO2023001623A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210090301A1 (en) * 2019-09-24 2021-03-25 Apple Inc. Three-Dimensional Mesh Compression Using a Video Encoder
WO2021136878A1 (en) * 2020-01-02 2021-07-08 Nokia Technologies Oy A method, an apparatus and a computer program product for volumetric video encoding and decoding

Similar Documents

Publication Publication Date Title
JP4832975B2 (en) A computer-readable recording medium storing a node structure for representing a three-dimensional object based on a depth image
US20230050860A1 (en) An apparatus, a method and a computer program for volumetric video
US11711535B2 (en) Video-based point cloud compression model to world signaling information
WO2021240069A1 (en) Offset Texture Layers For Encoding And Signaling Reflection And Refraction For Immersive Video And Related Methods For Multi-Layer Volumetric Video
US20220377327A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2021260266A1 (en) A method, an apparatus and a computer program product for volumetric video coding
US20230306646A1 (en) Adaptive Filtering of Occupancy Map for Dynamic Mesh Compression
US20220383552A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2021191495A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2021245326A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
WO2023144445A1 (en) A method, an apparatus and a computer program product for video encoding and video decoding
US20220321914A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
WO2023001623A1 (en) V3c patch connectivity signaling for mesh compression
WO2021186103A1 (en) A method, an apparatus and a computer program product for volumetric video encoding and video decoding
US20230326138A1 (en) Compression of Mesh Geometry Based on 3D Patch Contours
US20230298217A1 (en) Hierarchical V3C Patch Remeshing For Dynamic Mesh Coding
US20230300336A1 (en) V3C Patch Remeshing For Dynamic Mesh Coding
US20230298218A1 (en) V3C or Other Video-Based Coding Patch Correction Vector Determination, Signaling, and Usage
WO2024003683A1 (en) Method apparatus and computer program product for signaling boundary vertices
WO2023002315A1 (en) Patch creation and signaling for v3c dynamic mesh compression
WO2024084326A1 (en) Adaptive displacement packing for dynamic mesh coding
US20230171427A1 (en) Method, An Apparatus and a Computer Program Product for Video Encoding and Video Decoding
US20230316647A1 (en) Curvature-Guided Inter-Patch 3D Inpainting for Dynamic Mesh Coding
WO2024079653A1 (en) Parameterization-guided packing of displacements for dynamic mesh coding
WO2023203416A1 (en) Wavelet coding and decoding of dynamic meshes based on video components and metadata

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22751006

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE