CN117897732A - Mesh patch syntax - Google Patents

Mesh patch syntax

Info

Publication number
CN117897732A
Authority
CN
China
Prior art keywords
encoding
implementation
patch
mesh
patches
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380013363.6A
Other languages
Chinese (zh)
Inventor
D. Graziosi
A. Zaghetto
A. Tabatabai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Sony Optical Archive Inc
Original Assignee
Sony Group Corp
Optical Archive Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/987,848 external-priority patent/US20230306644A1/en
Application filed by Sony Group Corp, Optical Archive Inc
Priority claimed from PCT/IB2023/052109 external-priority patent/WO2023180845A1/en
Publication of CN117897732A


Landscapes

  • Image Generation (AREA)

Abstract

New syntax elements are used to extend the patch types and add syntax to the V3C standard. The new syntax defines a patch that encodes the mesh by projecting connected triangles onto a 2D surface, a patch that encodes triangles or triangle strips without any projection, and a patch that is tracked over time and encoded by projecting connected triangles onto a 2D surface. Furthermore, the syntax allows different ways of encoding mesh-specific information. For example, the syntax enables three different encoding methods for vertex positions: explicit (added directly to the patch data unit), embedded in the video data (occupancy map data), or encoded using an external mesh encoder.

Description

Mesh patch syntax
Cross Reference to Related Applications
The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 63/269,112, "MESH PATCH SYNTAX," filed March 25, 2022, which is incorporated herein by reference in its entirety for all purposes.
Technical Field
The present invention relates to three-dimensional graphics. More particularly, the present invention relates to the encoding of three-dimensional graphics.
Background
Recently, new methods of compressing volumetric content, such as point clouds, based on projection from 3D to 2D are being standardized. This approach, also known as V3C (visual volumetric video-based coding), maps 3D volumetric data into several 2D patches, which are then arranged into an atlas image that is subsequently encoded with a video encoder. The atlas images correspond to the geometry of the points, the corresponding texture, and an occupancy map that indicates which positions are to be considered for the point cloud reconstruction.
In 2017, MPEG issued a call for proposals (CfP) for compression of point clouds. After evaluating several proposals, MPEG considered two different technologies for point cloud compression: 3D native coding (based on octrees and similar coding methods) and 3D-to-2D projection followed by conventional video coding. In the case of dynamic 3D scenes, MPEG used test model software (TMC2) based on patch surface modeling, projection of the patches from 3D to 2D images, and coding of the 2D images with a video encoder such as HEVC. This method has proven to be more efficient than native 3D coding and can achieve competitive bit rates at acceptable quality.
Due to the success of projection-based methods (also known as video-based methods, or V-PCC) for encoding 3D point clouds, it is expected that the standard will include support for more 3D data, such as 3D meshes, in future versions. However, the current version of the standard is only suitable for the transmission of an unconnected set of points, so there is no mechanism to send the connectivity of the points, which is required in 3D mesh compression.
Methods to extend the functionality of V-PCC to meshes have been proposed. One possible way is to encode the vertices using V-PCC and then encode the connectivity using a mesh compression method such as TFAN or Edgebreaker. A limitation of this approach is that the original mesh must be dense so that the point cloud generated from the vertices is not sparse and can be efficiently encoded after projection. Furthermore, the order of the vertices affects the coding of the connectivity, and different methods of reorganizing the mesh connectivity have been proposed. An alternative way to encode a sparse mesh is to encode the vertex positions in 3D using RAW patch data. Since RAW patches encode (x, y, z) directly, in this approach all vertices are encoded as RAW data, while the connectivity is encoded by a similar mesh compression method, as described previously. Note that in a RAW patch the vertices may be sent in any preferred order, so the order generated by the connectivity encoding can be used. This approach can encode sparse point clouds; however, RAW patches are not efficient at encoding 3D data, and additional data, such as the attributes of the triangle faces, may be missing from this approach.
Disclosure of Invention
New syntax elements are used to extend the patch types and add syntax to the V3C standard. The new syntax defines a patch that encodes the mesh by projecting connected triangles onto a 2D surface, a patch that encodes triangles or triangle strips without any projection, and a patch that is tracked over time and encoded by projecting connected triangles onto a 2D surface. Furthermore, the syntax allows different ways of encoding mesh-specific information. For example, the syntax enables three different encoding methods for vertex positions: explicit (added directly to the patch data unit), embedded in the video data (occupancy map data), or encoded using an external mesh encoder.
In one aspect, a method programmed in a non-transitory memory of a device includes encoding a mesh using patches and encoding vertex positions, where the encoding implementation is selected from: adding the vertex position information directly to the atlas stream, embedding it in the video data, or using an external mesh encoder. Encoding the mesh using patches includes projecting connected triangles onto a two-dimensional surface. Encoding the mesh using patches includes using patches that encode triangles or triangle strips without any projection. Encoding the mesh using patches includes using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface. The video data includes occupancy map data. Encoding vertex positions includes delta information for geometry correction. The method further includes encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
In another aspect, an apparatus includes a non-transitory memory for storing an application to: encode a mesh using patches and encode vertex positions, where the encoding implementation is selected from: adding the vertex position information directly to the atlas stream, embedding it in the video data, or using an external mesh encoder; and a processor coupled to the memory, the processor configured to process the application. Encoding the mesh using patches includes projecting connected triangles onto a two-dimensional surface. Encoding the mesh using patches includes using patches that encode triangles or triangle strips without any projection. Encoding the mesh using patches includes using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface. The video data includes occupancy map data. Encoding vertex positions includes delta information for geometry correction. The application is further for encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
In another aspect, a system includes one or more cameras for acquiring three-dimensional content and an encoder configured to: encode a mesh using patches and encode vertex positions, where the encoding implementation is selected from: adding the vertex position information directly to the atlas stream, embedding it in the video data, or using an external mesh encoder. Encoding the mesh using patches includes projecting connected triangles onto a two-dimensional surface. Encoding the mesh using patches includes using patches that encode triangles or triangle strips without any projection. Encoding the mesh using patches includes using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface. The video data includes occupancy map data. Encoding vertex positions includes delta information for geometry correction. The encoder is further configured for encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
Drawings
FIG. 1 illustrates a diagram of the mesh patch data unit syntax in accordance with some embodiments.
FIG. 2 illustrates a diagram of triangle primitives in accordance with some embodiments.
Fig. 3 illustrates a diagram of color expansion, in accordance with some embodiments.
FIG. 4 illustrates a diagram of the tracked mesh patch data unit syntax in accordance with some embodiments.
Fig. 5 illustrates a flow chart of a patch mesh coding method in accordance with some embodiments.
Fig. 6 illustrates a block diagram of an exemplary computing device configured to implement the patch mesh coding method in accordance with some embodiments.
Detailed Description
New syntax elements are used to extend the patch types and add syntax to the V3C standard. The new syntax defines a patch that encodes the mesh by projecting connected triangles onto a 2D surface, a patch that encodes triangles or triangle strips without any projection, and a patch that is tracked over time and encoded by projecting connected triangles onto a 2D surface. Furthermore, the syntax allows different ways of encoding mesh-specific information. For example, the syntax enables three different encoding methods for vertex positions: explicit (added directly to the patch data unit), embedded in the video data (occupancy map data), or encoded using an external mesh encoder.
An alternative way to represent mesh information using the existing patch syntax elements is to use a mesh extension, similar to what is done for MIV. This also minimizes the impact on the V3C specification. The text herein shows what would be added to the specification. For example, the mesh patch type may be an extension of the patch data unit, as shown herein:
The triangle patch type may be an extension of the raw patch data unit, as follows:
Finally, the tracked mesh patch type may be an extension of the inter patch data unit, as follows:
The extension contains the specific mesh information associated with each patch. The syntax allows the following coding options:
two types of connectivity coding: explicit coding (each triangle represented by 3 or 4 vertex indices in the patch data unit) or binary coding (where an external encoder is used);
three types of vertex coding: explicit coding (where two coordinates indicate the position in the patch, from which the third coordinate is derived), embedded in the occupancy map (which carries symbols identifying the positions of the vertices in the patch), or binary coding (using an external encoder);
three types of UV coordinate coding: explicit coding (sending the coordinates inside the patch data unit), implicit coding (assumed to have the same value as the patch position of the corresponding vertex), or binary coding (using an external mesh encoder).
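As a rough, non-normative illustration of the option space above (all type and field names here are hypothetical, not from the V3C specification), the per-patch coding choices can be modeled as:

```python
from dataclasses import dataclass
from enum import Enum


class ConnectivityCoding(Enum):
    EXPLICIT = 0   # 3 or 4 vertex indices per face in the patch data unit
    BINARY = 1     # external mesh encoder


class VertexCoding(Enum):
    EXPLICIT = 0       # (u, v) inside the patch; third coordinate derived
    OCCUPANCY_MAP = 1  # vertex positions signaled via occupancy map symbols
    BINARY = 2         # external encoder


class UVCoding(Enum):
    EXPLICIT = 0   # coordinates sent inside the patch data unit
    IMPLICIT = 1   # same value as the patch position of the vertex
    BINARY = 2     # external mesh encoder


@dataclass
class MeshPatchConfig:
    """One combination of the per-patch coding options listed above."""
    connectivity: ConnectivityCoding
    vertices: VertexCoding
    uv: UVCoding


cfg = MeshPatchConfig(ConnectivityCoding.EXPLICIT,
                      VertexCoding.OCCUPANCY_MAP,
                      UVCoding.IMPLICIT)
```

This is only a mental model of the 2 × 3 × 3 option space; the actual signaling is carried by the flags and syntax elements described below.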
FIG. 1 illustrates a diagram of the mesh patch data unit syntax in accordance with some embodiments. Patch connectivity may be encoded using a binary implementation or an explicit implementation. Patch vertices may be encoded using a binary implementation, an occupancy map implementation, or an explicit implementation. Patch mapping coordinates may be encoded using a binary, an explicit, or an implicit implementation.
New syntax elements and semantics are described herein:
mpdu_binary_object_present_flag[tileID][p] equal to 1 specifies that, for the patch with index p of the current atlas tile, with tile ID equal to tileID, the syntax elements mpdu_mesh_binary_object_size_bytes[tileID][p] and mpdu_mesh_binary_object[tileID][p][i] are present. If mpdu_binary_object_present_flag[tileID][p] is equal to 0, the syntax elements mpdu_mesh_binary_object_size_bytes[tileID][p] and mpdu_mesh_binary_object[tileID][p][i] are not present for the current patch. If mpdu_binary_object_present_flag[tileID][p] is not present, its value shall be inferred to be equal to 0.
mpdu_mesh_binary_object_size_bytes[tileID][p] specifies, for the patch with index p of the current atlas tile, with tile ID equal to tileID, the number of bytes used to represent the mesh information in binary form.
mpdu_mesh_binary_object[tileID][p][i] specifies the i-th byte of the binary representation of the mesh of the patch with index p of the current atlas tile, with tile ID equal to tileID.
mpdu_vertex_count_minus3[tileID][p] plus 3 specifies the number of vertices present in the patch with index p of the current atlas tile, with tile ID equal to tileID.
mpdu_face_count[tileID][p] specifies the number of triangles present in the patch with index p of the current atlas tile, with tile ID equal to tileID. When not present, the value of mpdu_face_count[tileID][p] shall be inferred to be 0.
mpdu_face_vertex[tileID][p][i][k] specifies the k-th vertex index of the i-th triangle or quadrilateral of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of mpdu_face_vertex[tileID][p][i][k] shall be in the range of 0 to mpdu_vertex_count_minus3[tileID][p] + 2, inclusive.
mpdu_vertex_pos_x[tileID][p][i] specifies the value of the x coordinate of the i-th vertex of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of mpdu_vertex_pos_x[tileID][p][i] shall be in the range of 0 to mpdu_2d_size_x_minus1[tileID][p], inclusive.
mpdu_vertex_pos_y[tileID][p][i] specifies the value of the y coordinate of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID. The value of mpdu_vertex_pos_y[tileID][p][i] shall be in the range of 0 to mpdu_2d_size_y_minus1[tileID][p], inclusive.
mpdu_vertex_pos_delta_z[tileID][p][i] specifies the difference between the value derived from the geometry video and the z coordinate of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID.
mpdu_vertex_u_coord[tileID][p][i] specifies the value of the u coordinate of the mapping of the i-th vertex of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of mpdu_vertex_u_coord[tileID][p][i] shall be in the range of 0 to 2^(asps_mesh_coordinates_bit_depth_minus1 + 1) − 1, inclusive.
mpdu_vertex_v_coord[tileID][p][i] specifies the value of the v coordinate of the mapping of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID. The value of mpdu_vertex_v_coord[tileID][p][i] shall be in the range of 0 to 2^(asps_mesh_coordinates_bit_depth_minus1 + 1) − 1, inclusive.
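The value-range constraints above can be summarized in a minimal sketch (a hypothetical validation helper, not part of the standard; argument names are illustrative):

```python
def check_mesh_patch(vertex_count_minus3, face_vertices,
                     size_x_minus1, size_y_minus1, vertex_pos):
    """Check the mpdu_* value ranges described above.

    vertex_count_minus3: the mpdu_vertex_count_minus3 value (count = value + 3)
    face_vertices: list of faces, each a list of 3 or 4 vertex indices
    vertex_pos: list of (x, y) positions inside the patch bounding box
    """
    n_vertices = vertex_count_minus3 + 3
    for face in face_vertices:
        # triangles or quadrilaterals only
        assert len(face) in (3, 4)
        for idx in face:
            # indices range from 0 to mpdu_vertex_count_minus3 + 2
            assert 0 <= idx <= vertex_count_minus3 + 2
    for x, y in vertex_pos:
        # positions bounded by the 2D patch size
        assert 0 <= x <= size_x_minus1
        assert 0 <= y <= size_y_minus1
    assert len(vertex_pos) == n_vertices
    return n_vertices
```

For instance, a patch carrying four vertices signals vertex_count_minus3 = 1, and every face index must stay at or below 3.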
FIG. 2 illustrates a diagram of triangle primitives in accordance with some embodiments. Triangles may be configured as individual triangles 200, triangle strips 202, or triangle fans 204. Other triangle configurations are also possible.
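The strip and fan primitives expand into individual triangles in the usual computer-graphics way; a small sketch (generic convention, not text from the specification):

```python
def strip_to_triangles(indices):
    """Expand a triangle strip into individual triangles.

    Each index after the first two forms a triangle with the two
    preceding indices; the winding alternates every step so all
    triangles keep a consistent orientation.
    """
    tris = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris


def fan_to_triangles(indices):
    """Expand a triangle fan: every triangle shares the first vertex."""
    return [(indices[0], indices[i], indices[i + 1])
            for i in range(1, len(indices) - 1)]
```

A strip over indices 0, 1, 2, 3 yields two triangles, as does a fan over the same indices, which is why strips and fans are compact encodings of connected triangles.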
Fig. 3 illustrates a diagram of color expansion, in accordance with some embodiments. For color expansion, there are x, y, and z packing implementations and a row packing implementation.
tpdu_vertex_count_minus3[tileID][p] plus 3 specifies the number of vertices present in the triangle patch with index p in the current atlas tile, with tile ID equal to tileID. The value of tpdu_vertex_count_minus3[tileID][p] shall be in the range of 0 to ((tpdu_2d_size_x_minus1[tileID][p] + 1) × (tpdu_2d_size_y_minus1[tileID][p] + 1))/3 − 3, inclusive.
tpdu_primitive_idc[tileID][p] indicates how the triangle geometric primitives are obtained from the vertices present in the patch with index p in the current atlas tile, with tile ID equal to tileID. If tpdu_primitive_idc[tileID][p] is not present, its value shall be inferred to be equal to 0.
tpdu_color_expansion_flag[tileID][p] equal to 1 specifies that the coordinates of the vertices are row-interleaved and that color values are expanded for the current patch p of the current atlas tile, with tile ID equal to tileID. If tpdu_color_expansion_flag[tileID][p] is equal to 0, the coordinates of the vertices are packed sequentially and no color is expanded for the current patch. If tpdu_color_expansion_flag[tileID][p] is not present, its value shall be inferred to be equal to 0.
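One plausible reading of row-interleaved packing with color expansion, sketched under stated assumptions (the exact packing order in the specification may differ; function and argument names are illustrative only):

```python
def pack_rows(vertices, colors, expand_colors):
    """Pack vertex coordinates into rows; optionally interleave color rows.

    With expand_colors set (the tpdu_color_expansion_flag == 1 case, as
    read here), each coordinate row is followed by the matching color
    channel row; otherwise coordinates are packed sequentially and no
    color rows are emitted.
    """
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    zs = [v[2] for v in vertices]
    rows = []
    if expand_colors:
        for chan, coord_row in enumerate((xs, ys, zs)):
            rows.append(coord_row)
            rows.append([c[chan] for c in colors])  # interleaved color row
    else:
        rows.extend([xs, ys, zs])
    return rows
```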
FIG. 4 illustrates a diagram of the tracked mesh patch data unit syntax in accordance with some embodiments.
tmpdu_rotation_present_flag[t][p] equal to 1 indicates that rotation parameters are present for the patch with index p of the tile with tile ID t. tmpdu_rotation_present_flag[t][p] equal to 0 indicates that no rotation parameters are present for the patch with index p of the tile with tile ID t. When tmpdu_rotation_present_flag[t][p] is not present, it shall be inferred to be equal to 0.
tmpdu_3d_rotation_qx[t][p] specifies the x component, qX, of the geometry rotation of the patch with index p of the tile with tile ID t, using the quaternion representation. The value of tmpdu_3d_rotation_qx[t][p] shall be in the range of −2^14 to 2^14 − 1, inclusive. When tmpdu_3d_rotation_qx[t][p] is not present, its value shall be inferred to be equal to 0. The value of qX is computed as follows:
qX = tmpdu_3d_rotation_qx ÷ 2^14
tmpdu_3d_rotation_qy[t][p] specifies the y component, qY, of the geometry rotation of the patch with index p of the tile with tile ID t, using the quaternion representation. The value of tmpdu_3d_rotation_qy[t][p] shall be in the range of −2^14 to 2^14 − 1, inclusive. When tmpdu_3d_rotation_qy[t][p] is not present, its value shall be inferred to be equal to 0. The value of qY is computed as follows:
qY = tmpdu_3d_rotation_qy ÷ 2^14
tmpdu_3d_rotation_qz[t][p] specifies the z component, qZ, of the geometry rotation of the patch with index p of the tile with tile ID t, using the quaternion representation. The value of tmpdu_3d_rotation_qz[t][p] shall be in the range of −2^14 to 2^14 − 1, inclusive. When tmpdu_3d_rotation_qz[t][p] is not present, its value shall be inferred to be equal to 0. The value of qZ is computed as follows:
qZ = tmpdu_3d_rotation_qz ÷ 2^14
The fourth component, qW, of the geometry rotation of the patch with index p of the tile with tile ID t, using the quaternion representation, is computed as follows:
qW = Sqrt(1 − (qX^2 + qY^2 + qZ^2))
the unit quaternion may be represented as a rotation matrix R as follows:
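Since the rotation matrix itself is not reproduced above, the derivation can be sketched in code using the standard unit-quaternion-to-matrix formula, recovering qW and building R from the signalled 14-bit fixed-point components:

```python
import math


def quaternion_to_matrix(qx_raw, qy_raw, qz_raw):
    """Build the rotation matrix R from signalled quaternion components.

    qx_raw/qy_raw/qz_raw play the role of tmpdu_3d_rotation_q{x,y,z},
    each divided by 2^14 as specified above; qW is recovered so that
    the quaternion has unit norm.
    """
    qX = qx_raw / 2**14
    qY = qy_raw / 2**14
    qZ = qz_raw / 2**14
    qW = math.sqrt(max(0.0, 1.0 - (qX * qX + qY * qY + qZ * qZ)))
    # Standard unit-quaternion rotation matrix (w = qW, x = qX, y = qY, z = qZ)
    return [
        [1 - 2 * (qY * qY + qZ * qZ), 2 * (qX * qY - qZ * qW), 2 * (qX * qZ + qY * qW)],
        [2 * (qX * qY + qZ * qW), 1 - 2 * (qX * qX + qZ * qZ), 2 * (qY * qZ - qX * qW)],
        [2 * (qX * qZ - qY * qW), 2 * (qY * qZ + qX * qW), 1 - 2 * (qX * qX + qY * qY)],
    ]
```

With all three signalled components absent (inferred 0), qW is 1 and R is the identity, i.e. no rotation.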
tmpdu_vertex_changed_position_flag[t][p] equal to 1 indicates that vertex displacements are present for the patch with index p of the tile with tile ID t. tmpdu_vertex_changed_position_flag[t][p] equal to 0 indicates that no vertex displacements are present for the patch with index p of the tile with tile ID t. When tmpdu_vertex_changed_position_flag[t][p] is not present, it shall be inferred to be equal to 0.
tmpdu_vertex_delta_pos_x[t][p][i] specifies the difference between the x coordinate value of the i-th vertex of the patch with index p of the tile with tile ID t and that of the matched patch indicated by tmpdu_ref_index[t][p]. The value of tmpdu_vertex_delta_pos_x[t][p][i] shall be in the range of 0 to 2^afps_num_bits_vertex_delta_x − 1, inclusive.
tmpdu_vertex_delta_pos_y[t][p][i] specifies the difference between the y coordinate value of the i-th vertex of the patch with index p of the tile with tile ID t and that of the matched patch indicated by tmpdu_ref_index[t][p]. The value of tmpdu_vertex_delta_pos_y[t][p][i] shall be in the range of 0 to 2^afps_num_bits_vertex_delta_y − 1, inclusive.
tmpdu_vertex_pos_delta_z[tileID][p][i] specifies the difference between the value derived from the geometry video and the z coordinate of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID.
tmpdu_binary_object_present_flag[tileID][p] equal to 1 specifies that, for the patch with index p of the current atlas tile, with tile ID equal to tileID, the syntax elements tmpdu_mesh_binary_object_size_bytes[tileID][p] and tmpdu_mesh_binary_object[tileID][p][i] are present. If tmpdu_binary_object_present_flag[tileID][p] is equal to 0, the syntax elements tmpdu_mesh_binary_object_size_bytes[tileID][p] and tmpdu_mesh_binary_object[tileID][p][i] are not present for the current patch. If tmpdu_binary_object_present_flag[tileID][p] is not present, its value shall be inferred to be equal to 0.
tmpdu_mesh_binary_object_size_bytes[tileID][p] specifies, for the patch with index p of the current atlas tile, with tile ID equal to tileID, the number of bytes used to represent the mesh information in binary form.
tmpdu_mesh_binary_object[tileID][p][i] specifies the i-th byte of the binary representation of the mesh of the patch with index p of the current atlas tile, with tile ID equal to tileID.
tmpdu_vertex_count_minus3[tileID][p] plus 3 specifies the number of vertices present in the patch with index p of the current atlas tile, with tile ID equal to tileID.
tmpdu_face_count[tileID][p] specifies the number of triangles present in the patch with index p of the current atlas tile, with tile ID equal to tileID. When not present, the value of tmpdu_face_count[tileID][p] shall be inferred to be 0.
tmpdu_face_vertex[tileID][p][i][k] specifies the k-th vertex index of the i-th triangle or quadrilateral of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of tmpdu_face_vertex[tileID][p][i][k] shall be in the range of 0 to tmpdu_vertex_count_minus3[tileID][p] + 2, inclusive.
tmpdu_vertex_pos_x[tileID][p][i] specifies the value of the x coordinate of the i-th vertex of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of tmpdu_vertex_pos_x[tileID][p][i] shall be in the range of 0 to tmpdu_2d_size_x_minus1[tileID][p], inclusive.
tmpdu_vertex_pos_y[tileID][p][i] specifies the value of the y coordinate of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID. The value of tmpdu_vertex_pos_y[tileID][p][i] shall be in the range of 0 to tmpdu_2d_size_y_minus1[tileID][p], inclusive.
tmpdu_vertex_u_coord[tileID][p][i] specifies the value of the u coordinate of the mapping of the i-th vertex of the current patch with index p of the current atlas tile, with tile ID equal to tileID. The value of tmpdu_vertex_u_coord[tileID][p][i] shall be in the range of 0 to 2^(asps_mesh_coordinates_bit_depth_minus1 + 1) − 1, inclusive.
tmpdu_vertex_v_coord[tileID][p][i] specifies the value of the v coordinate of the mapping of the i-th vertex of the current patch p of the current atlas tile, with tile ID equal to tileID. The value of tmpdu_vertex_v_coord[tileID][p][i] shall be in the range of 0 to 2^(asps_mesh_coordinates_bit_depth_minus1 + 1) − 1, inclusive.
In addition to the above syntax elements, the following mesh extension is proposed, by adding to and also modifying the current atlas sequence parameter set:
The semantics are modified as follows:
asps_extension_present_flag equal to 1 specifies that the syntax elements asps_vpcc_extension_present_flag, asps_miv_extension_present_flag, asps_mesh_extension_present_flag, and asps_extension_5bits are present in the atlas_sequence_parameter_set_rbsp() syntax structure. asps_extension_present_flag equal to 0 specifies that the syntax elements asps_vpcc_extension_present_flag, asps_miv_extension_present_flag, asps_mesh_extension_present_flag, and asps_extension_5bits are not present.
asps_mesh_extension_present_flag equal to 1 specifies that the asps_mesh_extension() syntax structure is present in the atlas_sequence_parameter_set_rbsp() syntax structure. asps_mesh_extension_present_flag equal to 0 specifies that this syntax structure is not present. When not present, the value of asps_mesh_extension_present_flag is inferred to be equal to 0.
asps_extension_5bits equal to 0 specifies that no asps_extension_data_flag syntax elements are present in the ASPS RBSP syntax structure. When present, asps_extension_5bits shall be equal to 0 in bitstreams conforming to this version of this document. Values of asps_extension_5bits not equal to 0 are reserved for future use by ISO/IEC. Decoders shall allow the value of asps_extension_5bits to be not equal to 0 and shall ignore all asps_extension_data_flag syntax elements in an ASPS NAL unit. When not present, the value of asps_extension_5bits is inferred to be equal to 0.
The ASPS mesh extension is used to indicate binary coding of the connectivity, the use of quadrilaterals, the presence of vertex data in the occupancy map, the mesh codec used, and explicitly coded mapping coordinates, as shown in the following syntax and semantics:
asps_mesh_binary_coding_enabled_flag equal to 1 indicates that the vertex and connectivity information associated with patches is present in binary form. asps_mesh_binary_coding_enabled_flag equal to 0 specifies that the mesh vertex and connectivity data are not present in binary form. When not present, asps_mesh_binary_coding_enabled_flag is inferred to be 0.
asps_mesh_binary_codec_id indicates the identifier of the codec used to compress the vertex and connectivity information of patches. asps_mesh_binary_codec_id shall be in the range of 0 to 255, inclusive.
asps_mesh_quad_face_flag equal to 1 indicates that quadrilaterals are used for the polygon representation of the mesh. asps_mesh_quad_face_flag equal to 0 indicates that triangles are used for the polygon representation of the mesh. When not present, the value of asps_mesh_quad_face_flag is inferred to be equal to 0.
asps_mesh_vertices_in_occupancy_video_data_flag equal to 1 indicates that vertex information is present in the occupancy video data. asps_mesh_vertices_in_occupancy_video_data_flag equal to 0 indicates that vertex information is present in the patch data unit. When not present, the value of asps_mesh_vertices_in_occupancy_video_data_flag is inferred to be equal to 0.
asps_mesh_mapping_coordinates_present_flag equal to 1 indicates that mapping information associated with the vertices of patches is present. asps_mesh_mapping_coordinates_present_flag equal to 0 specifies that mapping information associated with the vertices of patches is not present and shall be assumed to be equal to the vertex atlas position. When not present, asps_mesh_mapping_coordinates_present_flag is inferred to be 0.
asps_mesh_coordinates_bit_depth_minus1 plus 1 indicates the bit depth of the mapping information associated with the vertices of patches. asps_mesh_coordinates_bit_depth_minus1 shall be in the range of 0 to 31, inclusive.
asps_mesh_vertices_delta_z_present_flag equal to 1 indicates that the difference between the z coordinate reconstructed from the geometry image and the actual value is present. asps_mesh_vertices_delta_z_present_flag equal to 0 specifies that the difference value is not present and shall be assumed to be equal to 0. When not present, asps_mesh_vertices_delta_z_present_flag is inferred to be 0.
The vertex position conversion from atlas coordinates to 3D coordinates is modified as follows:
the inputs to this process are:
the variable pIdx, the patch index,
the variable depthValue, the depth of the point,
the variable x, the x atlas coordinate,
the variable y, the y atlas coordinate.
The output of this process is an array pos3D of size 3, specifying the three-dimensional coordinates of the point.
The following applies:
The variable oIdx is set to AtlasPatchOrientationIndex[pIdx].
The variables posX, posY, deltaZ, sizeX, sizeY, lodX, and lodY are assigned as follows:
posX=AtlasPatch2dPosX[pIdx]
posY=AtlasPatch2dPosY[pIdx]
deltaZ=AtlasPatch2dPosDeltaZ[pIdx]
sizeX=AtlasPatch2dSizeX[pIdx]
sizeY=AtlasPatch2dSizeY[pIdx]
lodX=AtlasPatchLoDScaleX[pIdx]
lodY=AtlasPatchLoDScaleY[pIdx]
The atlas coordinates (x, y) are converted into the local patch coordinate pair (u, v) as follows:
where Ro and Rs are specified (e.g., in a table).
The local patch coordinate pair (u, v) is converted to 3D coordinates as follows:
pos3D[AtlasPatchAxisU[pIdx]]=AtlasPatch3dOffsetU[pIdx]+u
pos3D[AtlasPatchAxisV[pIdx]]=AtlasPatch3dOffsetV[pIdx]+v
tempD=(1-2*AtlasPatchProjectionFlag[pIdx])*(depthValue+deltaZ)
pos3D[AtlasPatchAxisD[pIdx]]=Max(0,AtlasPatch3dOffsetD[pIdx]+tempD)
Some V-MESH decoder implementations may choose to clip the reconstructed 3D coordinates to the range of 0 to (1 << (asps_geometry_3d_bit_depth_minus1 + 1)) − 1, inclusive. Other decoder implementations may apply the clipping operation after a 45-degree transform or after post-reconstruction processing.
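Ignoring the orientation transform (Ro, Rs) and LoD scaling, the reconstruction above can be sketched as follows (the dictionary keys are illustrative stand-ins for the AtlasPatch* variables):

```python
def atlas_to_3d(u, v, depth_value, patch):
    """Convert local patch coordinates (u, v) plus a depth sample to 3D.

    `patch` holds simplified stand-ins for the AtlasPatch* values:
    the tangent/bitangent offsets place u and v, the projection flag
    flips the depth axis, and deltaZ applies the per-vertex geometry
    correction before the Max(0, ...) clamp.
    """
    pos3d = [0, 0, 0]
    pos3d[patch["axisU"]] = patch["offsetU"] + u
    pos3d[patch["axisV"]] = patch["offsetV"] + v
    temp_d = (1 - 2 * patch["projectionFlag"]) * (depth_value + patch["deltaZ"])
    pos3d[patch["axisD"]] = max(0, patch["offsetD"] + temp_d)
    return pos3d
```

For a patch projected along the z axis with projection flag 0, a depth sample of 7 and a deltaZ of 2 land the point at offsetD + 9 on the depth axis.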
As described in the U.S. patent application with attorney docket no. Sony-75300, which is incorporated herein by reference in its entirety for all purposes, there are three ways to encode the vertex mapping information: implicit, explicit, and binary. In the implicit implementation, if the mesh is projected onto a 2D surface, the projection is the same as the mapping; for example, the position hit when projecting onto the projection surface gives the UV coordinates. In the explicit implementation, even when projection is performed, different coordinates are sent for the texture. In the binary implementation, the explicit information is encoded with an external encoder (e.g., Draco or AFX).
If binary encoding is implemented, the patch mesh information may be encoded using an external mesh encoder. U and V coordinates are added to the PLY, and the vertex mapping information is encoded as part of the PLY. In some embodiments, delta information for the z coordinate is added. The delta information may be used for geometry correction.
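The idea of carrying U and V as extra per-vertex properties in a PLY handed to an external encoder can be sketched with an ad-hoc ASCII PLY writer (illustrative only; not a normative format, and real pipelines would use a binary PLY and a codec such as Draco):

```python
def write_ply_with_uv(path, vertices, uvs, faces):
    """Write an ASCII PLY whose vertices carry extra u/v properties."""
    lines = ["ply", "format ascii 1.0",
             f"element vertex {len(vertices)}",
             "property float x", "property float y", "property float z",
             "property float u", "property float v",  # mapping coordinates
             f"element face {len(faces)}",
             "property list uchar int vertex_indices",
             "end_header"]
    for (x, y, z), (u, v) in zip(vertices, uvs):
        lines.append(f"{x} {y} {z} {u} {v}")
    for face in faces:
        # each face line: vertex count followed by the indices
        lines.append(f"{len(face)} " + " ".join(map(str, face)))
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```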
Further details regarding mesh compression can be found in U.S. patent application Ser. No. 17/322,662, "VIDEO BASED MESH COMPRESSION," U.S. provisional patent application Ser. No. 63/088,705, "VIDEO BASED MESH COMPRESSION," and U.S. provisional patent application Ser. No. 63/087,958, "VIDEO BASED MESH COMPRESSION," filed Oct. 6, 2020, which are incorporated herein by reference in their entirety for all purposes. The 3D mesh or 2D patch mesh connectivity may be encoded using occupancy maps and by exploiting temporal correlation when applying video-based mesh compression.
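One way occupancy-map-driven connectivity can work is to triangulate between adjacent occupied samples. The following is a simplified, hypothetical stand-in for that idea (not the actual coding scheme): each fully occupied 2x2 block of samples is split into two triangles:

```python
def triangles_from_occupancy(occ):
    """Derive illustrative patch connectivity from a binary occupancy map:
    every fully occupied 2x2 block of samples yields two triangles."""
    h, w = len(occ), len(occ[0])
    idx = lambda r, c: r * w + c  # vertex index of sample (r, c) in row-major order
    tris = []
    for r in range(h - 1):
        for c in range(w - 1):
            if occ[r][c] and occ[r][c + 1] and occ[r + 1][c] and occ[r + 1][c + 1]:
                tris.append((idx(r, c), idx(r + 1, c), idx(r, c + 1)))
                tris.append((idx(r, c + 1), idx(r + 1, c), idx(r + 1, c + 1)))
    return tris
```

Because the occupancy map is already transmitted for the video data, connectivity derived this way needs no extra signaling, which is the appeal of the approach described above.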
Fig. 5 illustrates a flow chart of a patch mesh coding method in accordance with some embodiments. In step 500, the mesh is encoded using patches. Encoding the mesh using patches includes projecting the connected triangles onto a two-dimensional surface, using patches that encode triangles or triangle strips without any projection, or using patches that are tracked over time and encoded by projecting the connected triangles onto a two-dimensional surface. In step 502, the vertex positions are encoded using an implementation selected from: adding the vertex position information directly to the patch header, embedding it in the video data, or using an external mesh encoder. Delta information may be sent for geometric correction. In some embodiments, the order of the steps is altered. In some embodiments, fewer or additional steps are implemented. For example: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
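The two steps of Fig. 5 can be summarized as a small dispatcher. The mode names and the returned record are hypothetical placeholders for the actual syntax elements, intended only to show the choices made per patch:

```python
# step 500: how the patch encodes the mesh
PATCH_MODES = ("projected", "raw_triangles", "tracked")
# step 502: how the patch encodes vertex positions
VERTEX_MODES = ("explicit_header", "embedded_video", "external_encoder")

def encode_patch(patch_mode, vertex_mode, send_delta=False):
    """Tag a patch with its mesh-encoding and vertex-position choices."""
    if patch_mode not in PATCH_MODES:
        raise ValueError(f"unknown patch mode: {patch_mode}")
    if vertex_mode not in VERTEX_MODES:
        raise ValueError(f"unknown vertex mode: {vertex_mode}")
    record = {"mesh": patch_mode, "vertices": vertex_mode}
    if send_delta:
        record["delta_z"] = True  # optional geometric correction information
    return record
```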
Fig. 6 illustrates a block diagram of an exemplary computing device configured to implement the patch mesh coding method in accordance with some embodiments. The computing device 600 can be used to acquire, store, compute, process, communicate, and/or display information such as images and video, including 3D content. The computing device 600 is able to implement any of the encoding/decoding aspects. In general, a hardware structure suitable for implementing the computing device 600 includes a network interface 602, a memory 604, a processor 606, I/O device(s) 608, a bus 610, and a storage device 612. The choice of processor is not critical as long as a suitable processor with sufficient speed is chosen. The memory 604 may be any conventional computer memory known in the art. The storage device 612 may include a hard drive, CDROM, CDRW, DVD, DVDRW, high-definition disc/drive, ultra-high-definition drive, flash memory card, or any other storage device. The computing device 600 may include one or more network interfaces 602. Examples of a network interface include a network card connected to an Ethernet or other type of LAN. The I/O device(s) 608 may include one or more of the following: keyboard, mouse, monitor, screen, printer, modem, touchscreen, button interface, and other devices. A patch mesh coding application 630 used to implement the patch mesh coding method may be stored in the storage device 612 and memory 604 and processed as applications are typically processed. More or fewer components than shown in Fig. 6 may be included in the computing device 600. In some embodiments, patch mesh coding hardware 620 is included. Although the computing device 600 in Fig. 6 includes the application 630 and the hardware 620 for the patch mesh coding implementation, the patch mesh coding method may be implemented on a computing device in hardware, firmware, software, or any combination thereof.
For example, in some embodiments, the patch mesh coding application 630 is programmed in a memory and executed using a processor. In another example, in some embodiments, the patch mesh coding hardware 620 is programmed hardware logic including gates specifically designed to implement the patch mesh coding method.
In some embodiments, the patch mesh coding application(s) 630 include several applications and/or modules. In some embodiments, a module includes one or more sub-modules. In some embodiments, fewer or additional modules may be included.
Examples of suitable computing devices include personal computers, laptop computers, computer workstations, servers, mainframe computers, handheld computers, personal digital assistants, cellular/mobile phones, smart appliances, gaming consoles, digital cameras, digital camcorders, camera phones, smart phones, portable music players, tablet computers, mobile devices, video players, video disc writers/players (e.g., DVD writers/players, high-definition disc writers/players, ultra-high-definition disc writers/players), televisions, home entertainment systems, augmented reality devices, virtual reality devices, smart jewelry (e.g., smart watches), vehicles (e.g., self-driving vehicles), or any other suitable computing device.
To utilize the patch mesh coding method, a device acquires or receives 3D content (e.g., point cloud content). The patch mesh coding method can be implemented with user assistance or automatically without user involvement.
In operation, the patch mesh coding method enables more efficient and more accurate 3D content encoding compared to previous implementations.
Some embodiments of mesh patch syntax
1. A method programmed in a non-transitory memory of a device, comprising:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder.
2. The method according to clause 1, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
3. The method according to clause 1, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
4. The method according to clause 1, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
5. The method according to clause 1, wherein the video data comprises occupancy map data.
6. The method according to clause 1, wherein encoding the vertex positions includes delta information for geometric correction.
7. The method according to clause 1, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
8. An apparatus, comprising:
a non-transitory memory for storing an application for:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder; and
a processor coupled to the memory, the processor configured to process the application.
9. The apparatus according to clause 8, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
10. The apparatus according to clause 8, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
11. The apparatus according to clause 8, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
12. The apparatus according to clause 8, wherein the video data comprises occupancy map data.
13. The apparatus according to clause 8, wherein encoding the vertex positions includes delta information for geometric correction.
14. The apparatus according to clause 8, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
15. A system, comprising:
one or more cameras for acquiring three-dimensional content; and
an encoder configured to:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder.
16. The system according to clause 15, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
17. The system according to clause 15, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
18. The system according to clause 15, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
19. The system according to clause 15, wherein the video data comprises occupancy map data.
20. The system according to clause 15, wherein encoding the vertex positions includes delta information for geometric correction.
21. The system according to clause 15, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
The invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of the principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto. It will be apparent to those skilled in the art that other various modifications can be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined in the claims.

Claims (21)

1. A method programmed in a non-transitory memory of a device, comprising:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder.
2. The method of claim 1, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
3. The method of claim 1, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
4. The method of claim 1, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
5. The method of claim 1, wherein the video data comprises occupancy map data.
6. The method of claim 1, wherein encoding the vertex positions includes delta information for geometric correction.
7. The method of claim 1, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
8. An apparatus, comprising:
a non-transitory memory for storing an application for:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder; and
a processor coupled to the memory, the processor configured to process the application.
9. The apparatus of claim 8, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
10. The apparatus of claim 8, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
11. The apparatus of claim 8, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
12. The apparatus of claim 8, wherein the video data comprises occupancy map data.
13. The apparatus of claim 8, wherein encoding the vertex positions includes delta information for geometric correction.
14. The apparatus of claim 8, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.
15. A system, comprising:
one or more cameras for acquiring three-dimensional content; and
an encoder configured to:
encoding a mesh using patches; and
encoding vertex positions using an implementation selected from: adding vertex position information directly to a patch header, embedding the vertex position information in video data, or using an external mesh encoder.
16. The system of claim 15, wherein encoding the mesh using patches comprises projecting connected triangles onto a two-dimensional surface.
17. The system of claim 15, wherein encoding the mesh using patches comprises using patches that encode triangles or triangle strips without any projection.
18. The system of claim 15, wherein encoding the mesh using patches comprises using patches that are tracked over time and encoded by projecting connected triangles onto a two-dimensional surface.
19. The system of claim 15, wherein the video data comprises occupancy map data.
20. The system of claim 15, wherein encoding the vertex positions includes delta information for geometric correction.
21. The system of claim 15, further comprising: encoding patch connectivity using a binary implementation or an explicit implementation; encoding patch vertices using a binary implementation, an occupancy map implementation, or an explicit implementation; and encoding patch mapping coordinates using a binary implementation, an explicit implementation, or an implicit implementation.